Archives for March 2005
Thursday, March 31, 2005

Is Apple trying to rip me off?

My Dell Inspiron laptop is almost 3 years old and coming to the end of its warranty, and I was just casually pondering the idea of buying a 15" PowerBook. Not seriously looking, you understand, but interested, browsing, ...

Anyway according to Apple's UK store the price of the 15" Combo PB G4 1.5GHz is £1379+VAT. That's pretty steep (an equivalent spec Dell Latitude comes in around £967+VAT) but if you consider Apple styling, attention to detail, and having a Unix based computer with a nice GUI worth something then, maybe, you can live with that.

However I was pretty ticked off when I checked the US Apple store. The same PowerBook in the US costs $1,999 which, at today's rate, equates to £1,059. Why am I supposed to pay £320+VAT more to buy the same PowerBook here than in the US?

In the interests of being fair it appears that Dell are charging UK customers approximately £100 more than US customers (although I could be wrong as I found it harder to match the equivalent models between their UK and US sites.)

31/03/2005 16:27 by Matt Mower | Permalink | comments:
Wednesday, March 30, 2005

More than just wayback

Listening to Brewster has given me a new appreciation for the Internet Archive. I thought they were just the Wayback Machine which I use from time to time but I was shocked at just how much content they have online.

  • 21,687 audio items
  • 21,371 texts
  • 2,001 movies
  • 34,319 software items
I'm looking at the IA in a new light.

30/03/2005 16:39 by Matt Mower | Permalink | comments:

Pod me no pods

So, thanks to Phil, I just listened to my first podcast. Actually it's probably not my first because I've downloaded MP3s of people talking before but it is the first one I've downloaded with iPodder.

I found little of obvious interest in the Podcasting directory. The directory seems pretty poor - no metadata, no tags - but Phil had mentioned IT Conversations to me so I downloaded Brewster Kahle talking about building the Library of Babel. It was good, Brewster is a great speaker and has a grand vision. I'm sure I'll listen to a few more of these.

However I'm not convinced that I'm going to be a podding convert because of access to the medium. When I'm at the computer I'm usually busy with something and listening to people talk is either a distraction (if they're interesting) or doesn't get attended to. Unfortunately if I want to pay attention my computers have too many ways of distracting me. It's not an easy listening station!

The idea of having an iPod which gets filled with new and interesting things every day which I can listen to on a drive or on the tube is attractive. Maybe if I had an iPod it would work. Hmm... something to think about anyway.

30/03/2005 16:33 by Matt Mower | Permalink | comments:
Thursday, March 24, 2005

Maybe I didn't need a network to predict this!

Neural networks are very cool but not suitable for all applications. I've basically been stumped by the problems of trying to use a network for identifying interesting weblog posts.

The first problem is the input problem. How do you represent arbitrary chunks of text to the network in a meaningful way?

The problem here is that you have a layer of input neurons which form the input to the network. The inputs are driven by the environment (e.g. the text) and must consist of real values which can be fed to the next (hidden) layer of the network. If you're measuring temperatures, voltages, water levels, and so on then you are working with real values already. If you're working in image recognition you tend to have a fixed array of pixels (e.g. 640x480). But what about text?

I find myself presented with two sub-problems:

  1. The length of the text is not bounded. Measuring temperature you might have 2 or 3 sensors. A weblog post could as easily be 5 words or 50,000 words depending upon the author's whim. In practice you could say "no post will ever be more than 1MB of text" and treat 1MB as a limit, but that can create its own problems.

  2. Text doesn't neatly correspond to a real value input pattern. How do you represent the specific text as a numeric value?

An approach I formulated was to chop the input into keywords (doing appropriate stop-word rejection and so forth) and then feed the most relevant n keywords to the network as input for that text. The keywords could be uniquely numbered and then represented as a binary value. Each bit of the binary value would correspond to an input cell raising a value of 0.0 or 1.0 depending upon whether the bit is set or not. If we allowed a total of 4,096 possible keywords this can be represented in 12 bits (2^12=4096). If we used the 10 most relevant keywords for each post, i.e. n=10, then the input layer would, therefore, be composed of 12n or 120 cells.
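To make the arithmetic concrete, here is a minimal Python sketch of that encoding. The function names and the choice to put the most significant bit first are my own assumptions for illustration, not part of any existing code.

```python
BITS_PER_KEYWORD = 12   # 2**12 = 4096 possible keyword ids
KEYWORDS_PER_POST = 10  # n, the number of keywords fed to the network


def encode_keyword(keyword_id):
    """Return the 12 input values (0.0 or 1.0) for one keyword id.

    Bit order (most significant first) is an arbitrary choice for this sketch.
    """
    if not 0 <= keyword_id < 2 ** BITS_PER_KEYWORD:
        raise ValueError("keyword id out of range")
    return [float((keyword_id >> bit) & 1)
            for bit in reversed(range(BITS_PER_KEYWORD))]


def encode_post(keyword_ids):
    """Concatenate the encodings of exactly n keyword ids into one input vector."""
    if len(keyword_ids) != KEYWORDS_PER_POST:
        raise ValueError("expected exactly %d keyword ids" % KEYWORDS_PER_POST)
    vector = []
    for keyword_id in keyword_ids:
        vector.extend(encode_keyword(keyword_id))
    return vector


# Ten hypothetical keyword ids give 12 * 10 = 120 input values.
print(len(encode_post([5, 17, 42, 100, 256, 999, 1024, 2048, 4000, 4095])))
```

Each post has to supply exactly ten ids here, which is where the problems listed next start to bite.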

However even having reached this point there are further problems to consider:

  • In a large post there may well be more than 10 keywords which means losing relevant information.

  • In a small post there may be fewer than 10 relevant keywords. What input is provided for non-existent keywords?

  • The relevance of the keyword to the item is not encoded. This problem might be solved by adding further input cells for each keyword to express keyword relevance.

  • The network acts as a feature detector. If we consider a set of temperature sensors wired to the inputs, each sensor will be wired to a specific set of inputs; they won't change. However a keyword that is detected in one position (i.e. represented in one set of input cells) for one item may be detected in another position for a different item and won't be considered the same feature. This is likely to be problematic.
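The last point is easy to see in a small Python sketch. It assumes the 12-bits-per-keyword slot layout described above; the helper name and the keyword id 7 are made up for illustration.

```python
BITS_PER_KEYWORD = 12


def cells_for(keyword_id, slot):
    """Indices of the input cells switched on when keyword_id occupies the given slot."""
    offset = slot * BITS_PER_KEYWORD
    return {
        offset + (BITS_PER_KEYWORD - 1 - bit)
        for bit in range(BITS_PER_KEYWORD)
        if (keyword_id >> bit) & 1
    }


# The same (hypothetical) keyword id, ranked first in one post and second in another.
first = cells_for(7, slot=0)
second = cells_for(7, slot=1)
print(sorted(first))
print(sorted(second))
print(first.isdisjoint(second))
```

The two posts share the keyword but not a single active input cell, so the network has no direct way to know they have anything in common.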

Basically, when it comes to free text, input is a mess.

And then there are pragmatic problems like:

  • "How big should the hidden layer be?"

    Too small and the network won't learn, too big and the network will mimic rather than learning to generalize properly and will also be slow.

  • "Should we have one hidden layer or two?"

    As a general rule of thumb it appears that in 85% of cases 2 layers works best with 3 layers performing better in the rest.

  • "What training rate should be used?"

    Set too high and the network bounces around unable to settle on a solution, set too low and it never converges on useful behaviour.
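The training rate trade-off can be seen without a network at all. The Python sketch below is only a stand-in: it runs plain gradient descent on f(w) = w**2 (my choice of toy function, not anything from a real network) with three hypothetical rates.

```python
def descend(learning_rate, steps=50, start=1.0):
    """Minimise f(w) = w**2 by gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w            # derivative of w**2
        w -= learning_rate * gradient
    return w


# Illustrative rates only; useful values depend entirely on the problem.
for rate in (1.1, 0.3, 0.001):
    print(rate, descend(rate))
```

With the largest rate each step overshoots the minimum and the value grows without settling; with the smallest, fifty steps barely move it from the starting point; the middle one ends up very close to zero.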

Each of these problems is solvable but usually involves trial and error searching by training the network, deciding whether it is effective and, if not, junking it and trying some different combination of parameters. This is fine if you have a fixed training set with which you train the network repeatedly and then, having found the best parameter combination, just use it from there on.

However, in my application, the training is done by the user interactively and the training set will be different for each of them, and will change over time. Although the accumulated training data could be stored and the network retrained in the background I think this could mean that the network would never be useful.

All of which is leading me to think (as others have advised me) that a Neural Network may not be the best solution to this particular problem. I'm interested in whether anyone has any successful experience in this area. Otherwise I'm probably going to start looking more closely at Bayesian classifiers.

24/03/2005 18:34 by Matt Mower | Permalink | comments:
Sunday, March 20, 2005

Goodbye Vader.. Hello Razr

Yesterday I laid to rest my trusty Motorola V3688 (Vader was the nickname for the black V-Series clamshell phones). I bought it 4 years ago and it has served me well. I've looked at replacing it several times over the last couple of years but updated Moto phones didn't cut it in terms of simplicity, call and build quality. However damage to the hinge and a steady decline in battery life have made replacement more urgent.

The Razr V3 was my chosen replacement and I could finally afford to do the upgrade. So far I love the call quality, the excellent speakerphone, the MP3 ring tones (Futurama theme tune!), great screen, great keypad, and the sleek looks. I anticipate I will find having a datebook & calculator useful (in fact I already used the calculator). Bluetooth, GPRS, Camera, and Java I'm really not sure about.

I know there is a V6 coming but I don't care, I have bought a phone I think I'm going to enjoy using and owning. A worthy successor to my Vader, may it last as long!

20/03/2005 18:56 by Matt Mower | Permalink | comments:
Saturday, March 19, 2005

I need a network to help me understand all these networks

I posted some questions yesterday about training a neural network to learn how to classify interesting posts. Mikel kindly pointed me at Kohonen maps. This is a vector-based technique for clustering related items on a surface. In this respect it sounds similar to Multi-dimensional scaling and Latent Semantic Indexing.

I guess my biggest problem is trying to evaluate so many techniques, which are new to me, in so little time. It's a challenge.

I haven't given up on the idea of using a backpropagation network either. It occurred to me that one solution to the input problem would be to reduce each item to a set of n ranked keywords, i.e. the top 10 rated keywords. This would provide a fixed vector for input to the network. I'm thinking this might be simple enough to implement and effective enough to meet the 80/20 rule.
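As a rough Python sketch of that reduction: the stop-word list is a tiny illustrative one, "rated" is taken to mean raw frequency, and short posts are padded with None. All three are my assumptions rather than settled design.

```python
import re
from collections import Counter

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}


def top_keywords(text, n=10):
    """Return the n most frequent non-stop-words, padded with None to length n."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    ranked = [word for word, _ in counts.most_common(n)]
    return ranked + [None] * (n - len(ranked))


print(top_keywords("the cat and the cat sat on the mat cat sat", n=4))
```

Whatever the length of the post, the result always has n entries, which is what a fixed-size input layer needs.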

19/03/2005 01:09 by Matt Mower | Permalink | comments:
Thursday, March 17, 2005

Learning artificial learning

Any AI gurus out there?

I'm considering the problem of using a neural network to evaluate chunks of text and determine whether they are interesting or not and it's taxing me a little.

My theoretical approach is to take an item and a manually assigned score. Break the item into a series of keywords and then supply the keywords (possibly with order) to the network using backpropagation to train it against the manually assigned score.

Leaving aside the training issues for a second I have a much bigger problem: How to supply input to the network.

In all of the books I am reading the examples effectively use a fixed input size. For example in image recognition you are sampling a fixed pixel space. In control applications there are a number of inputs sampling a pre-determined set of physical systems. However I'm not clear how to map that to my situation.

Forgetting any attempts to train the network on keyword order or keyword strength, how can we even supply keyword information as inputs when the set of keywords is potentially unbounded?

The closest example I have seen (so far) to my situation is called NETtalk which is a system for pronouncing English words. NETtalk was trained using a sample dictionary of 5,000 words with the phonemes corresponding to each letter. The training process used a sliding window of 7 characters where the network could look at the current character in the context of the 3 preceding and following characters. The output from the network is the phoneme to represent the current character.
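The sliding window itself is simple to sketch in Python. How the edges of a word were handled in NETtalk is not something I have checked, so the "_" padding below is purely an assumption for illustration.

```python
def sliding_windows(word, context=3, pad="_"):
    """Yield (window, centre) pairs: each character with `context` neighbours either side."""
    padded = pad * context + word + pad * context
    for i, centre in enumerate(word):
        yield padded[i:i + 2 * context + 1], centre


for window, centre in sliding_windows("cat"):
    print(window, centre)
```

Each window is 7 characters wide with the current character in the middle, and the network would be asked for one phoneme per window.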

Attempting to adapt such a scheme to the keyword analysis situation we might decide on a maximum possible number of keywords (let's say 4,096) which will fit into 12 bits (2^12 = 4096). Each keyword would then be given a unique id (I do this already) in the range 0-4095. The network could then have a 12n bit wide input layer corresponding to a moving window of n keywords.

However an obvious problem raises its head at this point. NETtalk was outputting a phoneme for each character and using surrounding characters for context. In my situation each keyword contributes to the overall score of the article. To my way of thinking the two situations do not match up.

Ideally you would simply supply all the keyword information for the article, however the number of keywords is not bounded which creates an input problem I don't know how to resolve.

Basically, I need help. Can anyone assist me?

17/03/2005 23:12 by Matt Mower | Permalink | comments:
Wednesday, March 16, 2005

Ah Lotus blossom

I was listening to a program on Radio 4 this evening which covered the field of Biomimetics. It was quite an interesting program, especially in its coverage of the lotus effect. This is the effect seen in Lotus flowers where the leaves are a perfectly slippery surface so that water, dirt, seemingly anything just slides right off them.

Some German scientists have discovered that, somewhat unintuitively, the surface of the lotus petal is not smooth but intricately knobbly at the microscopic level. This knobbliness means that little of the surface of the petal is ever in contact with anything touching it and so there is practically no friction.

During the program you could hear a number of demonstrations of the lotus effect such as a spoon with lotus effect surface. Honey slid off the spoon as if it was mercury. The demos were obviously pretty cool to watch. And this is where the majesty of Radio just didn't cut it!

What bugs me though is that, try as I might, I can't find a good video demonstration on the web.

Beyond honey-proof spoons there was much discussion of the use of the lotus effect in self-cleaning buildings and within hospitals (since it is so slippery that even biofilms won't adhere).

16/03/2005 22:41 by Matt Mower | Permalink | comments:

I'll build my own theme park, with black jack and hookers. In fact, screw the theme park!

I always wanted my own country!

And we're #10 in the Woodchipping industry. Hooray for us!

16/03/2005 21:45 by Matt Mower | Permalink | comments:

Cutting the dead wood

Following Stowe's lead I've begun a process of removing myself from useless social network applications.

Though I was initially very positive about these services, long acquaintance has led me to conclude that they really haven't grown my social network, merely created some kind of reflection of it.

I would meet someone, either face to face or through my blog, and then we'd go "let's join networks" but I can't say I ever experienced significant second order effects. Maybe I just don't network that way.

So in the spirit of if it isn't working -- stop doing it I've terminated my accounts with Ecademy, Ryze, and Orkut. LinkedIn was the service I've had the most positive experience with so I'm going to ponder that one a little.

Since I've re-introduced comments on C&C you know where to find me if you want me -- right here.

16/03/2005 13:07 by Matt Mower | Permalink | comments:
Tuesday, March 15, 2005

Passed

I forgot to mention here: I have passed the first two modules in my Postgraduate Diploma in Psychology. I received a grade 'A' for Developmental Psychology I and 'B' for Personality and Social Psychology.

I was a little surprised that the results weren't the other way around as I felt that social psychology was my stronger subject. I guess I should find out what let me down there.

Nevertheless I'm very glad to have passed. I won't say I'm looking forward to the next set of exams, but maybe I'm feeling a little happier at the prospect ;-)

15/03/2005 14:29 by Matt Mower | Permalink | comments:

Don't call us...

I forgot to mention that I now have a Skype-In account with a shiny new voicemail box.

At €10 for 3 months (including the voicemail) it seemed like too good an opportunity to pass up. I'm now using the Skype number as my office number which makes the voicemail doubly handy.

15/03/2005 13:48 by Matt Mower | Permalink | comments:

A test post

Topics are back.

15/03/2005 11:27 by Matt Mower | Permalink | comments:
Thursday, March 10, 2005

Just testing.

Just moved Radio onto a new machine.  Checking it still works before I delete the old copy.
10/03/2005 18:17 by Matt Mower | Permalink | comments:
Wednesday, March 09, 2005

Not just tidy, neat too!

Like most people who've had to deal with nasty HTML markup I've used Dave Raggett's HTML-Tidy utility in one form or another, most recently as a built-in part of the HTML-Kit editor. It's always done a remarkable job of making even the nastiest HTML usable.

When I started work on a recent project I found myself having to deal with all kinds of horrible HTML, and all of it horrible in different and unpredictable ways. My naive attempts to write a sanitizer were going nowhere fast when octopod in #ruby asked if I'd ever come across TidyLib.

It turns out that when Dave turned HTML-Tidy loose it got picked up and maintained by a group of people who created a neat, open source, library to which others have added bindings for their own language. There's even a TidyLib RubyGem for those enlightened folks who use that particular language. So, at a stroke, I had all that evil markup validating as XHTML-Strict and was saved from a world of hurt. Deep, deep, joy!

There was just one fly in the ointment. One particular item ended up with some stray non-SGML characters in it and I traced the problem back to the output from TidyLib. My heart started sinking.

Reading through the archives of the TidyLib mailing list I couldn't see anything relevant but I did find out that there is a #tidy channel over on FreeNode. There I spoke to Björn Höhrmann who, it turns out, has been maintaining TidyLib for the last 3 years. Even though he's not a Ruby coder Björn downloaded the gem code, started comparing it to the library source, and quickly narrowed it down to a likely buffer overwrite in the gem code.

Then I had to go away for a couple of days.  I came back today ready to start pursuing a fix only to find Björn several steps ahead of me. In the interim he had spoken to Kevin Howe who maintains the gem. They worked together to isolate the bug, Kevin patched it and then updated the gem.

All I had to do was type "gem update tidy", sit back, and smile :-)

My special thanks to Björn for taking the time to look at the Ruby code, spot the problem, and follow through. Star quality man!
09/03/2005 23:59 by Matt Mower | Permalink | comments:
Friday, March 04, 2005

Don't be too clever

AlexPKeaton in #RubyOnRails just quoted Kernighan's Law of Debugging which I thought was worth sharing:
Debugging is twice as hard as writing the program, so if you write the program as cleverly as you can, by definition, you won't be clever enough to debug it.
04/03/2005 19:51 by Matt Mower | Permalink | comments:
Wednesday, March 02, 2005

Oh man I hate Laszlo soundblox

Okay it may seem cool to put the Laszlo music player thingy in your pages but it's really not.  Especially when the damn thing seems impossible to turn off.  Ugh.
02/03/2005 22:35 by Matt Mower | Permalink | comments:

How do you advertise a search engine?

It seems to me, from watching the recent "new, more precise, MSN search" adverts, that Microsoft have no idea how to market search.  I know it's risky to generalize from your own quirky existence but is there anyone who's made the switch?

To be honest the advert didn't even persuade me to load the page and have a look (and this when I'm currently troubled by Google's monopoly and the future of auto-link).  In fact I don't even know the URL for MSN search.  I suppose I could Google for it.  But, why bother?  If MSN search had anything to offer wouldn't I have heard about it? Even Scoble doesn't have much to say.

Shouldn't Microsoft be doing better than this?
02/03/2005 22:28 by Matt Mower | Permalink | comments:

Questioning a long held belief

I've been thinking recently about the UK joining Europe and whether it's a good idea or not.  It's been my firmly held belief for some time that we should join the European Union and the single currency as soon as possible.  I'm beginning to question that belief. Well, part of it.

I think I may be in that small minority who actually want the UK to adopt the euro, but don't want us to join a European political union.  I'm definitely not a little Englander but with every breath I take I become less and less enamoured of the idea of federal supergovernments and the increasing power of states.  I quoted George Washington's foreign policy a few days ago and it's been ringing in my ears ever since.

I think the European Union was an inevitable (and maybe necessary) consequence of centuries of warring among the European powers.  People are afraid of war and seek to band together by closer union and harmonization to avoid conflict.  But does the EU really have any bearing on whether a 3rd world war could happen in Europe?  I'm not convinced.

On the other hand moving increasing amounts of power away from national governments to a centralized European bureaucracy sounds a pretty unattractive proposition to me.  It's not like it even means smaller national governments since, inevitably, there will be more and more civil servants to interpret and implement all the policies coming from Brussels. We end up with the worst of both worlds and it all has to be paid for.

Part of me says that a European government might have a limiting effect on the excesses of national governments.  But European politics is also rife with corruption and misdeed.  European politicians are drawn from the same swill barrel that national politicians wallow in and have about as much to commend them.  They're just 2,000 miles less accountable to the electors.

Maybe with the waning of the American empire the time of superpowers is drawing to an end?  The baton will not be passed to the EU certainly.  I think it's time for the freewheelin' micro-states to emerge:
"commerce and friendship with all, entangling alliances with none"
Think about it.
02/03/2005 20:54 by Matt Mower | Permalink | comments:

Can it be true? A SubEthaEdit for Windows?

MoonEdit is Tom Dobrowolski's cross-platform (including Windows!) collaborative editor à la the excellent SubEthaEdit. It doesn't look as pretty as SubEthaEdit but, if it works, I won't be holding that against it.

Two key questions for me are:
  • Does it use Rendezvous?
  • Is it compatible with SubEthaEdit?
Rendezvous is available and working on Windows (Trillian allows you to chat via Rendezvous) and would make setup simpler in the most common scenarios. Also it would be a huge shame if it wasn't at least planned that we can work with Mac users.

Who's up for sharing a document?
02/03/2005 16:29 by Matt Mower | Permalink | comments:

Skype into Trillian won't go (but I wish it did)

I was talking to my friend Joe via Trillian (now updated to 3.1 which gives me back the features I was missing from the IRC plugin -- nice!) and we were talking about Skype and how we were looking forward to Skype voice mail.  Then it occurred to me: I really wish that Skype would componentize their technology and license it to other IM client vendors.  That is, I don't see any reason why I shouldn't have a Skype plug-in to Trillian (in the same way that I have an AIM plugin which does voice and video).  This would allow me to keep Trillian as my messaging hub.

Given that Skype themselves are likely to make all their profits from network services it seems like building clients is a loss-making business for them, especially now that they've pump primed the market for their services.  Why not let the people who specialise in the clients do their thing and get on with building better services for their network?
02/03/2005 14:44 by Matt Mower | Permalink | comments:

A stick in the sand

Just putting my MemeScope stick in the sand.
02/03/2005 13:14 by Matt Mower | Permalink | comments:

Healing for dollars

Whenever I think of "pledges" I always think of the Dilbert cartoon where Dogbert becomes a TV evangelist with proper TV evangelist hair and his own show "healing for dollars."  This is a different kind of pledge altogether (and, no, it's not polish either).

Last night I used the excellent WriteToThem service (the follow on to FaxYourMP) to send my MP (Siobhain McDonagh) a message expressing my concern and disapproval about possible UK military actions either directly in Iran or Syria or in support of US or Israeli forces engaged in such actions. WriteToThem makes it simple to contact the right person and ensure there is followup.

It also gave me information about a new service they are about to launch called PledgeBank.  Here's the blurb:

PledgeBank is a web site to help get people past the barrier of not wanting to act alone. We all know that people hate thinking that they might be the only person to turn up to an event, or the only person who volunteers to help with a cause. PledgeBank is about overcoming this common fear.

For an example of how PledgeBank will work, imagine the following:

Your child comes home from school with a small flyer given to them by their teacher. It reads:

"I will help organise the school summer fair, but ONLY IF 10 parents will help too. Go to www.pledgebank.com/school if you would like to make the same pledge I have."

When the parent types the address into their web browser, they are taken to a page where within 10 seconds they can put their name to the pledge.

What happens next depends on the parents. Imagine that 12 parents sign up. In this case all signatories are sent an email congratulating them, asking them to fulfil their pledge, and offering them a simple way of discussing the plan. And if fewer than 10 people sign up, everyone is emailed and told "Better luck next time".

I think PledgeBank sounds like an excellent idea.  I know I've felt the isolation of not wanting to act alone.  Maybe this will be a good way to assist communities in helping themselves.

02/03/2005 11:40 by Matt Mower | Permalink | comments: