Almost brilliant you say?

I was using Amar Sagoo's beautiful utility Tofu to read a reasonably long Word document today and, since I didn't want to hunch over the keyboard while reading, decided I would try the speech control feature.

It's actually pretty good. I changed the default configuration slightly so that it is triggered by the word "computer" (actually I tried many variations, including the momentarily delicious "slave") rather than by pressing the escape key.

I was then able to read, sitting back comfortably, with Tofu expanded to show three fat columns on my 20" display, and move along by saying "computer, move right" whenever necessary.

This is fantastic. It's almost brilliant. Why isn't it actually brilliant though? The first reason is the speech recognition and the second is the lack of PDF support.

Mac OS X's built-in speech recognition is pretty good but it's just not good enough. In training I found that I couldn't get it to reliably detect two of the phrases: "show me what to say" was recognized about 15% of the time and "open a document" about 30%. This despite trying 3 different microphones (the built-in mic, my Telex USB headset, and the iSight mic), 3 different positions, all the recording levels, and a range of voices Rory Bremner would be hard pushed to beat.

This meant that, in practice, it failed to recognize the command "move right" just often enough to be irritating, although "move left" was fine. By contrast I could hardly get "move page left" and, especially, "move page right" to work at all.

However, the promise of this kind of speech-driven technology really excites me. I got a real buzz out of being able to control the computer this way and was surprised that it didn't seem to spike my CPU. I really hope that the speech recognition is improved in Mac OS X 10.5.

The second problem with Tofu was the lack of PDF support. I say "was" because Amar Sagoo is working on Tofu v2, which uses the PDF Kit functionality provided in Tiger to get a text stream from PDFs. However, as Sagoo mentions, it's not all plain sailing:

One limitation, however, is that it can't distinguish between line wraps (which occur at the end of each line) and real paragraph breaks. This is because PDF files don't really store continuous text, but rather the position of each character on the page.

He's working around this and it will be interesting to see how good it can get. I dream of being able to sit back and relax while reading e-books using Tofu and voice control.
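
To make the line-wrap problem concrete, here is a rough Python sketch of the kind of heuristic one might try. It is not Tofu's code and it doesn't use PDF Kit; the Line record, the join_paragraphs function and both thresholds are hypothetical names and values I've made up for illustration. It assumes the lines have already been extracted with their positions, and guesses that a new paragraph starts when the previous line stops well short of the right margin or when the vertical gap is unusually large.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    text: str
    y: float      # vertical position, increasing down the page (assumed)
    x_end: float  # right edge of the last character on the line


def join_paragraphs(lines, right_margin, gap_factor=1.5, short_ratio=0.85):
    """Guess paragraph breaks from positions alone (illustrative heuristic)."""
    if not lines:
        return []
    # Typical line spacing, used to spot unusually large vertical gaps.
    gaps = [b.y - a.y for a, b in zip(lines, lines[1:])]
    typical_gap = median(gaps) if gaps else 0.0

    paragraphs = [lines[0].text]
    for prev, cur in zip(lines, lines[1:]):
        big_gap = typical_gap > 0 and (cur.y - prev.y) > gap_factor * typical_gap
        # A line that ends well before the margin probably ended a paragraph.
        short_line = prev.x_end < short_ratio * right_margin
        if big_gap or short_line:
            paragraphs.append(cur.text)
        else:
            paragraphs[-1] += " " + cur.text
    return paragraphs


sample = [
    Line("PDF files store the position", 0, 500),
    Line("of each character.", 12, 300),
    Line("A new paragraph starts", 24, 495),
    Line("here.", 36, 120),
]
result = join_paragraphs(sample, right_margin=500)
```

With the sample above, the second line stops well short of the margin, so the four lines come out as two paragraphs. A heuristic like this will get it wrong whenever a paragraph's last line happens to fill the column, which is presumably part of why the problem is hard.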

29/05/2006 19:59 by Matt Mower