2003-08-17

Abusing Bach

Sun Aug 17 20:28:21 BST 2003  Permalink 

Via a recent Tech Tip I've learned a little bit about Java's sound capabilities and, in particular, using Java to play MIDI.  It was fun typing in a little Java program and hearing pianos playing a few notes.  Something to while away a few minutes on a hot afternoon.

Then I started thinking about algorithmically generating music.  I am deuced unmusical (lacking both rhythm and a good ear for pitch), which has led, in the past, to some frustrating attempts to use professional software to make compositions.  I can program Java though.

I have a program which I use to generate random pronounceable passwords.  Although its hit rate for generating memorable words is about 1 in 40, I can remember the good passwords some 6 years after first using them.  It works by analysing a body of text and calculating the frequency of each 3-letter combination that appears.  Then it uses some simple rules to combine these 3-letter combinations into words.
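A minimal sketch of that idea, not the actual password program: the class name, method names and the chaining rule (each new combination must start with the last two letters of the word so far) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: names and rules are illustrative, not the original program.
class TrigramWords {

    // Count how often each 3-letter combination occurs inside the words of a text.
    static Map<String, Integer> countTrigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            for (int i = 0; i + 3 <= word.length(); i++) {
                counts.merge(word.substring(i, i + 3), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Pick a combination starting with the given prefix, weighted by its count.
    // Returns null when no combination starts with that prefix.
    static String pick(Map<String, Integer> counts, String prefix, Random rng) {
        List<String> candidates = new ArrayList<>();
        int total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                candidates.add(e.getKey());
                total += e.getValue();
            }
        }
        if (candidates.isEmpty()) {
            return null;
        }
        int r = rng.nextInt(total);
        for (String candidate : candidates) {
            r -= counts.get(candidate);
            if (r < 0) {
                return candidate;
            }
        }
        throw new IllegalStateException("unreachable");
    }

    // Chain overlapping combinations until the word reaches the requested
    // length (never shorter than 3) or no combination can follow.
    static String makeWord(Map<String, Integer> counts, int length, Random rng) {
        String first = pick(counts, "", rng);
        if (first == null) {
            return "";
        }
        StringBuilder word = new StringBuilder(first);
        while (word.length() < length) {
            String next = pick(counts, word.substring(word.length() - 2), rng);
            if (next == null) {
                break; // dead end: nothing starts with the last two letters
            }
            word.append(next.charAt(2));
        }
        return word.toString();
    }
}
```

Calling TrigramWords.makeWord(TrigramWords.countTrigrams(someText), 8, new Random()) gives one candidate word; most will be forgettable, which fits the 1 in 40 hit rate.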

So I started to wonder if the same thing could be done with music.  Could you stitch together 3-note combinations into something resembling music?  (For the moment let's set aside the question of why on earth you would do this.)

Java has a very simple call:

MidiSystem.getSequence( file )

which loads a MIDI file into a Sequence object.  Its getTracks() method gives you an array of Track objects from which you can access the events which play the various notes of the piece.  There are equivalent calls for creating and playing tracks (which use your sound card like a synthesizer).
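As a rough sketch of reading the notes back out (the class and method names here are made up for illustration; only the javax.sound.midi calls are real), something like this walks every track and collects the pitch of each note-on event:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.sound.midi.MidiMessage;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

// Hypothetical helper class, not the program described in this post.
class MidiNotes {

    // Collect the pitch (0-127) of every note-on event, track by track.
    static List<Integer> notePitches(Sequence sequence) {
        List<Integer> pitches = new ArrayList<>();
        for (Track track : sequence.getTracks()) {
            for (int i = 0; i < track.size(); i++) {
                MidiMessage message = track.get(i).getMessage();
                if (message instanceof ShortMessage) {
                    ShortMessage sm = (ShortMessage) message;
                    // A note-on with velocity 0 conventionally means note-off, so skip it.
                    if (sm.getCommand() == ShortMessage.NOTE_ON && sm.getData2() > 0) {
                        pitches.add(sm.getData1());
                    }
                }
            }
        }
        return pitches;
    }

    // Usage: java MidiNotes some-file.mid
    public static void main(String[] args) throws Exception {
        Sequence sequence = MidiSystem.getSequence(new File(args[0]));
        System.out.println(notePitches(sequence));
    }
}
```

This ignores timing, duration and which track a note came from, so chords and separate voices all end up flattened into one list of pitches.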

I started with some Bach that I found on the net.  My aim was to do a frequency analysis of the 3-note combinations.  Somewhat to my surprise, though, I discovered there weren't any repetitions.  That is, no exact 3-note combination was ever repeated.  At least, unless I got my program wrong.  The resulting noise led my housemates to question whether I was safe to be left home alone.

Not to be discouraged, I tried a second approach: analysing, for each note played, the range of notes which could follow it and the probability of each.  This, coupled with some simple selection logic, allows me to play something that sounds almost totally unlike music (and certainly unlike Bach's music).  Mostly it has taught me that music is vastly more complex in structure than words.
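The second approach amounts to a table of "which note follows which, and how often". A minimal sketch, assuming the piece has already been reduced to a flat list of MIDI pitches; the names are hypothetical and the selection logic here is just a weighted random choice, which may well be simpler than what the real program does:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: a table of note-to-note transition counts.
class NoteChain {

    // For each note, count how often each note immediately follows it.
    static Map<Integer, Map<Integer, Integer>> transitions(List<Integer> notes) {
        Map<Integer, Map<Integer, Integer>> table = new HashMap<>();
        for (int i = 0; i + 1 < notes.size(); i++) {
            table.computeIfAbsent(notes.get(i), k -> new HashMap<>())
                 .merge(notes.get(i + 1), 1, Integer::sum);
        }
        return table;
    }

    // Choose a follower with probability proportional to its count.
    // Returns null if the current note was never followed by anything.
    static Integer nextNote(Map<Integer, Map<Integer, Integer>> table, int current, Random rng) {
        Map<Integer, Integer> followers = table.get(current);
        if (followers == null) {
            return null;
        }
        int total = 0;
        for (int count : followers.values()) {
            total += count;
        }
        int r = rng.nextInt(total);
        for (Map.Entry<Integer, Integer> e : followers.entrySet()) {
            r -= e.getValue();
            if (r < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("unreachable");
    }

    // Walk the table from a starting note, stopping early at a dead end.
    static List<Integer> generate(Map<Integer, Map<Integer, Integer>> table,
                                  int start, int length, Random rng) {
        List<Integer> out = new ArrayList<>();
        Integer note = start;
        while (note != null && out.size() < length) {
            out.add(note);
            note = nextNote(table, note, rng);
        }
        return out;
    }
}
```

Because each choice depends only on the previous note, nothing longer than a two-note pattern survives, which goes some way to explaining why the result sounds so little like Bach.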

Still, it's been a diverting way to spend an afternoon AND I've learned something.

Some other notes.  My development environment is IDEA by IntelliJ.  I've tried pretty much every Java IDE going and this one is the best by far.  If you haven't tried Intention actions yet, well...  The GUI was built using Peter Eastman's Buoy widget set.  Peter is also responsible for the Java-based 3D rendering suite Art of Illusion.

Validating work blogs.

Sun Aug 17 19:51:00 BST 2003  Permalink 

How I would implement weblog in business. Lee LeFever has written a short article on the value of weblogs to share knowledge. To quote: In retrospect- a Weblog could have been extremely valuable to me and the company. Using a Weblog, I could chronicle the daily activities,... [Column Two]

» More validation of the idea of using weblogs to augment internal communication & collaboration.

Navigate or do not. There are no breadcrumbs.

Sun Aug 17 19:40:09 BST 2003  Permalink 

Breadcrumb navigation: Further investigation of usage. Bonnie Lida Rogers and Barbara Chaparro have summarised the results of their further research into the effectiveness of breadcrumb navigation. To quote: In this study, we designed the tasks such that navigational efficiency would be optimized through the use of... [Column Two]

 

Preacher man

Sun Aug 17 19:33:13 BST 2003  Permalink 

I really enjoyed listening to the interview with Real Live Preacher on Christopher Lydon's weblog.  I'm not even remotely religious but I enjoy the wisdom and compassion in RLP's writing.

K-Collector & Bayesian filtering

Sun Aug 17 18:59:05 BST 2003  Permalink 

Issues in using SpamBayes to filter news items.

Despite reading an entry by Srijith discussing Bayes-based classification as unsuitable for use in news aggregators, I tied SpamBayes into my homebrew news aggregator and have been trying it out this week. I know I’ve been talking about it for a while, but procrastination and being busy all round kept me from getting to it. Funny thing is, when I finally got a chance to really check things out, the integration was a snap. I’d anticipated a bit of work, but was pleasantly surprised. I doubt that any other aggregator written in Python would have a hard time with it.

If, that is, anyone else wants to do it. I already knew it wasn’t magic pixie dust but I figured it might be worth a try. I will be eating my dogfood for a while with this, but I’m thinking already that what’s good for spam might not be so good for news aggregators.

Srijith’s post mentions some snags in ignoring some of the semantics of a news item, such as whether a word appears in the item’s title or information about the item’s source. I don’t think that this completely applies to how I’m doing classification, since SpamBayes appears to differentiate between words found in email headers and the body itself. When I feed an item to SpamBayes for training and scoring, I represent it as something like an email message, with headers like date, subject, from, and an “X-Link” header for the link. However, even with this, I think Srijith’s got a point when he writes that this method will miss a lot of available clues for classification.

Unlike Srijith’s examples, though, I’m not trying to train my aggregator to sift entries into any specific categories. So far, I’ve been trying to get it to discriminate between what I really want to read, and what I’m not so interested in. So, I figured that something which can learn the difference between spam and normal email could help. But, although it’s early, I’m noticing a few things about the results and I’ve had a few things occur to me.

See, in the case of ham vs spam, I really want all the ham and none of the spam. A method to differentiate between these two should be optimized toward one answer or the other. SpamBayes offers “I don’t know” as a third answer, but it’s not geared toward anything else in between. However, in measuring something like “interest”, in-between answers are useful. I want all of the interesting stuff, some of the sort-of interesting stuff, and a little of the rest.

This is also a problem for me in deciding to what I should give a thumbs up and what gets the thumbs down. Even though I’ve subscribed to a little over 300 feeds, every item from each of them is somewhat interesting to me. I wouldn’t have subscribed to the feed if there wasn’t anything of interest there, so I’ve already biased the content of what I receive. Some items are more interesting than others, but the difference between them is nowhere near the difference of wanted ham vs unsolicited spam. So, I find myself giving the nod to lots of items, but only turning down a few. SpamBayes would like equal examples of both, if possible.

I’ll still be playing with this for a while, but I need to look around at other machine learning tech. I’m just hacking around, but the important thing is to try to understand the algorithms better and know how they work and why. Bayes is in vogue right now, but as Mark Pilgrim intimated, it’s not magic. It’s just “advanced” :)

In the immortal words of Mark Jason Dominus: “You can’t just make shit up and expect the computer to know what you mean, retardo!”

[0xDECAFBAD]
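The trick described in the quoted post, dressing a news item up as something like an email message so the classifier sees headers and body separately, could be sketched roughly as below. SpamBayes itself is Python; this is Java only to match the rest of this page. The class name, method signature and header order are assumptions, and only the header names (date, from, subject, X-Link) come from the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of formatting a news item as an email-like message.
class NewsItemMessage {

    static String toMessage(String date, String from, String subject,
                            String link, String body) {
        // LinkedHashMap keeps the headers in insertion order.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Date", date);
        headers.put("From", from);
        headers.put("Subject", subject);
        headers.put("X-Link", link);

        StringBuilder message = new StringBuilder();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            // Collapse line breaks so a value cannot spill into a new header.
            String value = header.getValue().replaceAll("[\\r\\n]+", " ");
            message.append(header.getKey()).append(": ").append(value).append("\r\n");
        }
        // A blank line separates the headers from the body, as in email.
        message.append("\r\n").append(body);
        return message.toString();
    }
}
```

The resulting string is what would be handed to the classifier for training or scoring; how that hand-off works depends on the classifier and is not shown here.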

Interesting.  Within k-collector we already have a method for selecting a level of interest more granular than the feed: the topic (and, soon, groups of related topics).

This allows you to say "I'm interested in Java" rather than "I want to read these 200 blogs where they talk about Java sometimes".  Then within this view you could start to say "well, so and so is more interesting than Matt on this topic".  But again, you are only dealing with the topic at hand.  You might still think I'm more interesting about something else, even if I struggle for an example!

We're already looking at interesting things we can do with this approach; maybe Bayesian filtering is something we should be thinking about.