2004-10-23

Bayesian weblog detector?

Sat Oct 23 12:20:38 BST 2004

Guessing if a link leads to a weblog or not?

Technical weblog research question:

I have a list of links and I'd like to find out which of them lead to weblogs. Is there a way of doing this automatically?

Things that I thought about:

  • guessing from the URL - would work for weblogs hosted on the most popular platforms
  • check if there is an RSS/Atom feed - would exclude weblogs without feeds and include general sites with RSS feeds
  • match the URL against the database of a weblog indexing site - would include only a subset of weblogs, and you have to get the database first
  • ...

Do you have any suggestions?

This post also appears on channel weblog research

[Mathemagenic]

I thought of various approaches to this one involving looking for tags pointing to RSS feeds (Nope: the BBC correctly do that on their news pages), looking for author-information in the RSS, and so on.  None of them would be foolproof and all would be a pain to implement with lots of edge cases.

I think that if it were my problem then I would make like Jon Udell and rig up a Bayesian categorizer: train it on some weblogs and some likely-looking non-weblogs, then feed it the full data set and see what happens.
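For the curious, here is roughly what I mean by a Bayesian categorizer. This is only a minimal sketch of a naive Bayes text classifier with add-one smoothing, not Jon Udell's code. The class name `NaiveBayes`, the `:weblog`/`:other` labels and the training snippets are all made up for illustration; a real run would train on the text of actual pages.

```ruby
# Minimal multinomial naive Bayes text classifier (illustrative sketch).
class NaiveBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # label => word => count
    @doc_counts  = Hash.new(0)                            # label => documents trained
    @vocabulary  = {}                                     # every word seen in training
  end

  # Record one training document under the given label.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the label with the highest log-probability for the text.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum
    @doc_counts.keys.max_by do |label|
      total_words = @word_counts[label].values.sum
      # Start from the prior, then add one smoothed likelihood per word.
      score = Math.log(@doc_counts[label].to_f / total_docs)
      words.each do |word|
        count = @word_counts[label].fetch(word, 0)
        score += Math.log((count + 1.0) / (total_words + @vocabulary.size))
      end
      score
    end
  end

  private

  # Crude tokenizer: lowercase runs of letters and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

# Hypothetical training data standing in for real page text.
classifier = NaiveBayes.new
classifier.train(:weblog, "posted by permalink comments trackback archives blogroll")
classifier.train(:weblog, "permalink posted comments archive rss feed")
classifier.train(:other,  "products pricing contact us shopping cart checkout")
classifier.train(:other,  "breaking news sport weather business rss feed")

puts classifier.classify("posted at noon permalink comments")
```

Note that "rss" and "feed" appear under both labels in the toy data, which is the point: the presence of a feed alone doesn't settle it, but the categorizer can weigh it alongside everything else on the page.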

Pure Geek Pleasure

Sat Oct 23 11:55:13 BST 2004

Non-programmers and programmers with experience of dynamic languages will probably both go "huh?" (for different reasons) and should probably skip this one, but I wanted to record something I just did in Ruby which I found very cool.

Something I do from time to time in Java is use delegation. This can be a real PITA since it usually involves creating an interface for the methods you want to delegate, stub methods to forward calls, and code to manage the delegate instance. And this all has to be maintained. The solution I cooked up in Ruby blows the part of my mind that programs in Java.

When you start using Ruby you learn that all instance variables are private. For other objects to access them you create accessor and mutator methods. This is a pretty common idiom; C# offers properties which do the same job. In Ruby you do:
attr_reader :first_attribute, :second_attribute, :and_so_on
And attr_reader magically creates new accessor methods called first_attribute(), second_attribute(), and and_so_on().

However, the real magic came when I realised that attr_reader is not a special method-generating keyword but just a plain ol' method call: Ruby methods can extend existing classes with new methods! I immediately started to wonder: could I do the same for delegating method calls?
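As a rough illustration of that point, here is a hand-rolled equivalent written in plain Ruby. The names `my_attr_reader` and `Point` are made up for this example, and this is not a claim about how Ruby implements `attr_reader` internally; it just shows that an ordinary method can define new methods on the class that calls it.

```ruby
class Module
  # Hypothetical stand-in for attr_reader: defines one reader method
  # per name, each returning the instance variable of the same name.
  def my_attr_reader(*names)
    names.each do |name|
      define_method(name) { instance_variable_get("@#{name}") }
    end
  end
end

class Point
  # An ordinary method call, executed while the class body is evaluated.
  my_attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

point = Point.new(3, 4)
puts point.x  # prints 3
puts point.y  # prints 4
```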

What followed was a lot of help from the great crowd in #ruby-lang and quite a bit of learning about Ruby but I am now able to delegate method calls using exactly the same syntax and with as little effort as creating accessors:
delegate_methods(:first_method, :second_method, :and_so_on) { Delegate.new }
This is all that is required to delegate those method calls and manage the delegate object -- this feels totally cool to me! Like Lisp, Ruby is a language that offers the flexibility to change the language to suit your style and the type of problems you are solving.
"When I am working on a problem I never think about beauty. I only
think about how to solve the problem. But when I have finished, if the
solution is not beautiful, I know it is wrong."
- R. Buckminster Fuller
I think this is why I am liking Ruby so much. Many of the solutions come out beautiful first time!

BTW my Ruby delegate code is here. Please feel free to critique my style. (P.S. I was aware that the Ruby library already has delegation, but I got interested in learning how to solve the problem myself.)
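For readers who can't follow the link, here is a minimal sketch of how a `delegate_methods` along these lines could be written. It is a reconstruction under my own assumptions, not the linked code: I assume the block builds the delegate lazily, once per instance, and the `Engine` and `Car` classes are invented for the demonstration.

```ruby
class Module
  # Hypothetical sketch: define a forwarding method for each name.
  # The block is a factory that builds the delegate on first use;
  # each instance keeps its own delegate per factory.
  def delegate_methods(*names, &factory)
    raise ArgumentError, "a block that builds the delegate is required" unless factory

    names.each do |name|
      define_method(name) do |*args, &block|
        @__delegates ||= {}
        delegate = (@__delegates[factory] ||= factory.call)
        delegate.public_send(name, *args, &block)
      end
    end
  end
end

# Made-up delegate class for the demonstration.
class Engine
  def start
    "engine started"
  end

  def rev(rpm)
    "revving at #{rpm} rpm"
  end
end

class Car
  delegate_methods(:start, :rev) { Engine.new }
end

car = Car.new
puts car.start      # prints: engine started
puts car.rev(3000)  # prints: revving at 3000 rpm
```

The parentheses around the method names matter: without them Ruby tries to attach the brace block to the last symbol rather than to `delegate_methods`. The standard library's Forwardable module covers similar ground if you would rather not roll your own.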

abXULutely fabulous

Sat Oct 23 11:13:58 BST 2004

[23/10/2004@11:05] okay, this is damn cool: http://www.faser.net/mab/chrome/content/mab.xul
[bitserf in #ruby-lang]
I'm amazed at how good a XUL interface can be. This is damn cool. I'm hoping there is a Ruby interface to XUL, since it looks like it could be a credible alternative to native toolkits like wxRuby, Ruby/Tk, and FXRuby. That said, using Ruby to drive Flash GUIs also looks pretty cool.

Slick Wiki

Sat Oct 23 01:54:51 BST 2004

What could be a better thing to be doing at twenty to two in the morning than messing with Wiki software?

I've just managed to get MediaWiki (the software that runs Wikipedia) running on my Win2K box under IIS. According to the documentation this is about as unrecommended a configuration as you can get away with and have any chance of it working.

In fact the installation was pretty straightforward since I had already got PHP working with IIS for another project. I did need to monkey with a new root account in MySQL so that the installer could create its database, and I had to mess with Setup.php because for some reason the REQUEST_URI parameter (which is being dumped for debug purposes) doesn't seem to be defined. All in all, though, one of the less painful LAMP=>WAMP jobs.

Seems to be working well now, even if it does creak along on my old Pentium II with 384MB of memory. My next task is to figure out how to turn off page editing for users who are not logged in.

Update: Easy, just turn on whitelistedit mode.

Quickipedia

Sat Oct 23 00:56:21 BST 2004

I find myself looking stuff up in Wikipedia more and more often, but the front page often loads really slowly. So I created myself a bookmarklet which prompts for a search term and then opens the Wikipedia search in a new browser window.

Caveat emptor: The bookmarklet is tested only in Mozilla 1.7. Although I try to set the focus on the new window, Mozilla seems to ignore that call, so the new window is created in the background. Also the designers of Mozilla, in their infinite wisdom, decided that JavaScript cannot create tabs, so we have to open a new window for the search.

Once again I used the excellent Bookmarklet Builder.