<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:ent="http://www.purl.org/NET/ENT/1.0/" version="2.0">
  <channel>
    <title>Curiouser and Curiouser! on neural-networks</title>
    <link>http://matt.blogs.it/</link>
    <description>RSS feed for topic neural-networks</description>
    <copyright>Copyright 2006 Matt Mower</copyright>
    <generator>Squib/0.4.0.348</generator>
    <managingEditor>self@mattmower.com</managingEditor>
    <webMaster>self@mattmower.com</webMaster>
    <language>en-gb</language>
    <item>
      <title>Learning artificial learning</title>
      <link>http://matt.blogs.it/entries/00001752.html</link>
      <pubDate>Thu, 17 Mar 2005 23:12:05 +0000</pubDate>
      <description>&lt;p&gt;Any AI gurus out there?&lt;/p&gt;
&lt;p&gt;I'm considering the problem of using a neural network to evaluate chunks of text and determine whether they are interesting or not and it's taxing me a little.&lt;/p&gt;
&lt;p&gt;My theoretical approach is to take an item and its manually assigned score, break the item into a series of keywords, and then supply the keywords (possibly with their order) to the network, using backpropagation to train it against the manually assigned score.&lt;/p&gt;
&lt;p&gt;Leaving aside the training issues for a second, I have a much bigger problem: how to supply input to the network.&lt;/p&gt;
&lt;p&gt;In all of the books I am reading, the examples effectively use a fixed input size. For example, in image recognition you are sampling a fixed pixel space. In control applications there are a number of inputs sampling a pre-determined set of physical systems. However, it isn't clear to me how to map that to my situation.&lt;/p&gt;
&lt;p&gt;Forgetting any attempts to train the network on keyword order or keyword &lt;em&gt;strength&lt;/em&gt;, how can we even supply keyword information as inputs when the set of keywords is potentially unbounded?&lt;/p&gt;
&lt;p&gt;The closest example I have seen (so far) to my situation is &lt;a href="http://www2.psy.uq.edu.au/CogPsych/acnn96/case7.html"&gt;NETtalk&lt;/a&gt;, a system for pronouncing English words. NETtalk was trained using a sample dictionary of 5,000 words with the phonemes corresponding to each letter. The training process used a sliding window of 7 characters, so the network could look at the current character in the context of the 3 preceding and 3 following characters. The output from the network is the phoneme for the current character.&lt;/p&gt;
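&lt;p&gt;To make the sliding window concrete, here is a minimal Python sketch that builds the 7-character windows (3 preceding characters, the current one, 3 following) for each character of a word. Padding the ends with an underscore so edge characters still get a full window is an assumption made for this sketch, and the phoneme targets NETtalk was trained against are left out.&lt;/p&gt;

```python
def sliding_windows(word, context=3, pad="_"):
    """Return a (window, centre_character) pair for each character of word.

    With context=3 each window is 7 characters wide: 3 before, the centre, 3 after.
    The word is padded with `pad` so that edge characters still get a full-width
    window (an assumption made for this sketch).
    """
    padded = pad * context + word + pad * context
    width = 2 * context + 1
    return [(padded[i:i + width], word[i]) for i in range(len(word))]


for window, centre in sliding_windows("cat"):
    print(window, "->", centre)   # ___cat_ -> c, then __cat__ -> a, then _cat___ -> t
```

&lt;p&gt;Each window would be the network's input and the phoneme for the centre character its target, so one word yields one training example per letter.&lt;/p&gt;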
&lt;p&gt;Attempting to adapt such a scheme to the keyword analysis situation, we might decide on a maximum number of keywords (let's say 4,096), which can be numbered in 12 bits (2^12 = 4096). Each keyword would then be given a unique id (I do this already) in the range 0-4095. The network could then have a 12&lt;em&gt;n&lt;/em&gt;-bit-wide input layer corresponding to a moving window of &lt;em&gt;n&lt;/em&gt; keywords.&lt;/p&gt;
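&lt;p&gt;A minimal Python sketch of that encoding, assuming ids are plain integers in the range 0-4095 and that bits are laid out most significant first (an arbitrary choice). Each id becomes 12 inputs of 0.0 or 1.0, and a window of &lt;em&gt;n&lt;/em&gt; ids becomes 12&lt;em&gt;n&lt;/em&gt; inputs. The function names and example ids are hypothetical.&lt;/p&gt;

```python
BITS_PER_KEYWORD = 12  # 2**12 = 4096 distinct keyword ids: 0-4095


def encode_id(keyword_id, bits=BITS_PER_KEYWORD):
    """Encode one keyword id as `bits` input values (0.0 or 1.0), most significant bit first."""
    if keyword_id not in range(2 ** bits):
        raise ValueError(f"keyword id {keyword_id} is outside 0-{2 ** bits - 1}")
    return [float((keyword_id >> shift) % 2) for shift in reversed(range(bits))]


def encode_window(keyword_ids):
    """Concatenate the bit patterns of a window of n keyword ids into 12 * n inputs."""
    inputs = []
    for keyword_id in keyword_ids:
        inputs.extend(encode_id(keyword_id))
    return inputs


window = [5, 4095, 1024]        # hypothetical ids for a window of n = 3 keywords
inputs = encode_window(window)
print(len(inputs))              # 36, i.e. 12 * n
print(encode_id(5))             # nine 0.0 values followed by 1.0, 0.0, 1.0
```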
&lt;p&gt;However, an obvious problem rears its head at this point. NETtalk was outputting a phoneme for each character and using the surrounding characters for context. In my situation, each keyword contributes to the overall score of the article. To my way of thinking, the two situations do not match up.&lt;/p&gt;
&lt;p&gt;Ideally you would simply supply &lt;strong&gt;all&lt;/strong&gt; the keyword information for the article; however, the number of keywords is not bounded, which creates an input problem I don't know how to resolve.&lt;/p&gt;
&lt;p&gt;Basically, I need help.  Can anyone assist me?&lt;/p&gt;
</description>
      <guid isPermaLink="true">http://matt.blogs.it/entries/00001752.html</guid>
      <ent:cloud ent:href="http://matt.blogs.it/topics/">
      </ent:cloud>
    </item>
    <item>
      <title>I need a network to help me understand all these networks</title>
      <link>http://matt.blogs.it/entries/00001753.html</link>
      <pubDate>Sat, 19 Mar 2005 01:09:54 +0000</pubDate>
      <description>&lt;p&gt;I posted some &lt;a href="http://matt.blogs.it/2005/03/17.html#a1752"&gt;questions&lt;/a&gt; yesterday about training a neural network to classify interesting posts. &lt;a href="http://radio.weblogs.com/0100875/"&gt;Mikel&lt;/a&gt; kindly pointed me at &lt;a href="http://odur.let.rug.nl/~kleiweg/kohonen/kohonen.html#alg"&gt;Kohonen maps&lt;/a&gt;. This is a vector-based technique for clustering related items on a &lt;em&gt;surface&lt;/em&gt;. In this respect it sounds similar to &lt;a href="http://en.wikipedia.org/wiki/Multidimensional_scaling"&gt;Multi-dimensional scaling&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Latent_Semantic_Indexing"&gt;Latent Semantic Indexing&lt;/a&gt;.&lt;/p&gt;
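&lt;p&gt;To get a feel for how a Kohonen map works, here is a minimal Python sketch of the standard training loop. Every cell on a small grid holds a weight vector. For each input, the best-matching cell is found, and that cell and its grid neighbours are pulled towards the input. The learning rate and neighbourhood radius shrink over time. The grid size, the linear decay schedules, the Gaussian neighbourhood and the toy "document vectors" are assumptions for illustration, not details from the linked page.&lt;/p&gt;

```python
import math
import random


def best_matching_unit(grid, x):
    """Grid coordinates of the cell whose weight vector is closest to x."""
    return min(grid, key=lambda cell: sum((wi - xi) ** 2 for wi, xi in zip(grid[cell], x)))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a small self-organizing (Kohonen) map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid cell, initialised randomly in [0, 1).
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, with a floor
        for x in rng.sample(data, len(data)):    # visit inputs in a shuffled order
            bmu = best_matching_unit(grid, x)
            for cell, w in grid.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood weight
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return grid


# Two hypothetical clusters of 3-dimensional "document vectors".
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]
som = train_som(docs)
for doc in docs:
    print(doc, "->", best_matching_unit(som, doc))
```

&lt;p&gt;Similar vectors should land on the same or neighbouring cells, and dissimilar ones on cells further apart. That is the clustering-on-a-surface behaviour. Turning real posts into such vectors is still the open question.&lt;/p&gt;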
&lt;p&gt;I guess my biggest problem is trying to evaluate so many techniques, which are new to me, in so little time.  It's a challenge.&lt;/p&gt;
&lt;p&gt;I haven't given up on the idea of using a backpropagation network either. It occurred to me that one solution to the input problem would be to reduce each item to a set of &lt;em&gt;n&lt;/em&gt; ranked keywords, e.g. the 10 highest-rated keywords. This would provide a fixed-size vector for input to the network. I'm thinking this might be simple enough to implement and effective enough to satisfy the 80/20 rule.&lt;/p&gt;</description>
      <guid isPermaLink="true">http://matt.blogs.it/entries/00001753.html</guid>
      <ent:cloud ent:href="http://matt.blogs.it/topics/">
      </ent:cloud>
    </item>
    <item>
      <title>Maybe I didn't need a network to predict this!</title>
      <link>http://matt.blogs.it/entries/00001755.html</link>
      <pubDate>Thu, 24 Mar 2005 18:34:17 +0000</pubDate>
      <description>&lt;p&gt;Neural networks are very cool but not suitable for all applications. I've basically been stumped by the problems of trying to use a network to identify interesting weblog posts.&lt;/p&gt;
&lt;p&gt;The first problem is the input problem.  How do you represent arbitrary chunks of text to the network in a meaningful way?&lt;/p&gt;
&lt;p&gt;The problem here is that you have a layer of &lt;em&gt;input neurons&lt;/em&gt; which form the input to the network. The inputs are driven by the environment (e.g. the text) and must consist of real values which can be fed to the next (hidden) layer of the network. If you're measuring temperatures, voltages, water levels, and so on, then you are working with real values already. If you're working in image recognition, you tend to have a fixed array of pixels (e.g. 640x480). But what about text?&lt;/p&gt;
&lt;p&gt;I find myself presented with two sub-problems:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;The length of the text is not bounded. Measuring temperature you might have 2 or 3 sensors. A weblog post could as easily be 5 words or 50,000 words, depending upon the author's whim. In practice you could say &lt;em&gt;"no post will ever be more than 1MB of text"&lt;/em&gt; and treat 1MB as the limit, but that creates its own problems.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Text doesn't neatly correspond to a real-valued input pattern. How do you represent a specific piece of text as numeric values?&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;An approach I formulated was to chop the input into keywords (doing appropriate stop-word rejection and so forth) and then feed the &lt;em&gt;n&lt;/em&gt; most relevant keywords to the network as input for that text. The keywords could be uniquely numbered and each number represented as a binary value. Each bit of the binary value would correspond to an input cell presenting a value of 0.0 or 1.0, depending upon whether the bit is set or not. If we allowed a total of 4,096 possible keywords, each keyword number can be represented in 12 bits (2^12 = 4096). If we used the 10 most relevant keywords for each post, i.e. n=10, then the input layer would be composed of 12&lt;em&gt;n&lt;/em&gt;, or 120, cells.&lt;/p&gt;
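&lt;p&gt;A rough Python sketch of that pipeline, going from text to a fixed 120-cell input vector. Several details are assumptions rather than settled parts of the approach:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Relevance is approximated by raw word frequency.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The stop-word list is a tiny placeholder.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Ids are handed out in order of first appearance from a shared vocabulary.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Missing keywords are padded with all-zero bits.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;

```python
import re
from collections import Counter

N_KEYWORDS = 10  # n: keywords fed to the network per item
BITS = 12        # 2**12 = 4096 possible keyword ids
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "by", "that"}  # placeholder


def top_keywords(text, n=N_KEYWORDS):
    """Up to n non-stop-words, most frequent first (frequency stands in for relevance)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def keyword_id(word, vocabulary):
    """Hypothetical id scheme: ids are handed out in order of first appearance, up to 4096."""
    if word not in vocabulary:
        if len(vocabulary) == 2 ** BITS:
            raise ValueError("keyword vocabulary is full")
        vocabulary[word] = len(vocabulary)
    return vocabulary[word]


def text_to_inputs(text, vocabulary, n=N_KEYWORDS):
    """Map a text to exactly BITS * n input values of 0.0 or 1.0."""
    ids = [keyword_id(w, vocabulary) for w in top_keywords(text, n)]
    inputs = []
    for kid in ids:
        inputs.extend(float((kid >> shift) % 2) for shift in reversed(range(BITS)))
    # Pad missing keywords with zero bits so the input layer always has 12 * n cells.
    inputs.extend([0.0] * (BITS * (n - len(ids))))
    return inputs


vocab = {}
post = "Neural networks learn weights. Networks of neurons learn by example."
print(top_keywords(post))                # ['networks', 'learn', 'neural', 'weights', 'neurons', 'example']
print(len(text_to_inputs(post, vocab)))  # 120, even though the post has only 6 keywords
```

&lt;p&gt;The padding is ambiguous. An all-zero slot looks exactly like keyword id 0 (here "networks"), which is one face of the small-post problem.&lt;/p&gt;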
&lt;p&gt;However, even having reached this point, there are further problems to consider:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;In a large post there may well be more than 10 relevant keywords, which means losing relevant information.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;In a small post there may be fewer than 10 relevant keywords. What input is provided for the non-existent keywords?&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The relevance of each keyword to the item is not encoded. This problem might be solved by adding further input cells for each keyword to express its relevance.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The network acts as a &lt;em&gt;feature detector&lt;/em&gt;. If we consider a set of temperature sensors wired to the inputs, each sensor is wired to a specific set of inputs and that wiring doesn't change. However, a keyword that is detected in one position (i.e. represented in one set of input cells) for one item may be detected in another position for a different item, and won't be considered the same feature. This is likely to be problematic.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;
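&lt;p&gt;The last point is easy to demonstrate. In this small Python sketch the keyword ids are hypothetical, and bits are laid out most significant first in 12-bit slots. The same keyword switches on different input cells depending on where it is ranked in the item.&lt;/p&gt;

```python
BITS = 12  # bits per keyword slot, as in the 12n scheme


def encode(ids, bits=BITS):
    """Concatenate the bit patterns of a ranked list of keyword ids (most significant bit first)."""
    return [float((k >> s) % 2) for k in ids for s in reversed(range(bits))]


def active_cells(inputs):
    """Indices of the input cells that are switched on (1.0)."""
    return [i for i, v in enumerate(inputs) if v == 1.0]


PYTHON, RUBY = 7, 300  # hypothetical keyword ids

python_first = encode([PYTHON, RUBY])   # "python" is the top-ranked keyword
python_second = encode([RUBY, PYTHON])  # "python" is ranked second

print(active_cells(python_first))   # [9, 10, 11, 15, 18, 20, 21]
print(active_cells(python_second))  # [3, 6, 8, 9, 21, 22, 23]
```

&lt;p&gt;"python" lights cells 9, 10 and 11 in the first item but cells 21, 22 and 23 in the second. Whatever the network learned about the first set of cells does not carry over. Cell 9 is lit by "python" in one item and by "ruby" in the other, so a single cell doesn't even stand for a single keyword.&lt;/p&gt;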
&lt;p&gt;Basically, when it comes to free text, &lt;strong&gt;input is a mess&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;And then there are pragmatic problems like:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;"How big should the hidden layer be?"&lt;/p&gt;&lt;p&gt;Too small and the network won't learn; too big and the network will &lt;em&gt;mimic&lt;/em&gt; the training data rather than learning to generalize properly, and will also be slow.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;"Should we have one hidden layer or two?"&lt;/p&gt;&lt;p&gt;As a general rule of thumb, it appears that in 85% of cases 2 layers work best, with 3 layers performing better in the rest.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;"What training rate should be used?"&lt;/p&gt;&lt;p&gt;Set it too high and the network &lt;em&gt;bounces&lt;/em&gt; around, unable to settle on a solution; set it too low and it converges so slowly that it may never reach useful behaviour.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;
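&lt;p&gt;The training-rate trade-off shows up even with a single weight. This Python sketch is a stand-in, not a real network. It runs plain gradient descent on the toy error surface E(w) = w^2 (gradient 2w), starting from w = 1, with a few arbitrarily chosen rates.&lt;/p&gt;

```python
def descend(lr, w=1.0, steps=20):
    """Gradient descent on the toy error E(w) = w**2, whose gradient is 2*w.

    Returns the weight after each step so the trajectory can be inspected.
    """
    history = [w]
    for _ in range(steps):
        w -= lr * 2 * w
        history.append(w)
    return history


for lr in (1.1, 0.9, 0.4, 0.001):
    path = descend(lr)
    print(f"lr={lr}: first steps {[round(x, 3) for x in path[:4]]}, after 20 steps w={path[-1]:.4f}")
```

&lt;p&gt;With these numbers, a rate of 1.1 overshoots further on every step and diverges. A rate of 0.9 bounces from side to side while slowly closing in, 0.4 settles almost immediately, and 0.001 barely moves in 20 steps. On a real error surface the good range isn't known in advance.&lt;/p&gt;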
&lt;p&gt;Each of these problems is solvable, but usually only by trial and error: train the network, decide whether it is effective and, if not, junk it and try a different combination of parameters. This is fine if you have a fixed training set with which you can train the network repeatedly and then, having found the best parameter combination, just use it from then on.&lt;/p&gt;
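&lt;p&gt;That train, judge, junk, retry cycle amounts to a search over parameter combinations. Here is a minimal Python sketch under stated assumptions. A tiny one-hidden-layer sigmoid network trained by plain backpropagation on XOR stands in for the real interest classifier. The candidate hidden-layer sizes, learning rates and epoch count are arbitrary.&lt;/p&gt;

```python
import math
import random

XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))  # clamped to avoid overflow


def train_and_score(hidden, lr, data, epochs=1000, seed=1):
    """Train a 2-`hidden`-1 sigmoid network by backpropagation; return its mean squared error."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Each unit's weight list ends with its bias.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + ws[-1]) for ws in w_hidden]
        y = sigmoid(sum(w * hj for w, hj in zip(w_out, h)) + w_out[-1])
        return h, y

    for _ in range(epochs):
        for x, target in data:
            h, y = forward(x)
            delta_out = (y - target) * y * (1 - y)
            # Hidden deltas are computed with the output weights before they change.
            delta_hidden = [delta_out * w_out[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w_out[j] -= lr * delta_out * h[j]
            w_out[-1] -= lr * delta_out
            for j, ws in enumerate(w_hidden):
                for i in range(n_in):
                    ws[i] -= lr * delta_hidden[j] * x[i]
                ws[-1] -= lr * delta_hidden[j]
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Trial and error: train one network per parameter combination and keep the best.
results = {}
for hidden in (1, 2, 4):
    for lr in (0.05, 0.5, 5.0):
        results[(hidden, lr)] = train_and_score(hidden, lr, XOR)

for (hidden, lr), err in sorted(results.items()):
    print(f"hidden={hidden} lr={lr}: mse={err:.4f}")
best = min(results, key=results.get)
print("best (hidden, lr):", best)
```

&lt;p&gt;Every cell of the grid costs a full training run. That is affordable when a fixed training set is searched once, but not when each user's training set keeps changing.&lt;/p&gt;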
&lt;p&gt;However, in my application the training is done by the user interactively, and the training set will be different for each user and will change over time. Although the accumulated training data could be stored and the network retrained in the background, I think this could mean that the network would never be &lt;strong&gt;useful&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;All of which is leading me to think (as others have advised me) that a neural network may not be the best solution to this particular problem. I'm interested in whether anyone has any successful experience in this area. Otherwise I'm probably going to start looking more closely at Bayesian classifiers.&lt;/p&gt;</description>
      <guid isPermaLink="true">http://matt.blogs.it/entries/00001755.html</guid>
      <ent:cloud ent:href="http://matt.blogs.it/topics/">
      </ent:cloud>
    </item>
    <item>
      <title>A genetic theory of interest</title>
      <link>http://matt.blogs.it/entries/00002194.html</link>
      <pubDate>Sun, 23 Apr 2006 09:42:03 +0100</pubDate>
      <description>&lt;p&gt;I was thinking about the problem of training neural networks to recognize interesting &lt;em&gt;things&lt;/em&gt; (for example weblog posts). A neural network is a graph of weighted associations where, during training, the weights are adjusted according to the trainer's responses.&lt;/p&gt;
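
&lt;p&gt;As a small illustration of weights being adjusted according to a trainer's responses, here is a Python sketch of the simplest possible case. A single sigmoid neuron is updated after each labelled example (1.0 for interesting, 0.0 for not). The update shown is the logistic-regression gradient step, w += lr * (target - output) * input. The two features and the examples are made up.&lt;/p&gt;

```python
import math


def predict(weights, bias, features):
    """The neuron's output in (0, 1): its current estimate of how interesting an item is."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, features, response, lr=0.5):
    """Nudge each weight towards the trainer's response (logistic-regression gradient step)."""
    error = response - predict(weights, bias, features)
    new_weights = [w + lr * error * f for w, f in zip(weights, features)]
    return new_weights, bias + lr * error


# Hypothetical features: [mentions neural networks, mentions football].
examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)] * 50  # trainer's responses
weights, bias = [0.0, 0.0], 0.0
for features, response in examples:
    weights, bias = train_step(weights, bias, features, response)

print(round(predict(weights, bias, [1.0, 0.0]), 2))  # close to 1: interesting
print(round(predict(weights, bias, [0.0, 1.0]), 2))  # close to 0: not interesting
```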

&lt;p&gt;Recently I've been reading some interesting articles about &lt;a href="http://en.wikipedia.org/wiki/Genetic_programming"&gt;Genetic Programming&lt;/a&gt; and, particularly, the work of &lt;a href="http://www.genetic-programming.com/"&gt;John Koza&lt;/a&gt; (if I were going back to university to start again in computing, I would be looking to work in the field of GP). GP is based upon &lt;a href="http://en.wikipedia.org/wiki/Genetic_algorithm"&gt;Genetic Algorithms&lt;/a&gt;, which &lt;em&gt;evolve&lt;/em&gt; a fit solution to a problem.&lt;/p&gt;

&lt;p&gt;So it occurred to me that you could perhaps use a genetic algorithm to evolve the set of weights used in a neural net, and that the GA might do it faster and produce a better network than haphazard training by individuals. A quick Google search reveals that this is &lt;a href="http://www.generation5.org/content/2000/nn_ga.asp"&gt;not a new thought&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The key problems appear to be how to represent the weights, how to do the cross-over operation, and (this is the one that has me stumped) what the fitness function should be. A GA uses its fitness function to evaluate each member of the current generation and decide which ones survive into the next generation.&lt;/p&gt;
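
&lt;p&gt;One common way to answer the first two questions is to flatten every weight in the network into a single list of numbers, the genome, so that standard GA operators apply. Here is a sketch of that in Python. The one-point crossover and Gaussian mutation are assumptions: they are common choices, not necessarily the ones the linked article uses. The network shape and weights are hypothetical.&lt;/p&gt;

```python
import random


def flatten(layers):
    """Turn a list of weight matrices (each a list of rows) into one flat genome."""
    return [w for matrix in layers for row in matrix for w in row]


def unflatten(genome, shapes):
    """Rebuild weight matrices with the given (rows, cols) shapes from a flat genome."""
    layers, pos = [], 0
    for rows, cols in shapes:
        layers.append([genome[pos + r * cols:pos + (r + 1) * cols] for r in range(rows)])
        pos += rows * cols
    return layers


def crossover(mum, dad, point):
    """One-point crossover: the two children swap everything after `point`."""
    return mum[:point] + dad[point:], dad[:point] + mum[point:]


def mutate(genome, rate, scale, rng):
    """Add Gaussian noise (std dev `scale`) to each gene with probability `rate`."""
    return [w + rng.gauss(0.0, scale) if rate > rng.random() else w for w in genome]


shapes = [(2, 3), (3, 1)]  # hypothetical 2-3-1 network, biases left out for brevity
parent_a = flatten([[[1, 2, 3], [4, 5, 6]], [[7], [8], [9]]])
parent_b = [0] * 9
child_a, child_b = crossover(parent_a, parent_b, 4)
print(child_a)                     # [1, 2, 3, 4, 0, 0, 0, 0, 0]
print(unflatten(child_a, shapes))  # [[[1, 2, 3], [4, 0, 0]], [[0], [0], [0]]]
print(mutate(child_a, 0.3, 0.1, random.Random(1)))
```

&lt;p&gt;Because the genome is just a list, crossover and mutation need to know nothing about the network. The unflatten helper turns a child back into weight matrices for evaluation. None of this touches the fitness question.&lt;/p&gt;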

&lt;p&gt;I have no answer to how to come up with an objective function for determining &lt;em&gt;interest&lt;/em&gt;. Clearly if we had such a function we wouldn't need the neural network in the first place. We could just feed every post straight to the interest function and see what it says.&lt;/p&gt;

&lt;p&gt;Which leads back to the &lt;em&gt;by example&lt;/em&gt; method: using a corpus of interesting and uninteresting posts which can be fed in to see what the neural network comes up with. In this case the GA replaces back propagation as the training method, with fitness measured by how well the network's output matches the corpus. The article I referenced earlier suggests that this might produce a better network (in terms of output results) at the cost of being orders of magnitude slower than back propagation.&lt;/p&gt;
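
&lt;p&gt;Here is a minimal Python sketch of that by-example setup, with the GA doing the training. The genome is the flat list of weights of a small 3-2-1 network. Fitness is the negative mean squared error between the network's output and the trainer's labels over a corpus. The corpus and its three features are hypothetical. Truncation selection, one-point crossover, Gaussian mutation and all the numeric settings are assumptions rather than details from the referenced article.&lt;/p&gt;

```python
import math
import random

# Hypothetical corpus: (feature vector, trainer's label: 1.0 = interesting, 0.0 = not).
CORPUS = [([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 0.0], 1.0),
          ([0.0, 1.0, 0.0], 0.0), ([0.0, 0.0, 1.0], 0.0)]
N_IN, N_HIDDEN = 3, 2
GENOME_LEN = N_HIDDEN * (N_IN + 1) + (N_HIDDEN + 1)  # all weights plus biases: 11 genes


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))  # clamped to avoid overflow


def output(genome, features):
    """Run the 3-2-1 network whose weights are stored in a flat genome."""
    hidden = []
    for j in range(N_HIDDEN):
        ws = genome[j * (N_IN + 1):(j + 1) * (N_IN + 1)]  # N_IN weights, then a bias
        hidden.append(sigmoid(sum(w * f for w, f in zip(ws, features)) + ws[-1]))
    out = genome[N_HIDDEN * (N_IN + 1):]                  # N_HIDDEN weights, then a bias
    return sigmoid(sum(w * h for w, h in zip(out, hidden)) + out[-1])


def fitness(genome):
    """Higher is better: negative mean squared error against the trainer's labels."""
    return -sum((output(genome, x) - t) ** 2 for x, t in CORPUS) / len(CORPUS)


def evolve(generations=300, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]  # truncation selection: the fittest half survive unchanged
        children = []
        for _ in range(pop_size - len(survivors)):
            mum, dad = rng.sample(survivors, 2)
            point = rng.randrange(1, GENOME_LEN)
            child = mum[:point] + dad[point:]  # one-point crossover
            # Gaussian mutation: each gene has a 20% chance of being nudged.
            child = [w + rng.gauss(0.0, 0.5) if 0.2 > rng.random() else w for w in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
print(f"best fitness: {fitness(best):.4f}")
for features, label in CORPUS:
    print(features, label, round(output(best, features), 2))
```

&lt;p&gt;Every fitness evaluation runs the network over the whole corpus, for every member of every generation. That gives some feel for why a GA can be much slower than back propagation.&lt;/p&gt;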

&lt;p&gt;Anyway I'm going to look into this area more closely because I find the whole thing fascinating.&lt;/p&gt;</description>
      <guid isPermaLink="true">http://matt.blogs.it/entries/00002194.html</guid>
      <ent:cloud ent:href="http://matt.blogs.it/topics/">
      </ent:cloud>
    </item>
  </channel>
</rss>
