Bayesian Filtering in Aggregators?

Jim O'Halloran • May 26, 2003


I blogged this mainly because I wanted a reference to the original "A Plan for Spam" paper on Bayesian Filtering, and the follow-up "Better Bayesian Filtering", both by Paul Graham.

I've been toying with the idea of writing my own news aggregator for a while because I've yet to find something that's exactly what I want. But then my mind runs away with an idea. With most current aggregators, you read the entire feed or none of it... If an email client (once trained) can use Bayesian classifiers to distinguish between spam and real email, why couldn't an aggregator use a similar idea to classify posts in an RSS feed?

Starting with my current subscriptions list, allow me to show the aggregator which articles/posts in the feeds I do like, and those that I don't. Using that information, begin to filter the feeds to hide some articles. For example, I like to read whatever Jeremy writes about MySQL, flying, or search engines, but I'm less interested in his test posts.
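To make the train-then-filter idea a bit more concrete, here's a rough sketch of how it might look in Python. It's a simplified naive Bayes classifier with add-one smoothing, not the exact scoring scheme from Paul Graham's papers, and every name in it (`PostFilter`, `train`, `probability_liked`, `visible`) is made up for illustration. It also treats a post as plain text; a real aggregator would need to pull that text out of the feed first.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split text into a set of lowercase word tokens (presence only)."""
    return set(TOKEN_RE.findall(text.lower()))


class PostFilter:
    """Hypothetical like/dislike classifier for feed posts (illustrative only)."""

    def __init__(self):
        # Number of posts per label in which each token appeared.
        self.token_counts = {"like": Counter(), "dislike": Counter()}
        # Number of posts seen per label.
        self.post_counts = {"like": 0, "dislike": 0}

    def train(self, text, label):
        """Record one post the reader marked as 'like' or 'dislike'."""
        self.token_counts[label].update(tokenize(text))
        self.post_counts[label] += 1

    def probability_liked(self, text):
        """Estimate the chance the reader wants to see this post."""
        likes = self.post_counts["like"]
        dislikes = self.post_counts["dislike"]
        if likes + dislikes == 0:
            return 0.5  # untrained: no opinion either way

        # Start from the smoothed prior odds, then add evidence per token.
        log_odds = math.log((likes + 1) / (dislikes + 1))
        for token in tokenize(text):
            p_like = (self.token_counts["like"][token] + 1) / (likes + 2)
            p_dislike = (self.token_counts["dislike"][token] + 1) / (dislikes + 2)
            log_odds += math.log(p_like / p_dislike)

        # Clamp so math.exp cannot overflow on very long posts.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

    def visible(self, posts, threshold=0.5):
        """Return only the posts scoring at or above the threshold."""
        return [p for p in posts if self.probability_liked(p) >= threshold]
```

Usage would be something like calling `train(post_text, "like")` or `train(post_text, "dislike")` each time I rate a post, then passing each freshly fetched feed through `visible()`. The 0.5 threshold is an arbitrary starting point; an aggregator that hides things should probably err on the side of showing too much.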

Perhaps, once sufficiently trained, the aggregator could start to use Feedster to locate other stuff I might be interested in, and add that into the subscription/filtering process.
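The discovery step could be sketched along these lines. This assumes the search results have already been fetched somehow and reduced to `(feed_url, post_text)` pairs; how to actually query Feedster isn't shown because I'm not assuming anything about its interface. The `suggest_feeds` function, its parameters, and the default cut-offs are all hypothetical, and `score` stands in for whatever trained classifier the aggregator uses.

```python
from collections import defaultdict


def suggest_feeds(candidates, score, subscribed, min_posts=2, threshold=0.7):
    """Suggest unsubscribed feeds whose posts score well on average.

    candidates: iterable of (feed_url, post_text) pairs from a search.
    score:      callable mapping post text to a 0..1 "would like" estimate.
    subscribed: set of feed URLs the reader already follows.
    """
    scores_by_feed = defaultdict(list)
    for feed_url, text in candidates:
        if feed_url not in subscribed:
            scores_by_feed[feed_url].append(score(text))

    # Ignore feeds with too few posts to judge; one good post may be a fluke.
    averages = {
        feed: sum(scores) / len(scores)
        for feed, scores in scores_by_feed.items()
        if len(scores) >= min_posts
    }

    # Best-scoring feeds first.
    good = [feed for feed, avg in averages.items() if avg >= threshold]
    return sorted(good, key=lambda feed: averages[feed], reverse=True)
```

Averaging per feed rather than per post is a design choice: the point is to find sources worth subscribing to, and the per-post filtering can take over once a feed has been added.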

I guess to sum it up in one sentence, the perfect aggregator should customise itself to show me what I want to read, then go find new stuff for me to read once it knows what I want to read.

Dunno if I'll ever get time to write something like that, but the idea's free for anyone to pick up and use. If anyone does this let me know and I'll point to it. If you happen to use it in a commercial product though I'd appreciate a free licence so I can play with the end result :)