RSS - Aggregate the Aggregators?

After quite a bit of peripheral research on RSS (not just what it is, because we've been using it for a long time - more about where the "whole thing is headed"), I decided to try my hand at "aggregating the aggregators". Enough search engines of various types now offer to return their results as RSS that it made sense to build a "Multisearch RSS aggregation engine" -- and so I did.

Basically, what this little puppy does is take your list of search URLs. Each URL usually has two parts, with your query term(s) inserted in between: the main URI part, and an optional "part 2" that determines RSS output, number of results, etc.
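To make the URL layout concrete, here is a minimal sketch in Python (not the language the engine itself is written in). The function name `build_search_url`, the `ENGINES` list, and the example.com / example.org hosts are all made up for illustration; they are not the real engine list or the real code.

```python
from urllib.parse import quote_plus

# Hypothetical engine entries: (main URI part, optional "part 2").
# The hosts are placeholders, not real search engines.
ENGINES = [
    ("https://search.example.com/find?q=", "&format=rss&num=20"),
    ("https://feeds.example.org/search?terms=", ""),
]


def build_search_url(base, query, suffix=""):
    """Join the main URI part, the URL-encoded query, and the optional part 2."""
    return base + quote_plus(query) + suffix


def build_all(query):
    """Build one search URL per configured engine for the same query."""
    return [build_search_url(base, query, suffix) for base, suffix in ENGINES]
```

Calling `build_all("rss aggregator")` would give one ready-to-fetch URL per engine, with the query encoded the same way in each.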

The engine takes your query and sends it out to each search engine asynchronously on a thread pool (I like Ami Bar's "SmartThreadPool" - a really magnificent job). The results are stored in a hashtable keyed on the link, which makes it easy to weed out duplicate links. They are then ordered by pubDate, most recent first, and returned to the caller as a DataSet, so the calling app or page can decide how it wants to present the results. The search page that uses my "engine" performs an XSL Transform, saves the resulting HTML on the filesystem, and keeps a list of "Recent search" links right on the page that people can choose from. These, of course, come up immediately, since there is no need to go out to the search engines to get the results. The page even inserts up to three "ads" in the result page, and it automatically deletes cached searches that are older than X number of days. The engine portion uses an "EngineState" class that holds all the parameters and uses the XmlSerializer to serialize itself to and from an XML configuration file.
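The fan-out, de-duplication, and ordering steps can be sketched roughly as follows. This is Python using only the standard library, not the actual engine code: `ThreadPoolExecutor` stands in for SmartThreadPool, a plain dict stands in for the hashtable, a list of dicts stands in for the DataSet, and every function name here is invented for illustration. The `fetch` argument is assumed to be any callable that takes a URL and returns the RSS text.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Items with a missing or unparsable pubDate sort last.
OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def parse_pub_date(text):
    """Parse an RFC 822 style pubDate; fall back to OLDEST on failure."""
    try:
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return OLDEST
    if parsed.tzinfo is None:
        # Assumption: treat dates without a usable offset as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def parse_items(rss_text):
    """Extract link, title, and publication date from one RSS 2.0 document."""
    items = []
    for node in ET.fromstring(rss_text).iter("item"):
        link = (node.findtext("link") or "").strip()
        if not link:
            continue  # without a link there is no key to de-duplicate on
        items.append({
            "link": link,
            "title": (node.findtext("title") or "").strip(),
            "published": parse_pub_date(node.findtext("pubDate")),
        })
    return items


def multisearch(urls, fetch, max_workers=8):
    """Fetch all feeds concurrently, drop duplicate links, sort newest first."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        feeds = list(pool.map(fetch, urls))  # results come back in input order

    by_link = {}
    for feed in feeds:
        for item in parse_items(feed):
            by_link.setdefault(item["link"], item)  # first occurrence wins

    return sorted(by_link.values(), key=lambda item: item["published"], reverse=True)
```

In this sketch a single failing engine would raise out of `pool.map` and abort the whole search, so a real version would probably want to catch errors per feed and carry on with whatever came back.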

After a trial period and some more tuning and cleanup, I'll probably write an article and make the source code available. Meanwhile, if you would like to preview the RSS Multisearch Engine, here's the location.

RSS has exploded over the last year or so. It's actually kinda scary when you think about it...

Comments are always welcome.
