11/21/2006

Yahoo, Google and Microsoft Team Up on Sitemaps

Yahoo, Google, and Microsoft have all announced that they’ve agreed to set a standard for sitemaps.

DiggSpeak Translation: "Amazing! Top Ten Reasons to use Sitemaps"

Sitemaps are those XML files that list all the pages on your Web site. Search engines like to have all the listings in one place so that a site can be indexed without anything being missed.

The protocol has now been released under a Creative Commons license, so any search engine can adopt it if it likes.

Most webmasters, developers, and web site owners use sitemaps, and there is plenty of sample code around for generating them dynamically.

We use sitemaps on our Eggheadcafe.com site, and I believe they result in much better indexing. Plus, you can tell the bots how often each page is likely to change and what its priority is relative to the other pages on your site. For more complex sites, you can put a sitemap index file in your website root, with entries that point to any number of individual sitemap files. For example, you might have a message board or forum section and generate a separate sitemap for it from your database daily. You might then have another sitemap for your "regular" content, such as articles, that gets updated weekly. Your index file would point to both of these, and the bots will happily crawl them.
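To make the index idea concrete, here is a rough Python sketch of a sitemap with change-frequency and priority hints, plus an index file that points at two sitemaps. The example.com URLs and the function names are made up for illustration; only the element names and the namespace come from the sitemap protocol. Treat it as a starting point, not as how our site does it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a sitemap from (loc, changefreq, priority) tuples.

    changefreq is a hint such as "daily" or "weekly";
    priority is a number from 0.0 to 1.0, relative to your own pages.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Build an index file that points at individual sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical forum sitemap, regenerated daily.
    print(build_urlset([
        ("http://www.example.com/forum/thread1.aspx", "daily", 0.8),
    ]))
    # Hypothetical index in the site root pointing at both sitemaps.
    print(build_sitemap_index([
        "http://www.example.com/sitemap-forum.xml",
        "http://www.example.com/sitemap-articles.xml",
    ]))
```

Keeping the forum and article sitemaps in separate files means each one can be regenerated on its own schedule without touching the other.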

Sitemap files can also be GZipped, which cuts down on bandwidth and lets the bots load them faster. With .NET, just use the System.IO.Compression namespace.
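The same idea can be sketched in Python with the standard gzip module, used here purely for illustration in place of the .NET namespace mentioned above. The function name and the sample XML are invented for the example.

```python
import gzip


def gzip_sitemap(xml_text: str) -> bytes:
    """Return the sitemap XML compressed in gzip format (UTF-8 encoded)."""
    return gzip.compress(xml_text.encode("utf-8"))


if __name__ == "__main__":
    # A deliberately repetitive, made-up sitemap so the savings are visible.
    urls = "".join(
        f"<url><loc>http://www.example.com/page{i}.aspx</loc></url>"
        for i in range(500)
    )
    xml_text = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        + urls
        + "</urlset>"
    )
    compressed = gzip_sitemap(xml_text)
    print(len(xml_text.encode("utf-8")), "bytes before,", len(compressed), "bytes after")
```

The compressed bytes would typically be saved under a name like sitemap.xml.gz and listed in the index file under that name.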

The nice thing about sitemaps is this: bots only know how to do one thing, and that is follow links. They can't follow JavaScript, they can't follow images, and they can't follow drop-down lists of links either. It has to be an anchor "A" tag with an "href" attribute to make the spiders happy. Yep, they are just "reallyreallydumb".
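Here is a toy Python model of that behavior, assuming a spider that does nothing except collect "href" values from anchor tags. It is a simplification for illustration, not the code of any real search engine bot, and the sample page and class name are made up.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects href values from <a> tags and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html: str) -> list:
    """Return the links a naive anchor-only spider would find in the page."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# A made-up page with four ways to reach other pages; only one is an anchor.
SAMPLE_PAGE = """
<a href="/articles/1.aspx">An article</a>
<img src="/banner.gif" onclick="location.href='/promo.aspx'">
<select onchange="location=this.value">
  <option value="/forum/">Forum</option>
</select>
<script>window.location = '/hidden.aspx';</script>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

Only the plain anchor link is picked up; the image click handler, the drop-down list, and the script redirect are all invisible to this kind of spider.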

If you have content on your site that's not linked to from another page, or that only comes up as the result of a database search, it's not likely to get indexed at all. But if you put the URL into a sitemap file, the bots will find it, crawl it, and index it. Think about it: you may have content that is dynamically generated out of a database and doesn't sit on the filesystem at all. With the correct sitemap element, that content, which ordinarily would be invisible to the spiders, will be successfully crawled and indexed by the major search engines. That means more hits, and more revenue if you serve advertising.
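As a rough sketch of the database case, the Python below reads IDs from a table and emits one sitemap entry per row. The "articles" table, the "article.aspx?id=" URL pattern, and the example.com host are all hypothetical, and SQLite stands in for whatever database you actually use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_from_database(conn, base_url):
    """Emit one <url> entry for every row in the (hypothetical) articles table."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for (article_id,) in conn.execute("SELECT id FROM articles ORDER BY id"):
        url = ET.SubElement(urlset, "url")
        # Hypothetical URL pattern for a database-driven page.
        ET.SubElement(url, "loc").text = f"{base_url}/article.aspx?id={article_id}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # In-memory stand-in for a real content database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO articles (id, title) VALUES (?, ?)",
        [(101, "First article"), (102, "Second article")],
    )
    print(sitemap_from_database(conn, "http://www.example.com"))
```

None of these pages needs to exist as a file or be linked from anywhere; the sitemap entry alone gives the bots a URL to request.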

Sitemaps are your friend. Now, with Yahoo and Microsoft on the bandwagon, they'll be more important than ever.

Yahoo is expected to begin using your sitemap(s) on Thursday, with Microsoft picking them up early next year.