New! SiteMap xml AutoDiscovery with Robots.txt

Google, Yahoo and MSN getting together and agreeing to use the Sitemap XML format to index your web site is apparently having some good fallout.

They've all agreed to look in your robots.txt file (which every crawler requests anyway) for a pointer to your sitemap URL, and they will happily chew upon it!

To do this, simply add the following line to your robots.txt file:

Sitemap: http://www.example.com/sitemap.xml

You need to provide the complete URL for your Sitemap on this line. They will pick it up wherever you put it in your robots.txt file, because this directive is not tied to any particular user-agent. If you have multiple Sitemaps, you can point to your Sitemap index file on this line.
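To check that your own robots.txt exposes the line the way you intended, here is a minimal Python sketch that pulls the Sitemap URLs out of a robots.txt body. It is only an illustration of the rule described above, not how any search engine actually parses the file; the function name find_sitemaps and the sample file contents are made up for this example.

```python
def find_sitemaps(robots_txt: str) -> list[str]:
    """Return every URL named by a Sitemap: line, in file order."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing "#" comments and surrounding whitespace.
        # (Simplification: a URL containing "#" would be cut short.)
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so "http://..." stays intact.
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively, and the line counts
        # no matter which User-agent group happens to surround it.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt contents, using the example URL from this post.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""

if __name__ == "__main__":
    print(find_sitemaps(EXAMPLE_ROBOTS_TXT))
```

If the list comes back empty, the usual culprits are a relative path instead of a complete URL, or a misspelled field name.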

Details about the Sitemaps protocol including this addition are available on the protocol website here: http://www.sitemaps.org.

The sitemap protocol that Google invented is a great SEO tool for your web site or blog. We have used it on our eggheadcafe.com site since the inception of sitemaps. On my IttyUrl.net site, I actually have a page that reads and caches my own autogenerated sitemap XML and turns it into a "page of links", just so crawlers that don't understand sitemaps have something to consume.
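The "page of links" idea is easy to sketch. The following Python is not the code running on IttyUrl.net (that site is ASP.NET); it is a rough illustration that assumes a standard sitemap using the sitemaps.org 0.9 namespace, and it leaves out the caching step. The function name sitemap_to_links and the sample URLs are invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_links(sitemap_xml: str) -> str:
    """Render each <loc> in a sitemap as an item in an HTML list."""
    root = ET.fromstring(sitemap_xml)
    items = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url:
            # Escape the URL so it is safe in both attribute and text.
            safe = escape(url, quote=True)
            items.append(f'<li><a href="{safe}">{safe}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical two-entry sitemap used only to exercise the function.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>
"""

if __name__ == "__main__":
    print(sitemap_to_links(EXAMPLE_SITEMAP))
```

In practice you would cache the rendered HTML and rebuild it only when the sitemap file changes, so the page costs next to nothing to serve.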

Now that MSN, Yahoo and Google are all digesting this stuff, you've got yourself a triple with the same amount of effort as just getting on to first base.

Comments

  1. Do you think that crawlers would be put off if the file suffix is .ashx as opposed to .xml?

    Thanks,
    Nathan

  2. I really am not sure. Here's the spec:

    http://www.sitemaps.org/protocol.php

  3. Thanks, no mention in the specification that it needs to conform to a specific file name.

    Did you create a job to generate the sitemap.xml file on ittyurl.net or is it generated dynamically?

    Thanks for your help and all your excellent articles; you've helped me many, many times over the years.

  4. It depends on the site. At IttyUrl.Net, all the content that would normally go into a sitemap is in my database. I have a method called from Application_Start in global.asax that checks the last time a sitemap was generated. This then gets the data I need through a complex SQL join statement that is passed to my custom SiteMapGenerator class which generates and deposits a new sitemap.xml file from it.
    Hope that helps.
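
    For readers who want to try the same pattern outside ASP.NET, here is a loose Python sketch of the flow described above: check how old the existing sitemap is and, if it is stale, rebuild it from rows of data. This is not the SiteMapGenerator class mentioned in the reply. Every name here (is_stale, write_sitemap, refresh_sitemap), the one-day refresh interval, and the fetch_rows callback standing in for the SQL query are assumptions made for illustration.

```python
import os
import time
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_AGE_SECONDS = 24 * 60 * 60  # assumed refresh interval: one day


def is_stale(path: str, max_age: float = MAX_AGE_SECONDS) -> bool:
    """True when the file is missing or older than max_age seconds."""
    if not os.path.exists(path):
        return True
    return time.time() - os.path.getmtime(path) > max_age


def write_sitemap(path: str, rows) -> None:
    """Write rows of (url, lastmod date string) as a sitemap file."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in rows:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)


def refresh_sitemap(path: str, fetch_rows) -> bool:
    """Regenerate the sitemap if stale; return whether it was rewritten."""
    if not is_stale(path):
        return False
    # fetch_rows stands in for the database query; it is called only
    # when a rebuild is actually needed.
    write_sitemap(path, fetch_rows())
    return True
```

    Calling refresh_sitemap once at application startup, with fetch_rows wrapping whatever query returns your URLs, mirrors the Application_Start check described in the reply.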
