Yahoo, Google and Microsoft Team Up on Sitemaps

Yahoo, Google, and Microsoft have all announced that they’ve agreed to set a standard for sitemaps.

DiggSpeak Translation: "Amazing! Top Ten Reasons to use Sitemaps"

Sitemaps are those XML files that list all the pages on your Web site. Search engines like to have all the listings in one place so that a site can be indexed without anything being missed.
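
For reference, a minimal sitemap under the sitemaps.org 0.9 schema looks something like this (the URL and values below are placeholders, not real entries):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/articles/some-article.aspx</loc>
    <lastmod>2006-11-16</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```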

The protocol has now been released under Creative Commons, so any search engine can pick up on it if they like.

Many webmasters, developers, and site owners already use sitemaps, and there is plenty of sample code around to generate them dynamically.

We use sitemaps on our Eggheadcafe.com site, and I believe they result in much better indexing. Plus, you can specify how often the bots should crawl each item, and what its priority is. For more complex sites, you can have a SiteMapIndex file in your website root, with entries that point to any number of other individual sitemap files. So for example, you might have a messageboard or forum section and generate a separate sitemap for it out of your database daily. Then, you might have another sitemap for your "regular" content, such as articles, that gets updated weekly. Your index file would point to both of these, and the bots will happily crawl them.
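
To give a concrete picture, here is roughly what such an index file might look like (the file names and dates are made-up examples, not our actual files):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Hypothetical daily sitemap generated from the forum database -->
  <sitemap>
    <loc>http://www.example.com/sitemap_forums.xml.gz</loc>
    <lastmod>2006-11-16</lastmod>
  </sitemap>
  <!-- Hypothetical weekly sitemap for regular article content -->
  <sitemap>
    <loc>http://www.example.com/sitemap_articles.xml.gz</loc>
    <lastmod>2006-11-12</lastmod>
  </sitemap>
</sitemapindex>
```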

Sitemap files can also be GZipped, which cuts down on bandwidth and lets the bots load them faster. With .NET, just use the System.IO.Compression namespace.
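
As a rough sketch of that (the file paths are hypothetical placeholders), compressing an existing sitemap file could look like this:

```csharp
// Rough sketch: GZip an existing sitemap file with System.IO.Compression.
// Paths passed in are hypothetical placeholders.
using System.IO;
using System.IO.Compression;

public static class SitemapCompressor
{
    public static void GZipSitemap(string inputPath, string outputPath)
    {
        using (FileStream source = File.OpenRead(inputPath))
        using (FileStream destination = File.Create(outputPath))
        using (GZipStream gzip = new GZipStream(destination, CompressionMode.Compress))
        {
            // Copy the uncompressed XML into the GZip stream in chunks.
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                gzip.Write(buffer, 0, bytesRead);
            }
        }
    }
}
```

A call along the lines of SitemapCompressor.GZipSitemap("sitemap_articles.xml", "sitemap_articles.xml.gz") would then produce the compressed file the bots can fetch.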

The nice thing about sitemaps is this - bots only know how to do one thing: follow links. They can't execute JavaScript, they can't follow links buried in images, and they can't follow dropdown lists of links either. It has to be an anchor "A" tag with an "href" attribute to make the spiders happy. Yep, they are just "reallyreallydumb".

If you have content on your site that isn't linked to from another page, or that only comes up from a database search, it's not likely to get indexed at all. But if you put the URL into a sitemap file, the bots will find it, crawl it, and index it. Think about it - you may have content that is dynamically generated out of a database and doesn't sit on the filesystem at all. With the correct sitemap element, that content, which would ordinarily be invisible to the spiders, will be successfully crawled and indexed by the major search engines. That means more hits, and more revenue if you serve advertising.
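
Here's a bare-bones illustration of that idea - not our production code, just a sketch of the approach, where the ForumThreads table, column names, connection string, and URL pattern are all made up:

```csharp
// Illustrative sketch only: emit <url> entries for database-driven forum pages.
// Table name, columns, connection string, and URL pattern are hypothetical.
using System.Data.SqlClient;
using System.Text;
using System.Xml;

public static class ForumSitemapWriter
{
    private const string Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static void Write(string outputPath, string connectionString)
    {
        using (XmlTextWriter writer = new XmlTextWriter(outputPath, Encoding.UTF8))
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            writer.Formatting = Formatting.Indented;
            writer.WriteStartDocument();
            writer.WriteStartElement("urlset", Ns);

            conn.Open();
            using (SqlCommand cmd = new SqlCommand("SELECT ThreadId FROM ForumThreads", conn))
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Each database row becomes one <url> entry for a dynamic page
                    // that has no physical file behind it.
                    writer.WriteStartElement("url", Ns);
                    writer.WriteElementString("loc", Ns,
                        "http://www.example.com/forums/thread.aspx?id=" + reader.GetInt32(0));
                    writer.WriteElementString("changefreq", Ns, "daily");
                    writer.WriteEndElement();
                }
            }

            writer.WriteEndElement();  // close urlset
            writer.WriteEndDocument();
        }
    }
}
```

Run something like that on a schedule, point your sitemap index at its output, and the database-only content becomes visible to the spiders.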

Sitemaps are your friend. Now, with Yahoo and Microsoft on the bandwagon, they'll be more important than ever.

Yahoo is expected to begin using your sitemap(s) on Thursday, with Microsoft picking them up early next year.

Comments

  1. I bet sitemaps aren't at all important in any way for anybody who isn't hopelessly unalive.

  2. Sitebases, the next protocol after Sitemaps.
    It can save time and load for the search engines, and for the websites too.
    It can enable a new kind of search engine that I call Search Engine 2.0.
    Using the Sitebases protocol will save 95% or more bandwidth. It is another example of the long tail theory.
    In this protocol, I suggest that all search engines share their big Sitebases with each other, so webmasters only need to submit their Sitebases to one search engine.
    And I suggest that all search engines open their search APIs for free and unlimited use.

    Please visit: http://www.sitebases.org

  3. Hong,
    I looked at your post and I didn't see any protocol or API, only a short description. You don't explain how "SiteBases" can save 95% of bandwidth.

    Moreover, what incentive do big search engines, which all compete with each other for advertiser dollars, have to share their result databases with each other?

    Finally, most big search engines have already opened their APIs for use by developers. Not "Unlimited" use, but enough to do some good things.

    Maybe you could follow up with some detailed responses to my 3 points?

  4. Hi Peter.
    1. For the "about" page, please see http://www.sitebases.org/about-sitebases/; for the protocol, see http://www.sitebases.org/protocol/. It is almost the same as sitemaps.org. Search engines may treat http://www.sitebases.org as a mirror of http://www.sitemaps.org, but they are different. Just like two checks that look the same - but one is for 100 USD and the other for 10,000 USD.
    2. My goal is to save 95% of the hardware cost, not only the bandwidth cost. Webmasters submit their Sitebases, which include the whole content of their websites. The search engine only needs to select some pages to check against the Sitebase, with no need to download the whole site. Please see http://www.sitebases.org/protocol/; that will help you understand my ideas.
    3. Sharing the big Sitebases will change everything. I said "should", not "must". It is a new idea that can save much more money. No matter how many search engines want to keep their Sitebases private, newer ones will open theirs and take the lead.
    4. Advertisers will use more platforms like AdSense and AdWords. These platforms can become a new field.
    5. Yes, for now the limited APIs are still enough. But sometimes they are slow when busy, not like the original search engines.

    Everything will change.

  5. I would like to talk with you about this topic.

  6. My email is hongxiaowan(at)gmail.com
    I am glad to discuss this topic with you. Thanks a lot.
