The big search engines (Google, Yahoo, MSN and Ask) introduced the Sitemaps protocol earlier this year.
In its simplest form, a sitemap is an XML file that lists the URLs of a site along with additional metadata about each URL: when it was last updated, how often it usually changes, and how important it is relative to the other URLs on the site.
That information helps search engines crawl your site more intelligently. The Sitemaps protocol is a standard that makes it easier to create a sitemap that can be parsed by all search engines.
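For example, a minimal sitemap that follows the protocol (documented at sitemaps.org) looks like this; the URL and the metadata values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- the page's address -->
    <loc>http://www.example.com/page.html</loc>
    <!-- when it was last updated -->
    <lastmod>2007-05-15</lastmod>
    <!-- how often it usually changes -->
    <changefreq>weekly</changefreq>
    <!-- importance relative to other URLs on the site (0.0 to 1.0) -->
    <priority>0.8</priority>
  </url>
</urlset>
```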
Some webmasters reported problems with duplicate content after adding a sitemaps XML file to their websites.
Their content suddenly appeared on dubious websites that had nothing to do with the original sites. Because the same text was duplicated across many other sites, the original sites might have received ranking penalties for duplicate content.
Some search engine spammers used the sitemaps XML files to easily find content for their scraper sites.
A scraper site is a website that pulls all of its information from other websites using automated tools. The scraper software combines content from other websites to create new web pages that are built around special keywords. The scraped pages usually show AdSense ads with which the spammers hope to make money.
The new sitemaps XML files make it very easy for scraper tools to find content-rich pages. Although the original intention of the sitemaps files was to inform search engines about every single page of your website, they can also be used to inform spam bots about your pages, as the sketch below shows.
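To illustrate how little work this takes, here is a minimal Python sketch, with a placeholder sitemap URL, that downloads a sitemap and lists every page it advertises. Everything a legitimate search engine spider sees in the file, a spam bot sees too:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder address; any site that publishes a sitemap works the same way.
SITEMAP_URL = "http://www.example.com/sitemap.xml"
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Fetch the sitemap exactly as a search engine spider (or a scraper bot) would.
with urllib.request.urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)

# Every content page is listed explicitly; no link-by-link crawling is needed.
for url in tree.getroot().findall(NS + "url"):
    loc = url.findtext(NS + "loc")
    # The priority hint even tells the bot which pages the webmaster
    # considers most important (0.5 is the protocol's default).
    priority = url.findtext(NS + "priority", default="0.5")
    print(priority, loc)
```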