
For example, the Robots Exclusion Protocol specifies /robots.txt as the filepath.

Does the sitemap protocol specify, e.g., /sitemap.xml as the default filepath?

If not, why not? For my personal website, I'd prefer to go with convention over configuration and skip the step of informing search engine crawlers of my sitemap's location.

ma11hew28

3 Answers


The answer is in the protocol you link to:

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but can not include URLs starting with http://example.com/images/.

There is no single canonical location because a sitemap can only describe content "below" the directory it sits in. By this definition, /sitemap.xml is the default canonical path for a sitemap that describes URLs anywhere on the website.
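As a rough sketch of that scope rule (my own illustration, not wording from the protocol), a URL may appear in a sitemap only if it starts with the sitemap file's directory prefix:

```python
from urllib.parse import urlsplit
import posixpath

def sitemap_scope(sitemap_url):
    """Return the URL prefix that a sitemap at this location may cover."""
    parts = urlsplit(sitemap_url)
    # The covered prefix is the directory containing the sitemap file.
    directory = posixpath.dirname(parts.path).rstrip("/")
    return f"{parts.scheme}://{parts.netloc}{directory}/"

def in_scope(sitemap_url, page_url):
    """True if page_url may be listed in the sitemap at sitemap_url."""
    return page_url.startswith(sitemap_scope(sitemap_url))

# A sitemap under /catalog/ covers /catalog/... but not /images/...
in_scope("http://example.com/catalog/sitemap.xml",
         "http://example.com/catalog/page.html")   # True
in_scope("http://example.com/catalog/sitemap.xml",
         "http://example.com/images/pic.png")      # False
# A sitemap at the root covers everything on the host.
in_scope("http://example.com/sitemap.xml",
         "http://example.com/images/pic.png")      # True
```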

Note that this relative uncertainty exists because all of this started before the IETF created a standard defining where to put "well-known" files on a web server. See Well-Known Uniform Resource Identifiers (URIs), RFC 8615.

Patrick Mevzek

Search engines don't typically check for a sitemap at a particular URL. You need to do the step of informing search engines about your sitemap. The easiest way is to add it to robots.txt. It is just one line and then all search engines will pick it up.
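For example, a single Sitemap line in robots.txt is enough (the sitemap URL below assumes the file lives at the root of the site):

```
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```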

The most common location for a sitemap is at the root of your site at /sitemap.xml. If your sitemap is compressed with gzip, the convention is to use /sitemap.xml.gz. I typically choose one of those and redirect the other one to it. However, as far as I know, search engines don't just probe those URLs to see whether you have a sitemap.

There are several reasons that you have to tell search engines about your sitemap:

  • It is possible (and fairly common) to have multiple sitemaps for different sections of your site. For example you may have a WordPress blog that generates its own sitemap and a separate sitemap for the rest of your site.
  • When sitemaps get large they need to get split. You may need to tell search engines about each of the parts, or create a sitemap index file.
  • It is sometimes desirable to create additional temporary sitemaps for specific purposes. For example, to get Google to crawl freshly created or deleted pages quickly.
  • Some sites like to hide their sitemap and only make it available to search engines and not to other crawlers and content scrapers. They make up a hard to guess URL and submit the sitemap through search engine consoles rather than put it at the default location or list it in robots.txt.
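To illustrate the multiple-sitemaps case above, a sitemap index file tying the parts together might look like this (the filenames are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>
```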

You probably don't need a sitemap at all. Search engines can crawl most sites just fine without one. As long as every page on your site is linked from other pages, a sitemap isn't needed. Sitemaps don't help with rankings, and Google will often choose not to index pages that appear only in a sitemap without any links on the site. See The Sitemap Paradox.

The main benefit of a sitemap is seeing extra stats in Google Search Console. To get those stats, you need to add the sitemap through Search Console rather than specifying it in robots.txt.

Stephen Ostermiller

No, I don't think the sitemaps protocol specifies a default filepath, and I don't see why not, so perhaps it should.

ma11hew28