Questions tagged [xml-sitemap]

XML Sitemaps are files that list all the important URLs on a website so that search engine crawlers can efficiently and fully crawl a website. The specification for them lives on sitemaps.org.

An example of a simple XML sitemap is:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://www.example.com/</loc></url>
<url><loc>https://www.example.com/page.html</loc></url>
<url><loc>https://www.example.com/another-page.html</loc></url>
</urlset> 
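
A file like the one above can be produced programmatically. The following is a minimal sketch in Python using the standard library; the function name `build_sitemap` is only illustrative, and it assumes the URLs passed in are already absolute and canonical.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as bytes for an iterable of absolute URLs.

    Hypothetical helper: one <url><loc> entry is written per URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(url_el, "loc").text = url
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        "https://www.example.com/",
        "https://www.example.com/page.html",
    ])
    print(xml_bytes.decode("utf-8"))
```

Building the document with an XML library rather than string concatenation means special characters in URLs (for example `&` in query strings) are escaped automatically.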

The sitemap for a site is typically named "sitemap.xml" and served from the root directory of the site (https://www.example.com/sitemap.xml). However, that is just a convention, and sitemaps can have any file name. See "Does name of sitemap file that Wordpress generates matter?" Some bots may look for the sitemap at its typical name and find it automatically. If a sitemap has a different name, it will need to be submitted to search engines through webmaster tools or referenced from robots.txt like this:

User-agent: *
Disallow:

Sitemap: http://www.example.com/example-sitemap.xml
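
To illustrate how a crawler might discover a sitemap from robots.txt, here is a rough sketch that pulls the `Sitemap:` lines out of a robots.txt body. The function name is hypothetical, and real crawlers may handle comments and edge cases differently.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs listed in Sitemap: lines of a robots.txt string.

    Hypothetical helper. Field names are matched case-insensitively,
    and anything after a '#' is treated as a comment (an assumption).
    """
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    example = (
        "User-agent: *\n"
        "Disallow:\n"
        "\n"
        "Sitemap: http://www.example.com/example-sitemap.xml\n"
    )
    print(sitemap_urls_from_robots(example))
```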

Sitemaps can be compressed using gzip. Because they are often very large, many sites serve compressed sitemaps such as sitemap.xml.gz.
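
As a small sketch of the compression step, the snippet below gzips sitemap bytes with Python's standard `gzip` module. The helper names are illustrative only; writing the result to a file named sitemap.xml.gz is left to the caller.

```python
import gzip


def compress_sitemap(xml_bytes):
    """Return the gzip-compressed form of sitemap XML bytes."""
    return gzip.compress(xml_bytes)


def decompress_sitemap(gz_bytes):
    """Return the original XML bytes from gzip-compressed data."""
    return gzip.decompress(gz_bytes)


if __name__ == "__main__":
    xml_bytes = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://www.example.com/</loc></url>"
        b"</urlset>"
    )
    gz_bytes = compress_sitemap(xml_bytes)
    print(len(xml_bytes), "bytes uncompressed,", len(gz_bytes), "bytes compressed")
```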

In addition to the URLs, other data can be included in XML sitemaps: last modified dates (lastmod), change frequency (changefreq), priority, alternate language URLs (hreflang), video URLs, and image URLs. All of those extra fields are optional. Some of them (like "priority" and "change frequency") are rarely worth including because Google says it doesn't use them. See "How important is it to include <lastmod> in a sitemap?"

Each sitemap file is limited to 50,000 URLs and 50MB (uncompressed). Sites that exceed either limit will need to use multiple sitemaps. Those sitemaps can be submitted to search engines individually or listed in one "sitemap index" file, which is then submitted to search engines. See "Google Sitemap Limits?"
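
The splitting described above can be sketched as follows. This is an illustration under assumptions, not a complete generator: the helper names are hypothetical, only the URL-count limit is enforced (the 50MB size limit would need a separate check), and the child sitemap URLs are assumed to be chosen by the caller.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (bytes) pointing at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = url
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    urls = [f"https://www.example.com/page-{i}.html" for i in range(120_001)]
    chunks = chunk_urls(urls)
    # Hypothetical naming scheme for the child sitemap files.
    children = [
        f"https://www.example.com/sitemap-{n}.xml"
        for n in range(1, len(chunks) + 1)
    ]
    print(build_sitemap_index(children).decode("utf-8"))
```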

Despite broad search engine support, sitemaps have surprisingly little impact on SEO. Search engines don't promise to index every URL in a sitemap, and in fact usually won't index a URL that can only be found through a sitemap. Sitemaps don't help with search engine rankings either. At best, sitemaps get a site fully crawled by search engine bots, give extra stats in webmaster tools, and tell search engines about your preferred URLs. See "The Sitemap Paradox".

The best way to generate a sitemap is to use a program to list all the URLs of the website from the file system or database. Popular content management systems (like Drupal and WordPress) have plugins that can do just that. Generating a sitemap by crawling your website is not recommended: if crawling your site can list all the pages, search engines will be able to crawl the whole thing with or without the sitemap. Sitemaps are typically regenerated on a daily or weekly basis by an automated process scheduled to run on the web server.
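
For a static site, listing URLs from the file system might look like the sketch below. The function names, the choice to scan only `*.html` files, and the mapping of `index.html` to its directory URL are all assumptions for illustration; a database-backed site would query its own tables instead.

```python
from pathlib import Path
from urllib.parse import quote


def url_for_path(base_url, relative_path):
    """Map a path relative to the web root to an absolute URL.

    Hypothetical helper. Assumes index.html is served at its
    directory URL, which depends on the server configuration.
    """
    rel = relative_path.replace("\\", "/").lstrip("/")
    if rel == "index.html" or rel.endswith("/index.html"):
        rel = rel[: -len("index.html")]
    # quote() percent-encodes spaces and similar characters, keeping "/".
    return base_url.rstrip("/") + "/" + quote(rel)


def urls_from_directory(web_root, base_url):
    """Return sorted URLs for every .html file under web_root."""
    root = Path(web_root)
    return sorted(
        url_for_path(base_url, path.relative_to(root).as_posix())
        for path in root.rglob("*.html")
    )


if __name__ == "__main__":
    # "public" is a placeholder for the site's document root.
    for url in urls_from_directory("public", "https://www.example.com"):
        print(url)
```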

When generating a sitemap, all the URLs on the site that have content should be included. Search engines don't want sitemaps to include error pages, alternate URLs, or redirects.
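
One way to apply that rule is to filter page records before writing the sitemap. The sketch below assumes each page is a dict with hypothetical `url`, `status`, and optional `canonical` fields; how those values are obtained (CMS database, server logs, and so on) is outside its scope.

```python
def sitemap_worthy_urls(pages):
    """Return URLs that respond with 200 and are their own canonical URL.

    `pages` is an iterable of dicts with hypothetical keys:
      url       - the page URL
      status    - the HTTP status code the URL returns
      canonical - the preferred URL for the content (optional)
    """
    keep = []
    for page in pages:
        if page["status"] != 200:
            continue  # skips error pages (4xx/5xx) and redirects (3xx)
        if page.get("canonical", page["url"]) != page["url"]:
            continue  # skips alternate URLs for the same content
        keep.append(page["url"])
    return keep


if __name__ == "__main__":
    pages = [
        {"url": "https://www.example.com/", "status": 200},
        {"url": "https://www.example.com/old", "status": 301},
        {"url": "https://www.example.com/missing", "status": 404},
        {
            "url": "https://www.example.com/page.html?ref=nav",
            "status": 200,
            "canonical": "https://www.example.com/page.html",
        },
    ]
    print(sitemap_worthy_urls(pages))
```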

366 questions
271
votes
20 answers

The Sitemap Paradox

We use a sitemap on Stack Overflow, but I have mixed feelings about it. Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs…
Jeff Atwood
  • 14,002
  • 18
  • 66
  • 79
73
votes
5 answers

Are there any clear indicators that my sitemap file is beneficial?

I have recently created a sitemap.xml file and uploaded it to my Google Webmasters Tools account. Google didn't report any issues or errors with the uploaded sitemap of my site. Now my question is: How do I know if my sitemap is working within…
user1420373
30
votes
4 answers

Prevent XML sitemaps from showing up in Google search results

How do I prevent my XML sitemap files from showing up in Google search results like this result of a site: search query: I don't understand why Google would choose to show sitemap files in search results to begin with. These files are not meant…
Stephen Ostermiller
  • 99,822
  • 18
  • 143
  • 364
16
votes
2 answers

Should I Include RSS Feed URLs in an XML Sitemap?

Simple question - Should I Include the RSS Feed URLs in my site's XML Sitemap? I am NOT asking whether I should USE an RSS Feed AS an XML sitemap, but rather, should I include the URLs to the various RSS feeds on my site in my XML Sitemap? I am…
GWR
  • 415
  • 3
  • 13
13
votes
7 answers

How to hide my XML Sitemap from competitors but not from search engines

I want to hide my sitemap XML file from all but allow access from search engines. What is the way to do it? I want to hide the depth of site's content from competitors.
AgA
  • 1,438
  • 3
  • 13
  • 29
12
votes
4 answers

Robots.txt vs Sitemap -- Who wins in a Conflict

If I block off the directory /foo in robots.txt, but my xml sitemap contains URLs with /foo, will the URLs in the sitemap get picked up by Google and other search engines? In other words, does the sitemap trump robots.txt? I think so, but am not…
Nathan
  • 385
  • 3
  • 7
12
votes
3 answers

Is incorrect to have the HTTPS version of the sitemaps.org URL in the xmlns sitemap schema?

I have the schema with this: Is it correct or should be: if all my web pages are on HTTPS?
Adrian Godoy
  • 358
  • 2
  • 9
11
votes
1 answer

Internal links to pages vs. sitemap.xml links to pages

I have a page that is not linked to from anywhere, but it is listed in sitemap.xml. It will be crawled and display in the SERP. Would it be preferable, from a SEO perspective, to have the page linked within the natural flow of the site? For example,…
nathanziarek
  • 211
  • 1
  • 4
11
votes
1 answer

Proper sitemap.xml setup

I have a dynamic site which has many (well, less than 50) users. Each user is allowed to create as many pages as they want. I know that there is a limit to how many pages you can be listed in sitemap.xml, and for now I am under that limit, but I…
Mike
  • 890
  • 5
  • 8
9
votes
1 answer

Should I ping to Google sitemap-index.xml or only the sitemap file that has been modified?

I have a sitemap-index.xml with 7 files. After updating one of the 7 with new content should I ping Google with the index file or with the updated file? In my index I also use lastmod so Google can see when downloading it which item is newest.
8
votes
2 answers

Is it possible to make an XML sitemap pretty enough to show to users?

I wouldn't mind showing my XML sitemap to users if there were a way to make it human usable as opposed to just machine readable. Is it possible to: Choose the colors and layout Make the locations into links Allow sorting based on the field…
Stephen Ostermiller
  • 99,822
  • 18
  • 143
  • 364
8
votes
2 answers

Why does my sitemap have a ".gz" extension, and how can I edit it?

An XML sitemap generator plugin for WordPress puts the following strings in my robots.txt file: User-agent: * Disallow: /wp-admin/ Disallow: /wp-includes/ But other WordPress blogs have lots of tags included in it. Also, my XML file's sitemap…
LLL
  • 191
  • 1
  • 1
  • 2
8
votes
1 answer

Why important/big sites don't include a sitemap?

I've been looking at most important sites and none of them have a sitemap.xml defined. My list includes: Amazon Ebay Yahoo Microsoft ... and others But none of them include a sitemap.xml file. Why is this? It's not that important? Note: I've been…
8
votes
3 answers

Are there specific advantages or disadvantages of an XML sitemap over a TXT based sitemap?

Are there any specific advantages or disadvantages in generating an XML based sitemap in place of a simple txt based sitemap (list of URLs)? I realize that in the XML format I can set priority and last-modified date, but it is not clear what…
cboettig
  • 325
  • 2
  • 10
8
votes
1 answer

Can you request refresh of the sitemap in Google Search Console?

TL;DR My Sitemap shows a last crawl date of 2-3 days ago in Search Console and I wondered if there was a method of requesting Google re-crawl the sitemap, in the same way as you can ask it to re-crawl a page? Explanation - I have a problem with a…
t2pe
  • 218
  • 1
  • 2
  • 5