Jump to:
XML tag definitions
Entity escaping
Using Sitemap index files
Sitemap file location
Validating your Sitemap
Extending the Sitemaps protocol
Informing search engine crawlers
This document describes the XML schema for the Sitemap protocol.
The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.
The Sitemap must:
- Begin with an opening <urlset> tag and end with a closing </urlset> tag.
- Specify the namespace (protocol standard) within the <urlset> tag.
- Include a <url> entry for each URL, as a parent XML tag.
- Include a <loc> child entry for each <url> parent tag.
All other tags are optional. Support for these optional tags may vary among search engines. Refer to each search engine's documentation for details.
Sample XML Sitemap
The following example shows a Sitemap that contains just one URL and uses all optional tags. The optional tags are in italics.
Also see our example with multiple URLs.
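As a minimal sketch of the structure described above, the following Python builds a Sitemap with a single URL and all optional tags using the standard library's `xml.etree.ElementTree`. The function name `build_sitemap` and the sample values (URL, date, frequency, priority) are illustrative assumptions, not part of the protocol text.

```python
import xml.etree.ElementTree as ET

# Namespace of the Sitemap protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return UTF-8 encoded Sitemap XML for a list of entry dicts (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; the remaining tags are optional.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree entity-escapes text values when it serializes them.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Illustrative values only.
sitemap_xml = build_sitemap([
    {
        "loc": "http://www.example.com/",
        "lastmod": "2005-01-01",
        "changefreq": "monthly",
        "priority": 0.8,
    },
])
print(sitemap_xml.decode("utf-8"))
```

Generating the file through an XML library rather than string concatenation means the entity escaping of data values is handled during serialization.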
XML tag definitions
The available XML tags are described below.
Tag | Required | Description |
---|---|---|
<urlset> | required | Encapsulates the file and references the current protocol standard. |
<url> | required | Parent tag for each URL entry. The remaining tags are children of this tag. |
<loc> | required | URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2,048 characters. |
<lastmod> | optional | The date of last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion, if desired, and use YYYY-MM-DD. Note that this tag is separate from the If-Modified-Since (304) header the server can return, and search engines may use the information from both sources differently. |
<changefreq> | optional | How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are: always, hourly, daily, weekly, monthly, yearly, never. The value 'always' should be used to describe documents that change each time they are accessed. The value 'never' should be used to describe archived URLs. Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked 'hourly' less frequently than that, and they may crawl pages marked 'yearly' more frequently than that. Crawlers may periodically crawl pages marked 'never' so that they can handle unexpected changes to those pages. |
<priority> | optional | The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other sites—it only lets the search engines know which pages you deem most important for the crawlers. The default priority of a page is 0.5. Please note that the priority you assign to a page is not likely to influence the position of your URLs in a search engine's result pages. Search engines may use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your most important pages are present in a search index. Also, please note that assigning a high priority to all of the URLs on your site is not likely to help you. Since the priority is relative, it is only used to select between URLs on your site. |
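The constraints in the table above can be checked before a Sitemap is written. The sketch below is one possible approach, not a complete validator: `check_entry` is a hypothetical helper, the protocol check assumes only `http://` and `https://` URLs, the `changefreq` set is the seven values the protocol defines, and the regular expression tests only the shape of a W3C Datetime value (it does not reject impossible dates such as month 13).

```python
import re

VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Shape-only check for W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD,
# or a full timestamp with a time zone designator.
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def check_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    # Assumption: only http and https URLs are expected here.
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must begin with the protocol")
    if len(loc) >= 2048:
        problems.append("loc must be less than 2,048 characters")
    if lastmod is not None and not W3C_DATETIME.match(lastmod):
        problems.append("lastmod is not in W3C Datetime format")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append("changefreq is not a valid value")
    if priority is not None and not 0.0 <= float(priority) <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


print(check_entry("http://www.example.com/", "2005-01-01", "monthly", 0.8))
print(check_entry("www.example.com", changefreq="sometimes", priority=1.5))
```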
Entity escaping
Your Sitemap file must be UTF-8 encoded (you can generally do this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the characters listed in the table below.
Character | Symbol | Escape Code |
---|---|---|
Ampersand | & | &amp; |
Single Quote | ' | &apos; |
Double Quote | " | &quot; |
Greater Than | > | &gt; |
Less Than | < | &lt; |
In addition, all URLs (including the URL of your Sitemap) must be URL-escaped and encoded for readability by the web server on which they are located. However, if you are using any sort of script, tool, or log file to generate your URLs (anything except typing them in by hand), this is usually already done for you. Please check to make sure that your URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs, and the XML standard.
Below is an example of a URL that uses a non-ASCII character (ü), as well as a character that requires entity escaping (&):
Below is that same URL, ISO-8859-1 encoded (for hosting on a server that uses that encoding) and URL escaped:
Below is that same URL, UTF-8 encoded (for hosting on a server that uses that encoding) and URL escaped:
Below is that same URL, but also entity escaped:
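The escaping steps described in this section can be reproduced with Python's standard library. This is a sketch under assumptions: the URL in `RAW_URL` is an illustrative stand-in, `url_escape` is a hypothetical helper, and the set of characters left unescaped (`:/?&=`) is a simplification that suits this one URL rather than a general rule.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Illustrative URL with a non-ASCII character (ü) and an ampersand.
RAW_URL = "http://www.example.com/ümlat.php&q=name"


def url_escape(url, encoding):
    """Percent-encode a URL using the given character encoding (hypothetical helper)."""
    # Keep the URL delimiters used in this example; encode everything else
    # that is not an unreserved character.
    return quote(url, safe=":/?&=", encoding=encoding)


latin1_url = url_escape(RAW_URL, "iso-8859-1")
utf8_url = url_escape(RAW_URL, "utf-8")

# Entity-escape the URL-escaped value for use inside an XML document.
# escape() handles &, < and > by default; the quote characters are added here.
entity_escaped_url = escape(utf8_url, {"'": "&apos;", '"': "&quot;"})

print(latin1_url)
print(utf8_url)
print(entity_escaped_url)
```

The order matters: URL escaping comes first, and entity escaping is applied last, when the value is placed into the XML file.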
Sample XML Sitemap
The following example shows a Sitemap in XML format. The Sitemap in the example contains a small number of URLs, each using a different set of optional parameters.
Using Sitemap index files (to group multiple sitemap files)
You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). If you would like, you may compress your Sitemap files using gzip to stay within 10MB and reduce your bandwidth requirement. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.
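One way to respect the 50,000-URL limit is to split the URL list into fixed-size chunks and write one Sitemap file per chunk, optionally gzip-compressed. The sketch below shows only the splitting and compression steps; `split_urls` and the generated URLs are hypothetical, and the XML payload is a placeholder.

```python
import gzip

MAX_URLS_PER_SITEMAP = 50_000


def split_urls(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most max_urls URLs."""
    for start in range(0, len(urls), max_urls):
        yield urls[start:start + max_urls]


# Illustrative data: 120,000 made-up URLs need three Sitemap files.
urls = [f"http://www.example.com/page{i}" for i in range(120_000)]
chunks = list(split_urls(urls))
print([len(chunk) for chunk in chunks])

# Compressing a serialized Sitemap (placeholder bytes stand in for real XML).
payload = b"<urlset>placeholder</urlset>"
compressed = gzip.compress(payload)
print(gzip.decompress(compressed) == payload)
```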
If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file. Sitemap index files may not list more than 1,000 Sitemaps and must be no larger than 10MB (10,485,760 bytes). The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file.
The Sitemap index file must:
- Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
- Include a <sitemap> entry for each Sitemap as a parent XML tag.
- Include a <loc> child entry for each <sitemap> parent tag.

The optional <lastmod> tag is also available for Sitemap index files.
Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com. As with Sitemaps, your Sitemap index file must be UTF-8 encoded.
Sample XML Sitemap Index
The following example shows a Sitemap index that lists two Sitemaps:
Note: Sitemap URLs, like all values in your XML files, must be entity escaped.
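A Sitemap index with two entries could be generated along these lines. This is a sketch rather than the reference sample: `build_sitemap_index`, the file names, and the timestamp are assumptions chosen for illustration. The second URL contains an ampersand to show that the serializer entity-escapes it.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Return UTF-8 encoded Sitemap index XML from (loc, lastmod) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        # <lastmod> is optional in a Sitemap index entry.
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Illustrative entries only.
index_xml = build_sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", "2004-10-01T18:23:17+00:00"),
    ("http://www.example.com/sitemap2.xml.gz?a=1&b=2", None),
])
print(index_xml.decode("utf-8"))
```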
Sitemap Index XML Tag Definitions
Tag | Required | Description |
---|---|---|
<sitemapindex> | required | Encapsulates information about all of the Sitemaps in the file. |
<sitemap> | required | Encapsulates information about an individual Sitemap. |
<loc> | required | Identifies the location of the Sitemap. This location can be a Sitemap, an Atom file, RSS file or a simple text file. |
<lastmod> | optional | Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value for the lastmod tag should be in W3C Datetime format. By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index; that is, a crawler may retrieve only the Sitemaps that were modified since a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites. |
Sitemap file location
The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.
If you have the permission to change http://example.org/path/sitemap.xml, it is assumed that you also have permission to provide information for URLs with the prefix http://example.org/path/. Examples of URLs considered valid in http://example.com/catalog/sitemap.xml include:
URLs not considered valid in http://example.com/catalog/sitemap.xml include:
Note that this means that all URLs listed in the Sitemap must use the same protocol (http, in this example) and reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.example.com/sitemap.xml, it can't include URLs from http://subdomain.example.com.
URLs that are not considered valid are dropped from further consideration. It is strongly recommended that you place your Sitemap at the root directory of your web server. For example, if your web server is at example.com, then your Sitemap index file would be at http://example.com/sitemap.xml. In certain cases, you may need to produce different Sitemaps for different paths (e.g., if security permissions in your organization compartmentalize write access to different directories).
If you submit a Sitemap using a path with a port number, you must include that port number as part of the path in each URL listed in the Sitemap file. For instance, if your Sitemap is located at http://www.example.com:100/sitemap.xml, then each URL listed in the Sitemap must begin with http://www.example.com:100.
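The location rules in this section (same protocol, same host and port, and a path under the Sitemap's directory) can be expressed as a small check. The function `in_sitemap_scope` and the page URLs passed to it are hypothetical; this is a sketch of the rule as described here, not a description of how any particular search engine evaluates it.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, url):
    """Return True if url may be listed in the Sitemap at sitemap_url."""
    sitemap = urlsplit(sitemap_url)
    candidate = urlsplit(url)
    # Directory that contains the Sitemap, with a trailing slash.
    prefix = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        candidate.scheme == sitemap.scheme
        # netloc includes the port number, if one is present.
        and candidate.netloc.lower() == sitemap.netloc.lower()
        and candidate.path.startswith(prefix)
    )


catalog_sitemap = "http://example.com/catalog/sitemap.xml"
print(in_sitemap_scope(catalog_sitemap, "http://example.com/catalog/show?item=23"))
print(in_sitemap_scope(catalog_sitemap, "http://example.com/images/logo.png"))
print(in_sitemap_scope(catalog_sitemap, "https://example.com/catalog/page1.php"))
```

Placing the Sitemap at the root directory makes the path prefix `/`, so every URL on that host, protocol, and port is in scope.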
Validating your Sitemap
The following XML schemas define the elements and attributes that can appear in your Sitemap file. You can download these schemas from the links below:
For Sitemaps: http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
For Sitemap index files: http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd
There are a number of tools available to help you validate the structure of your Sitemap based on this schema. You can find a list of XML-related tools at each of the following locations:
http://www.w3.org/XML/Schema#Tools
http://www.xml.com/pub/a/2000/12/13/schematools.html
In order to validate your Sitemap or Sitemap index file against a schema, the XML file will need additional headers as shown below.
Sitemap:
Sitemap index file:
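The additional headers are namespace declarations on the root element. The sketch below assumes the standard XML Schema instance mechanism (`xmlns:xsi` plus `xsi:schemaLocation`) pointing at the schema URLs listed above; `root_with_schema` is a hypothetical helper, and the attributes are written as literal prefixed names for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def root_with_schema(tag, xsd_name):
    """Create a root element that declares where its schema can be found."""
    return ET.Element(tag, {
        "xmlns": SITEMAP_NS,
        "xmlns:xsi": XSI_NS,
        # schemaLocation pairs the namespace with the URL of its schema file.
        "xsi:schemaLocation": f"{SITEMAP_NS} {SITEMAP_NS}/{xsd_name}",
    })


urlset = root_with_schema("urlset", "sitemap.xsd")
sitemapindex = root_with_schema("sitemapindex", "siteindex.xsd")

urlset_xml = ET.tostring(urlset, encoding="unicode")
sitemapindex_xml = ET.tostring(sitemapindex, encoding="unicode")
print(urlset_xml)
print(sitemapindex_xml)
```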
Extending the Sitemaps protocol
You can extend the Sitemaps protocol using your own namespace. Simply specify this namespace in the root element. For example:
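A sketch of such an extension, built with Python's `xml.etree.ElementTree`, might look like the following. The namespace URI, the `example` prefix, and the `example_tag` element are all made up for illustration; a real extension would use a namespace that its author controls.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical extension namespace, used only for this illustration.
EXAMPLE_NS = "http://www.example.com/schemas/example_schema"

# Declare both namespaces on the root element.
urlset = ET.Element("urlset", {
    "xmlns": SITEMAP_NS,
    "xmlns:example": EXAMPLE_NS,
})
url = ET.SubElement(urlset, "url")
ET.SubElement(url, "loc").text = "http://www.example.com/"
# An element from the extension namespace, written with its prefix.
ET.SubElement(url, "example:example_tag").text = "extension value"

extended_xml = ET.tostring(urlset, encoding="unicode")
print(extended_xml)
```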
Informing search engine crawlers
Once you have created the Sitemap file and placed it on your webserver, you need to inform the search engines that support this protocol of its location by submitting it to them via the search engine's submission interface or an HTTP request.
The search engines can then retrieve your Sitemap and make the URLs available to their crawlers.
Last Updated: 16 November 2006