Grouping Simple Sitemap by content type


The XML sitemap standard has been around for almost 20 years and serves as a great way of keeping Google and other search engines up to date on what content is changing on a website. Thanks to the world of open source, sites which use Drupal have tools such as Simple Sitemap that generate the files automatically from the site's content, so they don't have to be created by hand.

The sitemap.xml file is intended to be read by software, so the readability of its output is not a typical concern. Recently, though, I was asked by an SEO company to group the links in the sitemap by type of content, rather than lumping everything into one single file (or paginated group of files). The SEO company wanted the site's articles listed in an "articles.xml" file, press releases in a "pressreleases.xml" file, tags in a "tags.xml" file, etc. It turned out to be easy to do in Simple Sitemap, but the terminology was a little confusing at first.

For what it's worth, I tried to do this using the XML Sitemap module but couldn't work out how; it didn't seem possible to have any control over how the content was grouped.

Here's how I was able to accomplish this goal using Simple Sitemap:

  • Change the "Default" sitemap (on "/admin/config/search/simplesitemap") to the type "Sitemap index". This makes the "sitemap.xml" file contain links to all of the other sitemap files that are defined in the system.
  • Create new sitemaps ("/admin/config/search/simplesitemap/variants/add"), aka sitemap variants, for each of the new sitemap files that are to be generated, e.g. one for "Articles", one for "Press releases", one for "Tags", etc.
  • On the Inclusion page ("/admin/config/search/simplesitemap/entities") enable the Content entity and click the "Save configuration" button. When the page reloads, click the "Configure" button for the Content entity, go through each content type fieldset, select the "Index entities of type TYPE in sitemap SITEMAP" option for the matching sitemap (e.g. "Index entities of type Article in sitemap Articles"), and click the "Save configuration" button.
  • Repeat the last step for the Taxonomy Term entity, adding each vocabulary to the appropriate sitemap.
  • On the main sitemaps page ("/admin/config/search/simplesitemap") click the "Rebuild queue and generate" button.
  • Once the process is finished, the site's "sitemap.xml" file will link to the individual sitemaps, with the site's content and taxonomy terms grouped as requested.
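
To check the result without reading the XML by hand, the index can be parsed with a few lines of Python. This is only a rough sketch and not part of Simple Sitemap: the "example.com" URLs and the list_child_sitemaps helper are made up for illustration, and the sample document assumes the index follows the standard sitemaps.org "sitemapindex" format.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap index, shaped like what a "Sitemap index" sitemap
# might output. The host and sitemap names are placeholders.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemaps/articles/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemaps/pressreleases/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemaps/tags/sitemap.xml</loc></sitemap>
</sitemapindex>
"""


def list_child_sitemaps(index_xml):
    """Return the <loc> URL of every sitemap listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


if __name__ == "__main__":
    # Print one child sitemap URL per line.
    for url in list_child_sitemaps(SAMPLE_INDEX):
        print(url)
```

On a real site the XML would come from the site's own "sitemap.xml" rather than the sample string; each content type's sitemap should appear once in the output.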

Alternatives

Because the variant that serves "sitemap.xml" can be chosen on the settings page ("/admin/config/search/simplesitemap/settings"), it is not strictly necessary to change the "Default" sitemap to the "Sitemap index" type. Instead, the "Sitemap index" variant that comes with the module could be enabled and made the default sitemap.

Shorter URLs

The URL of each sitemap is hardcoded to use the path "sitemaps/SITEMAPNAME/sitemap.xml". This is a little verbose and there's no way of shortening it from the Simple Sitemap module itself. One way to shorten the URLs is to create a path alias for each "sitemaps/SITEMAPNAME/sitemap.xml" URL, e.g. "articles.xml" for "sitemaps/articles/sitemap.xml".
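
The naming scheme can be sketched in a few lines of Python. This only illustrates how the long system paths line up with the short aliases; it does not create anything in Drupal, where the aliases still have to be added separately. The system_path and short_alias helpers and the sitemap names are hypothetical.

```python
def system_path(sitemap_name):
    """Long path that Simple Sitemap uses for a given sitemap name."""
    return f"sitemaps/{sitemap_name}/sitemap.xml"


def short_alias(sitemap_name):
    """Short alias to create for that sitemap (illustrative convention)."""
    return f"{sitemap_name}.xml"


if __name__ == "__main__":
    # Placeholder sitemap names matching the examples in this post.
    for name in ["articles", "pressreleases", "tags"]:
        print(f"{system_path(name)} -> {short_alias(name)}")
```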

However, while loading "articles.xml" will work, the main sitemap.xml file won't show the aliases, only the original long URLs. This is because an attempt to fix a problem with language prefixes resulted in sitemap indexes (i.e. sitemap.xml) always outputting URLs using the system paths. A possible fix for this is available, but it's a tricky problem given that fixing one issue breaks something else.
