Sitecore XML Sitemap Module Gotcha | Perficient Digital

Sitecore XML Sitemap Module Gotcha

When creating any modern website, sitemaps are an important consideration. A sitemap is an XML file that tells search engines and other crawlers about your site’s content and structure. According to Google’s documentation, “Google doesn’t guarantee that we’ll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site’s structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission”. In short, sitemaps are a good idea for a website of any size.

What does a sitemap look like? The schema is defined on sitemaps.org. An example file is embedded below.
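As a rough illustration of that schema, the Python sketch below builds a minimal sitemap with the standard library and prints it. The `build_sitemap` helper, the example.com URLs, and the dates are placeholders invented for this example; they are not output from the Sitecore module.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return a sitemaps.org-style XML document as a string.

    `pages` is a list of dicts with a required "loc" key and optional
    "lastmod", "changefreq", and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Emit child elements in the order the schema lists them.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = page[tag]
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages (assumption): example.com is not a real Sitecore site.
pages = [
    {
        "loc": "https://www.example.com/",
        "lastmod": "2015-01-01",
        "changefreq": "weekly",
        "priority": "1.0",
    },
    {"loc": "https://www.example.com/about-us", "lastmod": "2015-01-01"},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Only `loc` is required for each `url` entry; `lastmod`, `changefreq`, and `priority` are optional hints to crawlers.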

For Sitecore, there is a fantastic module for serving sitemaps called Sitemap XML. Sitemap XML “creates a sitemap that is compliant with the schema defined on sitemaps.org for your site.” Like most modules in the Sitecore Marketplace, installation is accomplished via a package and some configuration edits. Follow the rest of the setup instructions on the module page, which will only take a few minutes for most sites. Pay special attention to which templates you include, as you do not want crawlers trying to index data items that have no presentation specified.

Sounds great. So what is the gotcha? The configuration file that comes with the module contains some SEO-related settings.

Notice that by default, the encodeNameReplacement node in the config replaces all spaces in URLs with dashes (great for SEO). The module then turns the dashes back into spaces when retrieving items from Sitecore. The problem reared its ugly head when Sitecore media was being served via -/media instead of the default ~/media. In this scenario, the Sitemap module replaced the dash in the media prefix with a space, and no media items could be retrieved! On top of that, the module would also break any item (media or otherwise) whose name contained a dash, because that dash was turned into a space as well.

My recommendation would be to disable the dash replacement behavior in the module and enforce page item naming rules at the time of content creation. Remember that spaces are for people and dashes are for browsers.

Lesson learned: examine configuration and documentation closely when installing any sort of third-party component into your web application, whether Sitecore modules, NuGet packages, or other open source libraries.
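To make the failure mode concrete, here is a small Python simulation of the space-to-dash round trip described above. It is a simplified model of the behavior, not the module's actual code, and the item paths are hypothetical.

```python
def encode_name(path):
    """Outbound: replace spaces with dashes, like the default replacement rule."""
    return path.replace(" ", "-")


def decode_name(url_path):
    """Inbound: turn every dash back into a space before looking the item up."""
    return url_path.replace("-", " ")


def round_trip_ok(item_path):
    """True when the decoded URL still matches the stored item path."""
    return decode_name(encode_name(item_path)) == item_path


# Hypothetical item paths, for illustration only.
examples = [
    "/products/red widgets",  # spaces only: survives the round trip
    "/products/t-shirts",     # item name already contains a dash
    "-/media/images/logo",    # media prefix changed from ~/media to -/media
]

results = {path: round_trip_ok(path) for path in examples}
for path, ok in results.items():
    decoded = decode_name(encode_name(path))
    print(f"{path!r}: {'ok' if ok else 'BROKEN'} -> {decoded!r}")
```

The replacement is only reversible when no dash exists in the original path. Any dash that was already there, whether in an item name or in the -/media prefix, comes back as a space, so the lookup no longer matches the stored item.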
