Canonical URLs for SEO – The Good, Bad & Ugly
Canonical URLs are a useful tool in the SEO’s toolbox for resolving the many ways that duplicate content can crop up on your website. The concept is simple enough: where a piece of content can be accessed via many URLs, or even across multiple domains, the canonical provides a single, authoritative URL. This allows all equity to be correctly assigned to that single canonical URL, which is then indexed and returned in search results. Nice and simple. Mostly.
Unfortunately, we see many poorly implemented canonical URLs that, rather than fixing duplication issues, actually compound them. There is often a checklist approach to SEO, and canonical URLs end up on that checklist – some bright spark reads that canonicals are good for SEO and a developer puts them in place without really understanding their purpose. Canonical URLs – check.
Ugly canonical implementations are those that actually worsen the very problem we are trying to resolve, and in some cases they can even see your entire site removed from the index.
1. Canonical Loop
Consider that your site is www.example.com. The site can be accessed on two possible URL variations:
http://example.com
http://www.example.com
The canonical URL should be one or the other, but it should be consistent site-wide across both variations. What we have seen, though, is each version canonicalizing to the other. This creates a conflict and can see your rankings just tank.
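A loop like this is easy to miss by eye once a site has more than a handful of URLs. As a minimal sketch, assume you have a crawl exported as a Python dict mapping each URL to the href of its rel="canonical" tag. The function name find_inconsistent_canonicals is hypothetical. The sketch flags every URL whose canonical target does not itself point to itself. That catches the two-way loop described above, and it also catches longer canonical chains:

```python
def find_inconsistent_canonicals(canonicals: dict[str, str]) -> list[tuple[str, str]]:
    """Flag URLs whose canonical target declares a *different* canonical.

    `canonicals` is a hypothetical crawl export: URL -> rel="canonical" href.
    A healthy target canonicalizes to itself; anything else is a loop or chain.
    """
    problems = []
    for url, target in canonicals.items():
        if target == url:
            continue  # self-referencing canonical: fine on its own
        # If the target was not crawled, assume it is self-canonical.
        if canonicals.get(target, target) != target:
            problems.append((url, target))
    return problems


# The loop described above: each variation canonicalizes to the other.
crawl = {
    "http://example.com": "http://www.example.com",
    "http://www.example.com": "http://example.com",
}
print(find_inconsistent_canonicals(crawl))  # both URLs are flagged
```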
2. Canonical + Redirect Loop
We have seen a similar version of the above that combines canonical URLs and HTTP redirections. Consider again the same two URL variations:
http://example.com
http://www.example.com
Here, the http://example.com variation is canonicalized to the http://www.example.com domain.
The http://www.example.com version has a canonical to itself. So far, so good. Unfortunately, though, the http://www.example.com version redirects to the non-canonical variation, again creating a loop of sorts. We have seen this drive Google a little crazy and, again, rankings go to hell.
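Redirects and canonicals can be checked together by following both signals from a starting URL until one settles. The following is an illustrative sketch and not how any search engine actually processes these signals. It assumes two hypothetical dicts, one of redirects (URL -> Location target) and one of canonicals (URL -> rel="canonical" href). It also assumes a redirect takes precedence on the same URL, because a redirecting URL does not serve the page that carries the tag:

```python
def resolve_signals(start: str, redirects: dict[str, str], canonicals: dict[str, str]) -> str:
    """Follow redirects and canonicals from `start` until a URL points at itself.

    Raises ValueError if the signals loop back to a URL already visited.
    """
    visited = [start]
    url = start
    while True:
        # Assumption: a redirect wins over a canonical on the same URL.
        nxt = redirects.get(url) or canonicals.get(url, url)
        if nxt == url:
            return url  # settled on a self-canonical, non-redirecting URL
        if nxt in visited:
            raise ValueError("signal loop: " + " -> ".join(visited + [nxt]))
        visited.append(nxt)
        url = nxt


canonicals = {
    "http://example.com": "http://www.example.com",
    "http://www.example.com": "http://www.example.com",
}
redirects = {"http://www.example.com": "http://example.com"}  # the faulty redirect
try:
    resolve_signals("http://example.com", redirects, canonicals)
except ValueError as err:
    print(err)
```

With the faulty redirect removed (an empty redirects dict), the same call settles on http://www.example.com.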
We have also seen a whole bunch of problems that result either in suboptimal indexation or simply in a failure to fix the very issues the canonical exists to address.
1. No canonical
Not a canonical issue as such, but everything from URL variables to domain variations, protocol variations and content management systems can create different URLs for a given piece of content. This can leave links scattered across different variations and cause indexation problems. Putting a canonical in place helps ensure all equity is correctly assigned to one single variation.
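Before you can declare a canonical, you need a rule for which URL a given variation collapses to. This illustrative sketch uses only Python's standard-library urllib.parse. It drops query parameters that are assumed not to affect the content. Both the parameter list and the function name strip_url_variables are hypothetical and would need to match your own site:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters assumed NOT to change the page content.
# Audit your own site before reusing it; some parameters (e.g. ?page=2) do.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def strip_url_variables(url: str) -> str:
    """Return `url` without ignorable query parameters or a #fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Fragments are never sent to the server, so they don't belong in a canonical.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_url_variables("https://www.example.com/shoes?utm_source=news&color=red#top"))
# https://www.example.com/shoes?color=red
```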
2. Not matching the sitemap URLs
You will also likely have submitted an XML sitemap via webmaster tools or referenced it in your robots.txt file. Your canonical URLs should match the URLs listed in that sitemap, or you are sending out some mixed signals.
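Comparing the two lists is a quick job once you have a crawl. A minimal sketch, assuming the sitemap follows the sitemaps.org format and that your crawl is a hypothetical dict of URL -> declared canonical, parses the sitemap with Python's xml.etree.ElementTree and reports mismatches in both directions. The function name compare_sitemap_to_canonicals is illustrative:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def compare_sitemap_to_canonicals(
    sitemap_xml: str, canonicals: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (canonical URLs missing from the sitemap,
    sitemap URLs that no crawled page names as its canonical)."""
    root = ET.fromstring(sitemap_xml)
    listed = {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}
    targets = set(canonicals.values())
    return sorted(targets - listed), sorted(listed - targets)


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""
crawl = {
    "https://www.example.com/": "https://www.example.com/",
    "https://www.example.com/about": "https://example.com/about",  # wrong host
}
missing, orphaned = compare_sitemap_to_canonicals(sitemap, crawl)
print(missing, orphaned)
```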
3. Not matching preferred domain in webmaster tools
You will have also likely set a preferred domain in webmaster tools, and your canonical URLs should match this domain. Otherwise you are sending out more of those mixed signals that can confuse search engines.
4. Not matching internal navigation
This is not the end of the world, and a well-implemented canonical will resolve it. But if you are going to have a canonical, be consistent with your internal navigation and everything else.
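Hard-coded links to the wrong protocol or host are the usual culprit. Here is a minimal sketch using Python's standard-library html.parser. It assumes https://www.example.com is the canonical origin and that SITE_HOSTS (a hypothetical name) lists every host your site answers on. It flags internal links that point at a non-canonical variation. The helper names LinkCollector and off_canonical_links are illustrative:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

CANONICAL_ORIGIN = "https://www.example.com"
SITE_HOSTS = {"example.com", "www.example.com"}  # hypothetical: all our host variations


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def off_canonical_links(html: str) -> list[str]:
    """Return internal links that hard-code a non-canonical protocol or host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    canonical = urlsplit(CANONICAL_ORIGIN)
    flagged = []
    for href in collector.hrefs:
        # Relative links are resolved against the canonical origin in this
        # sketch, so only hard-coded absolute links can be flagged.
        target = urlsplit(urljoin(CANONICAL_ORIGIN + "/", href))
        if target.hostname in SITE_HOSTS and (target.scheme, target.hostname) != (
            canonical.scheme,
            canonical.hostname,
        ):
            flagged.append(href)
    return flagged


nav = (
    '<nav><a href="/about">About</a><a href="http://example.com/contact">Contact</a>'
    '<a href="https://www.example.com/blog">Blog</a><a href="https://twitter.com/x">X</a></nav>'
)
print(off_canonical_links(nav))
```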
5. Multiple canonical URLs on each variation
This is a fairly common goof we see on custom-built sites, and it comes from a lack of understanding of exactly what a canonical URL should achieve. Consider a site that is available on four URL variations:
http://example.com
http://www.example.com
https://example.com
https://www.example.com
These are potentially four different sites in the eyes of Google, and as such we should have one consistent canonical URL across all four – let’s say https://www.example.com.
What we often see, though, is each site setting itself as the canonical. So we have four variations and four canonical URLs. Here, a tool that is meant to resolve multiple URLs ends up doing exactly the opposite, telling search engines that each variation should be considered on its own merits (despite being a URL-based duplicate). Throw in a bunch of URL variables as well and the situation just gets exponentially worse.
This tends to result in a mishmash of indexed pages across the variations and generally weaker performance in organic search than there should be.
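Spotting this pattern by hand across a large crawl is tedious. The minimal Python sketch below assumes a crawl export that maps every URL you fetched, across all protocol and host variations, to its declared canonical. It groups the URLs by path and reports any path whose variations disagree. Grouping on the path alone assumes that query parameters do not change the content. The function name conflicting_canonicals is hypothetical:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def conflicting_canonicals(declared: dict[str, str]) -> dict[str, list[str]]:
    """Report paths whose protocol/host/parameter variations disagree on a canonical.

    `declared` maps each crawled URL to the href of its rel="canonical" tag.
    """
    by_path: dict[str, set[str]] = defaultdict(set)
    for url, canonical in declared.items():
        # Assumption: same path == same content, whatever the host or query.
        by_path[urlsplit(url).path or "/"].add(canonical)
    return {path: sorted(targets) for path, targets in by_path.items() if len(targets) > 1}


variations = [
    "http://example.com/",
    "http://www.example.com/",
    "https://example.com/",
    "https://www.example.com/",
]
# Every variation declares itself as canonical: the mistake described above.
print(conflicting_canonicals({url: url for url in variations}))
```

When every variation instead points at https://www.example.com/, the function returns an empty dict.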
Knowing the common issues, what does a good canonical implementation look like? Well, this is pretty simple:
- All domain variations have a single canonical (or, better still, a 301 redirect)
- All page variations have a single canonical page / domain
- A single protocol (HTTP or HTTPS) is specified in the canonical
- The canonical URLs match the URLs in your sitemap
- The canonical URL matches the preferred domain in webmaster tools
- The canonical URLs match the internal navigation
As an example, if we had the following four URLs, they would all specify a single canonical:
http://example.com -> canonical = https://www.example.com
http://www.example.com -> canonical = https://www.example.com
https://example.com -> canonical = https://www.example.com
https://www.example.com -> canonical = https://www.example.com
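The mapping above is simple enough to express as a single function. This sketch makes a few assumptions. It takes https and www.example.com as the preferred protocol and host. SITE_HOSTS is a hypothetical set of hosts your site answers on. canonical_url is an illustrative name and is not part of any CMS or framework:

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred variation, matching the example mapping.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
SITE_HOSTS = {"example.com", "www.example.com"}  # hypothetical: every host we serve


def canonical_url(url: str) -> str:
    """Map any protocol/host variation of a URL onto the single canonical one."""
    parts = urlsplit(url)
    if parts.hostname not in SITE_HOSTS:
        raise ValueError(f"not one of our hosts: {url}")
    # Keep path and query; force protocol and host; drop any #fragment.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, parts.path, parts.query, ""))


for variation in (
    "http://example.com",
    "http://www.example.com",
    "https://example.com",
    "https://www.example.com",
):
    print(variation, "->", canonical_url(variation))
```

Each variation should get the same tag. Generating it from one function like this, rather than echoing back whatever protocol and host the request arrived on, is what keeps the canonical identical across variations.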
The best way to test this is to check the source code of your pages on all of your domain variations. Whichever variation you look at, make sure the single correct protocol and domain are specified.
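Viewing source works for a handful of pages, and the same check can be scripted for more. A minimal sketch using only Python's standard-library html.parser pulls out every rel="canonical" link in a page, so you can compare the variations side by side. The find_canonicals name is hypothetical, and fetching the HTML for each variation is left to whatever HTTP client you use. The sketch returns a list rather than a single value because a page that declares more than one canonical is itself sending conflicting signals:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel holds a space-separated list of values, so split before matching.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html: str) -> list[str]:
    """Return every canonical href found in `html` (ideally exactly one)."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


page = '<head><link rel="canonical" href="https://www.example.com/"></head>'
print(find_canonicals(page))
```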
You can then use Google's site: command to look for protocol, domain and page variations in the results:
site:example.com
You can even take that one step further and search the root domain while excluding the one subdomain that should be indexed. Assuming www.example.com is what we should have indexed, the following will return only the other variations:
site:example.com -site:www.example.com
You can also do content spot checks by Googling specific, unique blocks of content in double quotes. If more than one page is returned, check the canonicalization.
And of course, beyond that, a crawl in Screaming Frog will show what has been canonicalised and let you review the canonicals (and is likely where I would start).
One URL to rule them all
The canonical URL is a simple tool for dealing with the many ways a piece of content can end up on multiple URLs. Well implemented, it will save you many a headache. Poorly implemented, or mixed in with redirections, it can cause issues ranging from weak performance to a complete lack of visibility.
Drop a comment below if we can help, or hit us up on one of our social profiles, where one of the Bowler Hat team will always be happy to help if you are in canonical hell! Oh, and please share the article if you found it useful. 🙂