The concept of URL canonicalization isn’t as difficult to understand as it is to pronounce. Just think of it as unintentionally having duplicate pages in your website. This guide discusses how canonicalization happens, why you should bother fixing it, and what you should do about it. You can also read some advice by Matt Cutts on this topic.
What is Canonicalization?
Dictionary definitions of canonicalization may vary, but this broad statement may give some clarity:
Canonicalization is the process of converting data that may possibly have more than one representation into a standard/normal/canonical form.
In the SEO game, URL canonicalization refers to a way of identifying the true URL of a certain web page when several choices are available. For search engines, having several versions of a URL presents a huge problem. Which one should they show to users? To provide the best possible user experience, search engines avoid showing duplicate content, so they are forced to pick the URL that is most likely the original. Check out this link for some best practices.
Are You Having Canonicalization Issues?
In your browser’s address bar, enter your own site’s address in this format: www.mywebsite.com. Then hit enter to visit your homepage. Check the address bar. Do you see www.mywebsite.com?
Now, enter mywebsite.com minus the “www” part and hit enter. Again, check the address bar. Does it still display mywebsite.com?
If your answer is yes to both questions, then your website has domain canonicalization issues. If, on the other hand, your address bar always displays your URL with the “www” prefix even when you didn’t type it, then you’re fine. The same is true in reverse: a site that always drops the “www” is fine too.
Now go to one of your internal pages, then click on the link pointing to your homepage. Check your browser’s address bar. Do you notice a page file name affixed to the URL (e.g. www.mywebsite.com/index.html)? If yes, then you have home page canonicalization issues.
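If you would rather not do these checks by hand, they can be roughly automated. The sketch below is only an illustration: the function names are made up for this guide, mywebsite.com is the same placeholder used above, and it assumes your homepage lives at the root of the domain.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def final_url(url):
    """Follow any redirects and return the address the server ends up serving."""
    with urlopen(url, timeout=10) as response:
        return response.url


def has_domain_issue(final_for_www, final_for_bare):
    """True when the www and non-www addresses do not end up on the same host."""
    www_host = urlsplit(final_for_www).netloc.lower()
    bare_host = urlsplit(final_for_bare).netloc.lower()
    return www_host != bare_host


def has_homepage_issue(homepage_link):
    """True when a homepage link carries a file name such as index.html.

    Assumes the homepage sits at the root of the domain.
    """
    return urlsplit(homepage_link).path not in ("", "/")
```

To run the first check against a live site, you could call has_domain_issue(final_url("http://www.mywebsite.com"), final_url("http://mywebsite.com")) with your own domain in place of the placeholder.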
Why Bother?
All the URLs below are pointing to one and the same page:
- mywebsite.com
- www.mywebsite.com
- mywebsite.com/index.html
- www.mywebsite.com/index.html
While we see all the web addresses above as referring to the same page, search engines see them as separate addresses. Normally, search engines will choose the one that they think is your primary address, but it may not always be the best for your website.
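To make the idea concrete, here is a minimal Python sketch that maps all four addresses above to a single form. It assumes you prefer the www version and that index.html and similar names are your server's default documents; both are assumptions for illustration, not rules that search engines follow.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the www host is the preferred one, and
# these file names are served as the default document of a directory.
PREFERRED_HOST = "www.mywebsite.com"
DEFAULT_DOCUMENTS = ("index.html", "index.htm", "index.php")


def canonicalize(url):
    """Map the www/non-www and index-file variants of a URL to one form."""
    # Bare addresses such as "mywebsite.com" have no scheme; add one so
    # urlsplit puts the host in netloc instead of path.
    if "://" not in url:
        url = "http://" + url
    scheme, host, path, query, _fragment = urlsplit(url)
    host = host.lower()
    if host == PREFERRED_HOST[len("www."):]:
        host = PREFERRED_HOST
    # Drop a trailing default document: /index.html becomes /
    head, _, last = path.rpartition("/")
    if last in DEFAULT_DOCUMENTS:
        path = head + "/"
    if not path:
        path = "/"
    return urlunsplit((scheme, host, path, query, ""))


for address in ("mywebsite.com", "www.mywebsite.com",
                "mywebsite.com/index.html", "www.mywebsite.com/index.html"):
    print(canonicalize(address))  # http://www.mywebsite.com/ each time
```

A function like this is handy for auditing a list of internal links, but it does not change how visitors or crawlers reach your pages.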
1. Duplicate content
Search engines see www.mywebsite.com and mywebsite.com as different web pages. Since more than one URL serves the same content, they might treat one of those pages as a duplicate, or even as plagiarized content. The result can be lower rankings in search results and, in the worst case, removal from a search engine’s index.
2. Link popularity
Another problem that may arise from a URL having a www and a non-www version is split link popularity. Let’s say that your homepage receives 300 inbound links. If 100 point to mywebsite.com and 200 point to www.mywebsite.com, then your website won’t get the full benefit of all 300 incoming links. In fact, your Google PageRank may only reflect the 200 incoming links instead of the 300.
In addition, because your internal links have a bearing on your link popularity, links pointing to your homepage with a file name at the end (e.g. www.mywebsite.com/index.html) might also “steal” some of the link juice from the true URL.
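The arithmetic behind the 300-link example can be sketched in a few lines of Python. This is a deliberate simplification: it only counts links per host and says nothing about how a search engine actually weighs them.

```python
from collections import Counter

# Figures from the example above; the host names are placeholders.
inbound_links = Counter({"mywebsite.com": 100, "www.mywebsite.com": 200})


def consolidate(links, preferred_host):
    """Credit links for the bare host to the preferred www host."""
    bare_host = preferred_host[len("www."):]
    merged = Counter()
    for host, count in links.items():
        target = preferred_host if host == bare_host else host
        merged[target] += count
    return merged


# With two competing versions, the strongest one holds only part of the total.
strongest_split = max(inbound_links.values())
# With every link credited to one host, the full total is counted together.
consolidated = consolidate(inbound_links, "www.mywebsite.com")

print(strongest_split)                    # 200
print(consolidated["www.mywebsite.com"])  # 300
```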
Which URL Version Should You Choose?
Now comes the question of which version, www or non-www, you should use. In terms of ranking, search engines don’t have a specific preference. They will simply rank whichever of the two has the greater number of quality inbound links. That said, users are often more likely to link to a URL’s www version, since it’s the more familiar format.
You’re now probably wondering what to do if existing inbound links point to both the www and non-www versions of your homepage. A good solution is to set up a 301 redirect. Once this is done, if a user requests http://mywebsite.com, your server responds with a 301 redirect to http://www.mywebsite.com. This helps search engines realize that you prefer the www version to be canonical.
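How you set up the redirect depends on your server. As one possible illustration, here is a small Python WSGI wrapper that answers every request for the bare host with a 301 to the www host, keeping the path and query string intact. The host names are placeholders and redirect_to_www is a name invented for this sketch; many sites do the same thing in their web server configuration instead.

```python
from urllib.parse import quote

# Placeholder host names; replace them with your own domain.
PREFERRED_HOST = "www.mywebsite.com"
BARE_HOST = "mywebsite.com"


def redirect_to_www(app):
    """Wrap a WSGI app so requests for the bare host get a 301 to the www host."""
    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").lower()
        if host != BARE_HOST:
            # Already on the preferred host: serve the page as usual.
            return app(environ, start_response)
        scheme = environ.get("wsgi.url_scheme", "http")
        # Re-encode the path the same way wsgiref does when rebuilding URLs.
        path = quote(environ.get("PATH_INFO", "") or "/",
                     safe="/;=,", encoding="latin1")
        location = f"{scheme}://{PREFERRED_HOST}{path}"
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]
    return middleware
```

Because the wrapper looks only at the host, the redirect applies to every page on the site and not just the homepage.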
What matters is that the URL version you choose gets implemented site-wide. Many website owners enforce canonicalization with server-side scripting for their homepage only and think they’ve conquered the issue. Unfortunately, it doesn’t end there. If you really want your SEO campaign to be effective, you should make sure that your whole website follows a strict canonicalization policy. Otherwise, inconsistent URLs can undermine all of the time, effort, and resources you’ve spent on optimizing your website for better search engine rankings.