Tag Archive | "Canonical"

Google selects canonical URLs based on your site and user preference

If a different URL is chosen, it doesn’t negatively affect your site.

Please visit Search Engine Land for the full article.

Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM NewsComments Off

Google Search Console Performance Report Now Consolidated To Canonical

In February, Google announced that it would consolidate the data in the Performance report to the canonical URL, meaning that AMP, mobile, and other variants are all counted toward the main URL’s data in the report. Both views were live for a while, but the old view is now gone.

Search Engine Roundtable

Google to retire the info: command, adds canonical information to URL Inspection Tool

Google’s URL Inspection tool now shows the Google-discovered canonical URL.

Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

SEO Best Practices for Canonical URLs + the Rel=Canonical Tag – Whiteboard Friday

Posted by randfish

If you’ve ever had any questions about the canonical tag, well, have we got the Whiteboard Friday for you. In today’s episode, Rand defines what rel=canonical means and its intended purpose, when it’s recommended you use it, how to use it, and sticky situations to avoid.

SEO best practices for canonical URLs

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week, we’re going to chat about some SEO best practices for canonicalization and use of the rel=canonical tag.

Before we do that, I think it pays to talk about what a canonical URL is, because a canonical URL doesn’t just refer to a page on which we are using the rel=canonical tag. Canonicalization has been around, in fact, much longer than the rel=canonical tag itself, which came out in 2009, and there are a bunch of different things that a canonical URL means.

What is a “canonical” URL?

So first off, what we’re trying to say is this URL is the one that we want Google and the other search engines to index and to rank. These other URLs that potentially have similar content or that are serving a similar purpose or perhaps are exact duplicates, but, for some reason, we have additional URLs of them, those ones should all tell the search engines, “No, no, this guy over here is the one you want.”

So, for example, I’ve got a canonical URL, ABC.com/a.

Then I have a duplicate of that for some reason. Maybe it’s a historical artifact or a problem in my site architecture. Maybe I intentionally did it. Maybe I’m doing it for some sort of tracking or testing purposes. But that URL is at ABC.com/b.

Then I have this other version, ABC.com/a?ref=twitter. What’s going on there? Well, that’s a URL parameter. The URL parameter doesn’t change the content. The content is exactly the same as A, but I really don’t want Google to get confused and rank this version, which can happen by the way. You’ll see URLs that are not the original version, that have some weird URL parameter ranking in Google sometimes. Sometimes this version gets more links than this version because they’re shared on Twitter, and so that’s the one everybody picked up and copied and pasted and linked to. That’s all fine and well, so long as we canonicalize it.

Or this one, it’s a print version. It’s ABC.com/aprint.html. So, in all of these cases, what I want to do is I want to tell Google, “Don’t index this one. Index this one. Don’t index this one. Index this one. Don’t index this one. Index this one.”

I can do that using this, the link rel=canonical, with the href telling Google, “This is the page.” You put this in the <head> section of any document and Google will know, “Aha, this is a copy or a clone or a duplicate of this other one. I should canonicalize all of my ranking signals, and I should make sure that this other version ranks.”
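As a minimal sketch of the markup being described, the tag that each variant would carry could be generated like this. It assumes the whiteboard URLs are served over https, and the helper names `escapeAttr` and `canonicalTagFor` are hypothetical, not part of any library.

```javascript
// Hypothetical data: the canonical URL and its known duplicates
// (taken from the whiteboard example; the https scheme is an assumption).
const CANONICAL = "https://abc.com/a";
const DUPLICATES = new Set([
  "https://abc.com/b",
  "https://abc.com/a?ref=twitter",
  "https://abc.com/aprint.html",
]);

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttr(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Return the <link> tag a page should place in its <head>.
// Known duplicates point at the canonical URL; any other page points at itself.
function canonicalTagFor(pageUrl) {
  const target = DUPLICATES.has(pageUrl) ? CANONICAL : pageUrl;
  return `<link rel="canonical" href="${escapeAttr(target)}">`;
}

console.log(canonicalTagFor("https://abc.com/a?ref=twitter"));
```

Because a page that is not a known duplicate points at itself, the canonical page ends up with a self-referential tag as well.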

By the way, you can be self-referential. So it is perfectly fine for ABC.com/a to go ahead and use this as well, pointing to itself. That way, in the event that someone you’ve never even met decides to plug in question mark, some weird parameter and point that to you, you’re still telling Google, “Hey, guess what? This is the original version.”

Great. So since I don’t want Google to be confused, I can use this canonicalization process to do it. The rel=canonical tag is a great way to go. By the way, FYI, it can be used cross-domain. So, for example, if I republish the content on A at something like a Medium.com/@RandFish, which is, I think, my Medium account, /a, guess what? I can put in a cross-domain rel=canonical telling them, “This one over here.” Now, even if Google crawls this other website, they are going to know that this is the original version. Pretty darn cool.

Different ways to canonicalize multiple URLs

There are different ways to canonicalize multiple URLs.

1. Rel=canonical.

I mention that rel=canonical isn’t the only one. It’s one of the most strongly recommended, and that’s why I’m putting it at number one. But there are other ways to do it, and sometimes we want to apply some of these other ones. There are also not-recommended ways to do it, and I’m going to discuss those as well.

2. 301 redirect.

The 301 redirect, this is basically a status code telling Google, “Hey, you know what? I’m going to take /b, I’m going to point it to /a. It was a mistake to ever have /b. I don’t want anyone visiting it. I don’t want it clogging up my web analytics with visit data. You know what? Let’s just 301 redirect that old URL over to this new one, over to the right one.”
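A rough sketch of that decision on the server side, assuming a hypothetical redirect table and helper name (`PERMANENT_REDIRECTS`, `redirectFor`) and leaving the actual web server out of the picture:

```javascript
// Hypothetical redirect table: retired path -> canonical path.
const PERMANENT_REDIRECTS = new Map([["/b", "/a"]]);

// Return the response to send for a path, or null when no redirect applies.
function redirectFor(path) {
  const target = PERMANENT_REDIRECTS.get(path);
  if (target === undefined) return null;
  // 301 signals a permanent move; Location tells the client where to go.
  return { status: 301, headers: { Location: target } };
}

console.log(redirectFor("/b"));
```

In a Node.js `http` request handler, a non-null result could be sent with `res.writeHead(r.status, r.headers)` followed by `res.end()`.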

3. Passive parameters in Google Search Console.

Some parts of me like this, some parts of me don’t. I think for very complex websites with tons of URL parameters and a ton of URLs, it can be just an incredible pain sometimes to go to your web dev team and say like, “Hey, we got to clean up all these URL parameters. I need you to add the rel=canonical tag to all these different kinds of pages, and here’s what they should point to. Here’s the logic to do it.” They’re like, “Yeah, guess what? SEO is not a priority for us for the next six months, so you’re going to have to deal with it.”

Probably lots of SEOs out there have heard that from their web dev teams. Well, guess what? You can do an end run around it, and this is a fine way to do that in the short term. Log in to the Google Search Console account that’s connected to your website. Make sure you’re verified. Then you can basically tell Google, through the URL Parameters section, to make certain kinds of parameters passive.

So, for example, you have sessionid=blah, blah, blah. You can set that to be passive. You can set it to be passive on certain kinds of URLs. You can set it to be passive on all types of URLs. That helps tell Google, “Hey, guess what? Whenever you see this URL parameter, just treat it like it doesn’t exist at all.” That can be a helpful way to canonicalize.
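To illustrate the effect of treating a parameter as passive, here is a small sketch that drops such parameters from a URL. It only models the idea and has nothing to do with how Search Console is implemented; the parameter list and the function name `stripPassiveParams` are assumptions.

```javascript
// Hypothetical list of parameters that do not change page content.
const PASSIVE_PARAMS = ["sessionid", "ref"];

// Return the URL with every passive parameter removed.
// Parameters not in the list (e.g. pagination) are left untouched.
function stripPassiveParams(rawUrl, passive = PASSIVE_PARAMS) {
  const url = new URL(rawUrl);
  for (const name of passive) {
    url.searchParams.delete(name);
  }
  return url.toString();
}

console.log(stripPassiveParams("https://abc.com/a?sessionid=123&page=2"));
```

Two URLs that differ only in passive parameters come out identical, which is the sense in which the parameter is treated “like it doesn’t exist at all.”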

4. Use location hashes.

So let’s say that my goal with /b was basically to have exactly the same content as /a but with one slight difference, which was I was going to take a block of content about a subsection of the topic and place that at the top. So A has the section about whiteboard pens at the top, but B puts the section about whiteboard pens toward the bottom, and they put the section about whiteboards themselves up at the top. Well, it’s the same content, same search intent behind it. I’m doing the same thing.

Well, guess what? You can use the hash in the URL. So it’s a#b and that will jump someone — it’s also called a fragment URL — jump someone to that specific section on the page. You can see this, for example, Moz.com/about/jobs. I think if you plug in #listings, it will take you right to the job listings. Instead of reading about what it’s like to work here, you can just get directly to the list of jobs themselves. Now, Google considers that all one URL. So they’re not going to rank them differently. They don’t get indexed differently. They’re essentially canonicalized to the same URL.
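A short sketch of why fragment URLs collapse to one address, using the standard `URL` class. The helper names `withoutFragment` and `sameIndexedUrl` are hypothetical, and the comparison simply assumes, as described above, that the fragment is ignored for indexing.

```javascript
// Return the URL with its fragment (the part after #) removed.
function withoutFragment(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = "";
  return url.toString();
}

// Two URLs count as the same page here if they differ only by fragment.
function sameIndexedUrl(a, b) {
  return withoutFragment(a) === withoutFragment(b);
}

console.log(withoutFragment("https://moz.com/about/jobs#listings"));
console.log(sameIndexedUrl("https://abc.com/a", "https://abc.com/a#b"));
```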


I do not recommend…

5. Blocking Google from crawling one URL but not the other version.

Because guess what? Even if you use robots.txt and you block Googlebot’s spider and you send them away and they can’t reach it because you said robots.txt disallow /b, Google will not know that /b and /a have the same content on them. How could they?

They can’t crawl it. So they can’t see anything that’s here. It’s invisible to them. Therefore, they’ll have no idea that any ranking signals, any links that happen to point there, any engagement signals, any content signals, whatever ranking signals that might have helped A rank better, they can’t see them. If you canonicalize in one of these ways, now you’re telling Google, yes, B is the same as A, combine their forces, give me all the rankings ability.

6. I would also not recommend blocking indexation.

So you might say, “Ah, well Rand, I’ll use the meta robots no index tag, so that way Google can crawl it, they can see that the content is the same, but I won’t allow them to index it.” Guess what? Same problem. They can see that the content is the same, but unless Google is smart enough to automatically canonicalize, which I would not trust them on, I would always trust yourself first, you are essentially, again, preventing them from combining the ranking signals of B into A, and that’s something you really want.

7. I would not recommend using the 302, the 307, or any other 30x other than the 301.

This is the guy that you want. It is a permanent redirect, and it is the most likely to succeed at canonicalization. Even though Google has said, “We often treat 301s and 302s similarly,” a 301 is probably the better choice for canonicalization. Guess what we’re trying to do? Canonicalize!

8. Don’t 40x the non-canonical version.

So don’t take /b and be like, “Oh, okay, that’s not the version we want anymore. We’ll 404 it.” Don’t 404 it when you could 301. If you send it over here with a 301 or you use the rel=canonical in your header, you take all the signals and you point them to A. You lose them if you 404 that in B. Now, all the signals from B are gone. That’s a sad and terrible thing. You don’t want to do that either.

The only time I might do this is if the page is very new or it was just an error. You don’t think it has any ranking signals, and you’ve got a bunch of other problems. You don’t want to deal with having to maintain the URL and the redirect long term. Fine. But if this was a real URL and real people visited it and real people linked to it, guess what? You need to redirect it because you want to save those signals.

When to canonicalize URLs

Last but not least, when should we canonicalize URLs versus not?

I. If the content is extremely similar or exactly duplicate.

Well, if it is the case that the content is either extremely similar or exactly duplicate on two different URLs, two or more URLs, you should always collapse and canonicalize those to a single one.

II. If the content is serving the same (or nearly the same) searcher intent (even if the KW targets vary somewhat).

If the content is not duplicate, maybe you have two pages that are completely unique about whiteboard pens and whiteboards, but even though the content is unique, meaning the phrasing and the sentence structures are different, that does not mean that you shouldn’t canonicalize.

For example, this Whiteboard Friday about using the rel=canonical, about canonicalization is going to replace an old version from 2009. We are going to take that old version and we are going to use the rel=canonical. Why are we going to use the rel=canonical? So that you can still access the old one if for some reason you want to see the version that we originally came out with in 2009. But we definitely don’t want people visiting that one, and we want to tell Google, “Hey, the most up-to-date one, the new one, the best one is this new version that you’re watching right now.” I know this is slightly meta, but that is a perfectly reasonable use.

What I’m trying to aim at is searcher intent. So if the content is serving the same or nearly the same searcher intent, even if the keyword targeting is slightly different, you want to canonicalize those multiple versions. Google is going to do a much better job of ranking a single piece of content that has lots of good ranking signals for many, many keywords that are related to it, rather than splitting up your link equity and your other ranking signal equity across many, many pages that all target slightly different variations. Plus, it’s a pain in the butt to come up with all that different content. You would be best served by the very best content in one place.

III. If you’re republishing or refreshing or updating old content.

Like the Whiteboard Friday example I just used, you should use the rel=canonical in most cases. There are some exceptions. If you want to maintain that old version, but you’d like the old version’s ranking signals to come to the new version, you can take the content from the old version and republish that at /a-old. Then take /a and publish the new version there, so that the new version is the canonical one and the old version exists at the /a-old URL you’ve just created. So for republishing, refreshing, or updating old content, canonicalization is generally the way to go, and you can preserve the old version if you want.

IV. If content, a product, an event, etc. is no longer available and there’s a near best match on another URL.

If you have content that is expiring, a piece of content, a product, an event, something like that that’s going away, it’s no longer available and there’s a next best version, the version that you think is most likely to solve the searcher’s problems and that they’re probably looking for anyway, you can canonicalize in that case, usually with a 301 rather than with a rel=canonical, because you don’t want someone visiting the old page where nothing is available. You want both searchers and engines to get redirected to the new version, so good idea to essentially 301 at that point.

Okay, folks. Look forward to your questions about rel=canonicals, canonical URLs, and canonicalization in general in SEO. And we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Moz Blog

Here’s How to Generate and Insert Rel Canonical with Google Tag Manager

Posted by luciamarin

In this article, we’re going to learn how to create the rel canonical URL tag using Google Tag Manager, and how to insert it in every page of our website so that the correct canonical is automatically generated in each URL.

We’ll do it using Google Tag Manager and its variables.

Why send a canonical from each page to itself?

Javier Lorente gave us a very good explanation/reminder at the 2015 SEO Salad event in Zaragoza (Spain). In short, there may be various factors that cause Google to index unexpected variants of a URL, and this is often beyond our control:

  • External pages that display our website but use another URL (e.g., Google’s own cache, other search engines and content aggregators, archive.org, etc.). This way, Google will know which one is the original page at all times.
  • Parameters that are irrelevant to SEO/content such as certain filters and order sequences

By including this “standard” canonical in every URL, we are making it easy for Google to identify the original content.

How do we generate the dynamic value of the canonical URL?

To generate the canonical URL dynamically, we need to force it to always correspond to the “clean” (i.e., absolute, unique, and simplified) URL of each page (taking into account the www, URL query string parameters, anchors, etc.).

Remember that, in summary, the URL variables that can be created in GTM (Google Tag Manager) correspond to the following components:

URL variables in Google Tag Manager

We want to create a unique URL for each page, without queries or anchors. We need a “clean” URL variable, and we can’t use the {{Page URL}} built-in variable, for two reasons:

  1. Although the fragment doesn’t form part of the URL by default, the query string parameters do
  2. Potential problems with protocol and hostname, if different options are admitted (e.g., SSL and www)

Therefore, we need to combine Protocol + Host + Path into a single variable.

Now, let’s take a step-by-step look at how to create our {{Page URL Canonical}} variable.

1. Create {{Page Protocol}} to capture the protocol part of the URL, i.e., whether the page is served over http:// or https://

page protocol

Note: We’re assuming that the entire website will always function under a single protocol. If that’s not the case, then we should replace the {{Page Protocol}} variable with plain text in the final variable of Step #4. (This will allow us to force it to always be http/https, without exception.)

2. Create {{Page Hostname Canonical}}

We need a variable in which the hostname is always unique, whether or not it’s entered into the browser with the www. The hostname canonical must always be the same, regardless of whether or not it has the www. We can decide based on which one of the domains is redirected to the other, and then keep the original as the canonical.

How do we create the canonical domain?

  • Option 2.1: Redirect the domain with www. to a domain without www. via 301
    Our canonical URL is WITHOUT www. We need to create Page Hostname, but make sure we always remove the www:
    Page hostname canonical without www
  • Option 2.2: Redirect the domain without www. to a domain with www. via 301
    Our canonical URL is WITH www. We need to create Page Hostname without www (like before), and then insert the www in front using a constant variable:

    Page hostname canonical with www

3. Enable the {{Page Path}} built-in variable

Enabled Built-in variables

Note: Although we have the {{Page Hostname}} built-in variable, for this exercise it’s preferable not to use it, as we’re not 100% sure how it will behave in relation to the www (e.g., in this instance, it’s not configurable, unlike when we create it as a GTM custom variable).

4. Create {{Page URL Canonical}}

Link the three previous variables to form a constant variable:

{{Page Protocol}}://{{Page Hostname Canonical}}{{Page Path}}

Summary/Important notes:

  1. Protocol: returns http / https (without ://), which is why we enter this part by hand
  2. Hostname: we can force removal of the www. or not
  3. Path: included from the slash /. Does not include the query, so it’s perfect. We use the built-in option for Page Path.

Page URL canonical
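Outside of Tag Manager, the same Protocol + Host + Path combination can be sketched in plain JavaScript. This is only an illustration of the logic, not GTM’s implementation; the function name `pageUrlCanonical` and its options `forceProtocol` and `useWww` are hypothetical.

```javascript
// Build a "clean" canonical URL from any variant of a page URL.
// options.forceProtocol: "http" or "https" to override the page's protocol (see the note in step 1).
// options.useWww: true for Option 2.2 (canonical WITH www), false for Option 2.1.
function pageUrlCanonical(rawUrl, options = {}) {
  const { forceProtocol = null, useWww = false } = options;
  const url = new URL(rawUrl);

  // url.protocol includes a trailing colon ("https:"), so strip it
  // and add "://" by hand, as in the final variable of step 4.
  const protocol = forceProtocol || url.protocol.replace(/:$/, "");

  // Always remove a leading "www." first, then add it back if wanted.
  const bareHost = url.hostname.replace(/^www\./, "");
  const hostname = useWww ? `www.${bareHost}` : bareHost;

  // pathname starts at the slash and excludes the query and fragment.
  return `${protocol}://${hostname}${url.pathname}`;
}

console.log(pageUrlCanonical("https://www.example.com/shoes/?sort=price#reviews"));
```

Removing the www first and re-adding it on request mirrors the two hostname options above, so the result is the same whichever variant the visitor typed.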

Now that we have created {{Page URL Canonical}}, we could even populate it into Google Analytics via custom dimensions. You can learn to do that in this Google Analytics custom dimensions guide.

How can we insert the canonical into a page using Tag Manager?

Let’s suppose we’ve already got a canonical URL generated dynamically via GTM: {{Page URL Canonical}}.

Now, we need to look at how to insert it into the page using a GTM tag. We should emphasize that this is NOT the “ideal” solution, as it’s always preferable to insert the tag into the <head> of the source code. But, we have confirming evidence from various sources that it DOES work if it’s inserted via GTM. And, as we all know, in most companies, the ideal doesn’t always coincide with the possible!

If we could insert content directly into the <head> via GTM, it would be sufficient to use the following custom HTML tag:

<link rel="canonical" href="{{Page URL Canonical}}" />

But, we know that this won’t work because the content inserted by HTML tags usually goes at the end of the <body>, meaning Google won’t accept or read a <link rel="canonical"> tag there.

So then, how do we do it? We can use JavaScript code to generate the tag and insert it into the <head>, as described in this article, but in a form that has been adapted for the canonical tag:

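 var c = document.createElement('link');
 c.rel = 'canonical'; // mark the element as the canonical link
 c.href = {{Page URL Canonical}}; // GTM replaces this with the variable's value
 document.head.appendChild(c); // append to the <head>, where Google reads it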

And then, we can set it to fire on the “All Pages” trigger. Seems almost too easy, doesn’t it?

REL Canonical

How do we check whether our rel canonical is working?

Very simple: Check whether the code is generated correctly on the page.

How do we do that?

By looking at the Elements panel of Chrome DevTools, or by using a browser plugin like Firebug that shows the code generated on the page in the DOM (document object model). We won’t find it in the page source (Ctrl+U), because the tag is added by JavaScript after the page loads.

Here’s how to do this step-by-step:

  1. Open Chrome
  2. Press F12
  3. Click on the first tab in DevTools (Elements)
    elements tab
  4. Press Ctrl+F and search for “canonical”
  5. If the URL appears in the correct form at the end of the <head>, that means the tag has been generated correctly via Tag Manager
    tag generated correctly
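As an alternative to searching the Elements tab by hand, the same check can be sketched as a small function. The name `findCanonicalHref` is hypothetical; it accepts any object with a `querySelector` method, so it works with the browser’s `document`.

```javascript
// Return the href of the canonical link inside <head>, or null if none exists.
function findCanonicalHref(doc) {
  const link = doc.querySelector('head link[rel="canonical"]');
  return link ? link.href : null;
}

// Stand-in for a real document, used here only so the sketch runs outside a browser.
const fakeDocument = {
  querySelector: (selector) =>
    selector === 'head link[rel="canonical"]'
      ? { href: "https://example.com/a" }
      : null,
};

console.log(findCanonicalHref(fakeDocument));
```

In the browser, pasting the function into the DevTools console and calling `findCanonicalHref(document)` should return the generated URL once the tag has fired.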

That’s it. Easy-peasy, right?

So, what are your thoughts?

Do you also use Google Tag Manager to improve your SEO? Why don’t you give us some examples of when it’s been useful (or not)?

Moz Blog

6 Extreme Canonical Tricks

Posted by Dr. Pete

After I wrote about my intentional experiment in catastrophic canonicalization last year, I started getting a lot of questions about other uses (and abuses) of the canonical tag. In many cases, I couldn’t find much data out in the blogosphere, so I decided to put a few of these questions to the test in a series of mini-experiments. Most of these applications are a bit extreme, and you’d probably never try them on a real site, but I think they all help to test the boundaries of the canonical tag and how Google processes it.

(1) Cross-domain Syndication

Rand recently wrote up his experience with a cross-domain use of the canonical tag, and I had an opportunity to try it on 2 of my own sites. The purpose was legitimate – I wrote a post about celebrating 5 years in business, and it made sense to cross-post on both my company (User Effect) and personal (30GO30) blogs. Since my personal blog is relatively new, and I felt the post was more personal than corporate, I wanted it to get credit for being the source of the article.

Of course, my company blog is quite a bit older and stronger on just about every dimension you can think of. I’ve listed a few metrics below (from the start of the test), for reference:

Domain Stats

So, the obvious question was: could the cross-domain canonical tag override all of the other signals suggesting that my company blog was actually more authoritative?

The short answer is: "Yes". I published the post nearly simultaneously on both blogs on May 10th. The next day, Google started indexing the title of the post from the home-pages (the 2 home pages both appeared in SERPs). On May 12th, the full post was indexed and ranking only on 30GO30 (for the post title). Google seemed to have no issue with the cross-domain canonical from a stronger domain to a weaker domain.

(2) Canonical in <BODY> Tag

One common fear about the cross-domain use of the canonical tag is how it might be hijacked. Obviously, someone can hack your server, but what if you allow user-generated content and someone simply drops a canonical tag in the middle of the page?

To test this, I dropped a canonical tag right before the closing </BODY> tag. I referenced a page on the same domain, assuming Google would be more likely to process the internal canonical than a cross-domain (if this worked, I could move on to Phase 2). The misplaced tag seemed to have no effect – I made the change on May 9th and the page was re-cached on May 14th and May 18th with no impact on the SERPs.

After I launched this experiment, Matt Cutts posted about canonical corner cases and addressed this specific issue:

First off, here’s a thought exercise: should Google trust rel=canonical if we see it in the body of the HTML? The answer is no, because some websites let people edit content or HTML on pages of the site. If Google trusted rel=canonical in the HTML body, we’d see far more attacks where people would drop a rel=canonical on part of a web page to try to hijack it.

Since I was already mid-experiment, I thought I’d let it ride, but it was nice to see the confirmation.

(3) Canonical in False <HEAD>

Just so I don’t get accused of mindlessly sheeple-ing whatever Matt says (which I can almost count on as soon as I link to his blog), I tested a variation of (2). This time, I put the bad canonical tag in a second <HEAD> tag, at the very top of the <BODY>. In other words, my page looked something like this…

   <HEAD><TITLE>Experiment 3 Page</TITLE></HEAD>
   <BODY>
      <HEAD><LINK REL="canonical" HREF="http://www.example.com/bad-page"></HEAD>
   </BODY>

The change was made on May 18th and the page re-cached on May 20th, May 22nd, May 26th, and June 4th. It had no measurable impact, consistent with Matt’s statements.

(4) Canonical to Fake URL

In parallel with (2), I tested an idea that came out of Q&A. What would happen if you pointed a canonical tag to a URL that doesn’t exist? Obviously, you wouldn’t generally do that on purpose, but if, for example, you made a major error in your CMS, how damaging would it be?

I introduced the canonical tag (on a different page than (2), of course) on May 9th. It re-cached on May 15th, May 17th, May 21st, and June 1st. It had no apparent impact.

It turns out Matt addressed this one in his post, too (thanks for ruining my research, Matt):

For example, if we think you’re shooting yourself in the foot by accident (pointing a rel=canonical toward a non-existent/404 page), we’d reserve the right not to use the destination url you specify with rel=canonical.

The lesson here is pretty simple – the canonical version of the page has to actually exist. While that may seem obvious, I’ve had people ask about using the canonical tag as a sort of URL rewrite. On the surface, the idea has a certain logic, but in practice, it goes completely against the purpose of the canonical tag.

(5) Crossing the Streams

I asked the SEOmoz staff if they had any extreme canonical experiments to try, and Cyrus suggested pointing the canonical tags of 2 pages at each other. I should’ve listened to Egon when he said "don’t cross the streams", but I’m not a very good listener.

So, on May 18th, I pointed the User Effect "About" page at the "Services" page, and vice versa. Clearly, no one would ever make that mistake, but this was an exercise in exploring how Google would interpret the suggestion – a peek into the black box.

Re-caching took longer than expected, and at first the results looked pretty dull. On May 28th, the "Services" page apparently re-cached, and by June 3rd that page was showing for searches on "About User Effect". It seemed that either the stronger page had won, or the "Services" page had simply been re-crawled first.

Then, something very strange happened. The "About" page reappeared in searches on June 7th, but a query on "User Effect Services" resulted in this:

Strange SERPS

Both pages were now appearing in search results, but the "About" page had its title rewritten to match the service-related query. This was not the title of the actual "Services" page, but a complete invention by Google. Clearly, the mixed signals of the 2 canonical tags created a problem.

I think there’s an important lesson here – if you send Google mixed signals, there are consequences. I see a subtler example of this all the time – people rel-canonical (or even 301-redirect) to one version of a URL, but then use another version in internal links, inbound links, social media, etc. If you say one URL is canonical but act in every way like another URL really is, Google may choose your actions over your words. Don’t mix signals.
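One way to catch this kind of mixed signal before Google does is to follow each page’s canonical target and flag loops. The sketch below assumes a hypothetical map from each URL to the URL its canonical tag points at; `resolveCanonical` is an illustrative name, not an existing tool.

```javascript
// Follow canonical tags from a starting URL until reaching a page that is
// self-referential or has no entry. Report a loop if a URL repeats.
function resolveCanonical(start, canonicalOf) {
  const seen = new Set([start]);
  let current = start;
  while (canonicalOf.has(current) && canonicalOf.get(current) !== current) {
    current = canonicalOf.get(current);
    if (seen.has(current)) {
      return { url: null, loop: true };
    }
    seen.add(current);
  }
  return { url: current, loop: false };
}

// The "crossed streams" setup: two pages pointing at each other.
const crossedStreams = new Map([
  ["/about", "/services"],
  ["/services", "/about"],
]);

console.log(resolveCanonical("/about", crossedStreams));
```

A result with `loop: true` means no page in the chain actually claims to be the canonical version, which is exactly the ambiguity the experiment created.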

(6) Facebook/Twitter Buttons

This last one’s not really an experiment, but something interesting I noticed about social media plug-ins while I was stirring up trouble. To make a long story short, my personal blog focuses on 30-day challenges. I’ll often have a main post about a challenge and then a number of update posts to tell how it’s coming along. Those updates aren’t usually core content that I want to rank, so I decided recently to canonicalize the weekly updates in the challenges to the main challenge post.

A few days later, I was revisiting one of my weekly update posts and was surprised to see this at the bottom:

Like/Tweet Buttons

I had barely mentioned this particular post on social media, and something was clearly out of whack. I realized quickly that these were the numbers from the canonical version of the page. The Facebook and Twitter scripts were actually honoring the canonical tag.

In the couple of weeks since, Facebook seems to have stopped reporting numbers from the canonical version, but the Tweet counts still match the post that the canonical tag points to. I’m not entirely sure what to make of this, but it’s food for thought – canonicalization may be impacting more than just your SEO.

The Usual Disclaimers

I’m not a real doctor – I just play one on TV. Don’t try any of this at home (or at work). Matt Cutts is not the source of all wisdom in the universe, nor is he the antithesis of all wisdom.

Obviously, a couple of these experiments were sillier than others, but I think they all give us some insights into just how seriously Google takes the canonical tag, and how seriously we should probably take it as SEOs. That means using canonicalization to actually point to the canonical versions as we honestly intend them. By playing around the edges of the black box, I’m not trying to crack the Google code, just better understand how we can use these tools effectively and responsibly.

SEOmoz Daily SEO Blog
