Tag Archive | "URLS"

How to Automate Pagespeed Insights For Multiple URLs using Google Sheets

Posted by James_McNulty

Calculating individual page speed performance metrics can help you to understand how efficiently your site is running as a whole. Since Google uses the speed of a site (frequently measured by and referred to as PageSpeed) as one of the signals used by its algorithm to rank pages, it’s important to have that insight down to the page level.

One of the pain points in website performance optimization, however, is the lack of ability to easily run page speed performance evaluations en masse. There are plenty of great tools like PageSpeed Insights or the Lighthouse Chrome plugin that can help you understand more about the performance of an individual page, but these tools are not readily configured to help you gather insights for multiple URLs — and running individual reports for hundreds or even thousands of pages isn’t exactly feasible or efficient.

In September 2018, I set out to find a way to gather sitewide performance metrics and ended up with a working solution. While this method resolved my initial problem, the setup process is rather complex and requires that you have access to a server.

Ultimately, it just wasn’t an efficient method. Furthermore, it was nearly impossible to easily share with others (especially those outside of UpBuild).

In November 2018, two months after I published this method, Google released version 5 of the PageSpeed Insights API. V5 now uses Lighthouse as its analysis engine and also incorporates field data provided by the Chrome User Experience Report (CrUX). In short, this version of the API now easily provides all of the data that is provided in the Chrome Lighthouse audits.

So I went back to the drawing board, and I’m happy to announce that there is now an easier, automated method to produce Lighthouse reports en masse using Google Sheets and the PageSpeed Insights API v5.

Introducing the Automated PageSpeed Insights Report:

With this tool, we are able to quickly uncover key performance metrics for multiple URLs with just a couple of clicks.

All you’ll need is a copy of this Google Sheet, a free Google API key, and a list of URLs you want data for — but first, let’s take a quick tour.

How to use this tool

The Google Sheet consists of the three following tabs:

  • Settings
  • Results
  • Log

Settings

On this tab, you will be required to provide a unique Google API key in order to make the sheet work.

Getting a Google API Key

  1. Visit the Google API Credentials page.
  2. Choose the API key option from the ‘Create credentials’ dropdown.
  3. You should now see a prompt providing you with a unique API key.
  4. Copy and paste that API key into the field provided on the “Settings” tab of the Automated PageSpeed Insights spreadsheet.

Now that you have an API key, you are ready to use the tool.
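For the curious, here’s a minimal sketch of how that key gets used (this is illustrative Apps Script, not the sheet’s actual code): the v5 API is a single GET endpoint that takes the URL you want tested and your API key as query parameters, and returns the full Lighthouse result as JSON.

```javascript
// Minimal sketch of a PageSpeed Insights API v5 call from Apps Script.
// The function name and shape are my own; the sheet's script may differ.
function fetchPagespeed(url, apiKey) {
  var endpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'
    + '?url=' + encodeURIComponent(url)
    + '&key=' + encodeURIComponent(apiKey);
  var response = UrlFetchApp.fetch(endpoint);
  // The JSON includes lighthouseResult (lab data) and loadingExperience (CrUX field data).
  return JSON.parse(response.getContentText());
}
```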

Setting the report schedule

On the Settings tab, you can schedule the day and time that the report should start running each week. As you can see from the screenshot below, we have set this report to begin every Wednesday at 8:00 am. This will be the local time as defined by your Google account.

As you can see, this setting also assigns the report to run for the following three hours on the same day. This is a workaround for the limitations imposed by both Google Apps Script and the Google PageSpeed API.
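Under the hood, a schedule like this maps naturally onto Apps Script’s time-based triggers. Here’s a sketch of how the Wednesday 8:00 am schedule, plus the three follow-up hours, could be expressed (illustrative only; the sheet’s own implementation may differ):

```javascript
// Illustrative sketch: one weekly trigger at the chosen hour, plus three
// follow-up runs in the consecutive hours so interrupted runs can resume.
function setReportSchedule() {
  [8, 9, 10, 11].forEach(function (hour) {
    ScriptApp.newTrigger('runTool') // 'runTool' is the report function named later in this post
      .timeBased()
      .onWeekDay(ScriptApp.WeekDay.WEDNESDAY)
      .atHour(hour)
      .create();
  });
}
```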

Limitations

Our Google Sheet uses a Google Apps Script to run all the magic behind the scenes. Each time the report runs, Google Apps Script imposes a six-minute execution time limit (thirty minutes for G Suite Business / Enterprise / Education and Early Access users).

In six minutes you should be able to extract PageSpeed Insights for around 30 URLs.

Then you’ll be met with an “Exceeded maximum execution time” message.

In order to continue running the function for the rest of the URLs, we simply need to schedule the report to run again. That is why this setting runs the report three more times in the consecutive hours that follow, picking up exactly where it left off.
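The resumption pattern itself is simple: the function watches its own elapsed time, bails out safely before the six-minute ceiling, and skips rows that were finished on a previous run. A sketch of that logic (column positions and the status value are my assumptions, not the script’s actual layout):

```javascript
var MAX_RUNTIME_MS = 5 * 60 * 1000; // bail out comfortably before the six-minute ceiling

function runTool() {
  var start = Date.now();
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Results');
  for (var row = 6; row <= sheet.getLastRow(); row++) { // URLs start at B6
    if (Date.now() - start > MAX_RUNTIME_MS) return;    // the next scheduled run resumes here
    if (sheet.getRange(row, 9).getValue() === 'complete') continue; // status column assumed
    var url = sheet.getRange(row, 2).getValue();
    // ...call the PageSpeed API for this URL and write the metrics to the adjacent columns...
    sheet.getRange(row, 9).setValue('complete');
  }
}
```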

The next hurdle is the limitation set by Google Sheets itself.

If you’re doing the math, you’ll see that since we can only automate the report a total of four times, we’ll theoretically only be able to pull PageSpeed Insights data for around 120 URLs. That’s not ideal if you’re working with a site that has more than a few hundred pages!

The schedule function in the Settings tab uses Google Apps Script’s built-in time-based triggers. These tell our script to run the report automatically at a particular day and time. Unfortunately, using this feature more than four times causes the “Service using too much computer time for one day” message.

This means that our Google Apps Script has exceeded the total allowable execution time for one day. It most commonly occurs for scripts that run on a trigger, which have a lower daily limit than scripts executed manually.

Manually?

You betcha! If you have more than 120 URLs that you want to pull data for, then you can simply use the Manual Push Report button. It does exactly what you think.

Manual Push Report

Once clicked, the ‘Manual Push Report’ button (linked from the PageSpeed Menu on the Google Sheet) will run the report. It will pick up right where it left off with data populating in the fields adjacent to your URLs in the Results tab.

For clarity, you don’t even need to schedule the report to run to use this document. Once you have your API key, all you need to do is add your URLs to the Results tab (starting in cell B6) and click ‘Manual Push Report’.

You will, of course, be met with the inevitable “Exceeded maximum execution time” message after six minutes, but you can simply dismiss it and click “Manual Push Report” again and again until you’re finished. It’s not fully automated, but it should allow you to gather the data you need relatively quickly.

Setting the log schedule

Another feature in the Settings tab is the Log Results function.

This will automatically take the data that has populated in the Results tab and move it to the Log sheet. Once it has copied over the results, it will automatically clear the populated data from the Results tab so that when the next scheduled report run time arrives, it can gather new data accordingly. Ideally, you would want to set the Log day and time after the scheduled report has run to ensure that it has time to capture and log all of the data.

You can also manually push data to the Log sheet using the ‘Manual Push Log’ button in the menu.
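In Apps Script terms, a log routine like this boils down to: read the Results rows, keep the ones marked “complete”, append them below the last filled row of Log, then clear Results. A rough sketch (the sheet layout, column count, and status position are my assumptions):

```javascript
// Illustrative sketch of a log routine: move completed rows to the Log sheet,
// then clear the Results tab ready for the next scheduled run.
function runLog() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var results = ss.getSheetByName('Results');
  var log = ss.getSheetByName('Log');
  var numRows = results.getLastRow() - 5; // data starts at row 6 (assumed)
  if (numRows < 1) return;
  var range = results.getRange(6, 2, numRows, 8); // B6 onward, 8 columns wide (assumed)
  var complete = range.getValues().filter(function (row) {
    return row[row.length - 1] === 'complete'; // status in the last column (assumed)
  });
  if (complete.length === 0) return;
  log.getRange(log.getLastRow() + 1, 1, complete.length, complete[0].length)
     .setValues(complete);
  range.clearContent();
}
```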

How to confirm and adjust the report and log schedules

Once you’re happy with the scheduling for the report and the log, be sure to set it using the ‘Set Report and Log Schedule’ from the PageSpeed Menu (as shown):

Should you want to change the frequency, I’d recommend first setting the report and log schedule using the sheet.

Then adjust the runLog and runTool functions using Google Apps Script triggers.

  • runLog controls when the data will be sent to the LOG sheet.
  • runTool controls when the API runs for each URL.

Simply click the pencil icon next to each respective function and adjust the timings as you see fit.

You can also use the ‘Reset Schedule’ button in the PageSpeed Menu (next to Help) to clear all scheduled triggers. This can be a helpful shortcut if you’re simply using the interface on the ‘Settings’ tab.
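Behind that button, clearing every scheduled trigger is only a few lines of Apps Script; a sketch (the real function name may differ):

```javascript
// Illustrative sketch: remove every trigger attached to this script project.
function resetSchedule() {
  ScriptApp.getProjectTriggers().forEach(function (trigger) {
    ScriptApp.deleteTrigger(trigger);
  });
}
```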

PageSpeed results tab

This tab is where the PageSpeed Insights data will be generated for each URL you provide. All you need to do is add a list of URLs starting from cell B6. You can either wait for your scheduled report time to arrive or use the ‘Manual Push Report’ button.

You should now see the following data generating for each respective URL:

  • Time to Interactive
  • First Contentful Paint
  • First Meaningful Paint
  • Time to First Byte
  • Speed Index

You will also see a column for Last Time Report Ran and Status on this tab. This will tell you when the data was gathered, and if the pull request was successful. A successful API request will show a status of “complete” in the Status column.
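Each of those five metrics lives in the lighthouseResult.audits object of the v5 response. A sketch of pulling them out (these audit IDs are the ones the API exposed at the time of writing; later Lighthouse versions rename time-to-first-byte to server-response-time):

```javascript
// Illustrative sketch: map a v5 JSON response to the five Results-tab columns.
function extractMetrics(psiResponse) {
  var audits = psiResponse.lighthouseResult.audits;
  return {
    timeToInteractive:    audits['interactive'].displayValue,
    firstContentfulPaint: audits['first-contentful-paint'].displayValue,
    firstMeaningfulPaint: audits['first-meaningful-paint'].displayValue,
    timeToFirstByte:      audits['time-to-first-byte'].displayValue,
    speedIndex:           audits['speed-index'].displayValue,
  };
}
```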

Log tab

Logging the data is a useful way to keep a historical account of these important speed metrics. There is nothing to modify in this tab; however, you will want to ensure that there are plenty of empty rows. When the runLog function runs (controlled by the Log schedule you assign in the “Settings” tab, or via the Manual Push Log button in the menu), it will move all of the rows from the Results tab that contain a status of “complete”. If there are no empty rows available on the Log tab, it simply won’t copy over any of the data. All you need to do is add several thousand rows, depending on how often you plan to check in and maintain the Log.

How to use the log data

The scheduling feature in this tool has been designed to run on a weekly basis to allow you enough time to review the results, optimize, then monitor your efforts. If you love spreadsheets then you can stop right here, but if you’re more of a visual person, then read on.

Visualizing the results in Google Data Studio

You can also use this Log sheet as a Data Source in Google Data Studio to visualize your results. As long as the Log sheet stays connected as a source, the results should automatically publish each week. This will allow you to work on performance optimization and evaluate results using Data Studio easily, as well as communicate performance issues and progress to clients who might not love spreadsheets as much as you do.

Blend your log data with other data sources

One great Google Data Studio feature is the ability to blend data. This allows you to compare and analyze data from multiple sources, as long as they have a common key. For example, if you wanted to blend the Time to Interactive results against Google Search Console data for those same URLs, you can easily do so. You will notice that the column in the Log tab containing the URLs is titled “Landing Page”. This is the same naming convention that Search Console uses and will allow Data Studio to connect the two sources.

There are several ways that you can use this data in Google Data Studio.

Compare your competitors’ performance

You don’t need to limit yourself to just your own URLs in this tool; you can use any set of URLs. This is a great way to compare your competitors’ pages and see whether there are any clear indicators of speed affecting positions in search results.

Improve usability

Don’t immediately assume that your content is the problem. Your visitors may not be leaving the page because they don’t find the content useful; it could be slow load times or other incompatibility issues that are driving visitors away. Compare bounce rates, time on site, and device type data alongside performance metrics to see if it could be a factor.

Increase organic visibility

Compare your performance data against Search ranking positions for your target keywords. Use a tool to gather your page positions, and fix performance issues for landing pages on page two or three of Google Search results to see if you can increase their prominence.

Final thoughts

This tool is all yours.

Make a copy and use it as is, or tear apart the Google Apps Script that makes this thing work and adapt it into something bigger and better (if you do, please let me know; I want to hear all about it).

Remember, PageSpeed Insights API v5 now includes all of the same data that is provided in the Chrome Lighthouse audits, which means there are many more details you can extract beyond the five metrics this tool generates.

Hopefully this tool helps you gather performance data a little more efficiently between now and when Google releases its recently announced Speed report for Search Console.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in IM News | Comments Off

Google selects canonical URLs based on your site and user preference

If a different URL is chosen, it doesn’t negatively affect your site.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM News | Comments Off

Search Buzz Video Recap: Google Core Update News, Icons In Search Bar, Bing Bulk URLs, Google Conferences & My New Vlog

This week I am testing out a new camera, not sure if I will use it for this format but I will for the new vlog. Google finished rolling out the June 2019 core update on June 8th…


Search Engine Roundtable

Posted in IM News | Comments Off

Still have active destination URLs in Bing Ads? It’s time to migrate to final URLs

Say goodbye to standard text ads by the end of the year.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM News | Comments Off

Bing Ads recommends updating final ad URLs to HTTPS if supported

With the Chrome update this month, now is a good time to change your final ad URLs if your site supports the HTTPS protocol.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM News | Comments Off

Google Doesn’t Decode MD5 Hash URLs For Keywords

I love seeing Google SEO questions I’ve never seen before, and this one was new to me: does Google decode MD5 hash values in URLs to extract hidden keywords within them for image search or web search? The answer is no, but I am sure that is obvious to most of you…


Search Engine Roundtable

Posted in IM News | Comments Off

SEO Best Practices for Canonical URLs + the Rel=Canonical Tag – Whiteboard Friday

Posted by randfish

If you’ve ever had any questions about the canonical tag, well, have we got the Whiteboard Friday for you. In today’s episode, Rand defines what rel=canonical means and its intended purpose, when it’s recommended you use it, how to use it, and sticky situations to avoid.

SEO best practices for canonical URLs

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week, we’re going to chat about some SEO best practices for canonicalization and use of the rel=canonical tag.

Before we do that, I think it pays to talk about what a canonical URL is, because a canonical URL doesn’t just refer to a page on which we’re targeting or using the rel=canonical tag. Canonicalization has been around, in fact, much longer than the rel=canonical tag itself, which came out in 2009, and there are a bunch of different things that a canonical URL can mean.

What is a “canonical” URL?

So first off, what we’re trying to say is this URL is the one that we want Google and the other search engines to index and to rank. These other URLs that potentially have similar content or that are serving a similar purpose or perhaps are exact duplicates, but, for some reason, we have additional URLs of them, those ones should all tell the search engines, “No, no, this guy over here is the one you want.”

So, for example, I’ve got a canonical URL, ABC.com/a.

Then I have a duplicate of that for some reason. Maybe it’s a historical artifact or a problem in my site architecture. Maybe I intentionally did it. Maybe I’m doing it for some sort of tracking or testing purposes. But that URL is at ABC.com/b.

Then I have this other version, ABC.com/a?ref=twitter. What’s going on there? Well, that’s a URL parameter. The URL parameter doesn’t change the content. The content is exactly the same as A, but I really don’t want Google to get confused and rank this version, which can happen by the way. You’ll see URLs that are not the original version, that have some weird URL parameter ranking in Google sometimes. Sometimes this version gets more links than this version because they’re shared on Twitter, and so that’s the one everybody picked up and copied and pasted and linked to. That’s all fine and well, so long as we canonicalize it.

Or this one, it’s a print version. It’s ABC.com/aprint.html. So, in all of these cases, what I want to do is I want to tell Google, “Don’t index this one. Index this one. Don’t index this one. Index this one. Don’t index this one. Index this one.”

I can do that using the rel=canonical link element, e.g. `<link rel="canonical" href="https://abc.com/a" />`, with the href telling Google, “This is the page.” You put this in the <head> of any document and Google will know, “Aha, this is a copy or a clone or a duplicate of this other one. I should canonicalize all of my ranking signals, and I should make sure that this other version ranks.”

By the way, you can be self-referential. So it is perfectly fine for ABC.com/a to go ahead and use this as well, pointing to itself. That way, in the event that someone you’ve never even met decides to plug in question mark, some weird parameter and point that to you, you’re still telling Google, “Hey, guess what? This is the original version.”

Great. So since I don’t want Google to be confused, I can use this canonicalization process to do it. The rel=canonical tag is a great way to go. By the way, FYI, it can be used cross-domain. So, for example, if I republish the content on A at something like a Medium.com/@RandFish, which is, I think, my Medium account, /a, guess what? I can put in a cross-domain rel=canonical telling them, “This one over here.” Now, even if Google crawls this other website, they are going to know that this is the original version. Pretty darn cool.

Different ways to canonicalize multiple URLs

There are different ways to canonicalize multiple URLs.

1. Rel=canonical.

I mention that rel=canonical isn’t the only one. It’s one of the most strongly recommended, and that’s why I’m putting it at number one. But there are other ways to do it, and sometimes we want to apply some of these other ones. There are also not-recommended ways to do it, and I’m going to discuss those as well.

2. 301 redirect.

The 301 redirect, this is basically a status code telling Google, “Hey, you know what? I’m going to take /b, I’m going to point it to /a. It was a mistake to ever have /b. I don’t want anyone visiting it. I don’t want it clogging up my web analytics with visit data. You know what? Let’s just 301 redirect that old URL over to this new one, over to the right one.”
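How you implement that redirect depends on your stack. As one hedged illustration (my own example, not from the video), here’s what it looks like in a Node/Express app:

```javascript
// Illustrative example: permanently redirect the duplicate URL to the canonical one.
const express = require('express');
const app = express();

app.get('/b', (req, res) => res.redirect(301, '/a')); // 301 = moved permanently

app.listen(3000);
```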

3. Passive parameters in Google Search Console.

Some parts of me like this, some parts of me don’t. I think for very complex websites with tons of URL parameters and a ton of URLs, it can be just an incredible pain sometimes to go to your web dev team and say like, “Hey, we got to clean up all these URL parameters. I need you to add the rel=canonical tag to all these different kinds of pages, and here’s what they should point to. Here’s the logic to do it.” They’re like, “Yeah, guess what? SEO is not a priority for us for the next six months, so you’re going to have to deal with it.”

Probably lots of SEOs out there have heard that from their web dev teams. Well, guess what? You can do an end-around, and this is a fine way to do that in the short term. Log in to the Google Search Console account that’s connected to your website. Make sure you’re verified. Then you can basically tell Google, through the Search Parameters section, to make certain kinds of parameters passive.

So, for example, you have sessionid=blah, blah, blah. You can set that to be passive. You can set it to be passive on certain kinds of URLs. You can set it to be passive on all types of URLs. That helps tell Google, “Hey, guess what? Whenever you see this URL parameter, just treat it like it doesn’t exist at all.” That can be a helpful way to canonicalize.

4. Use location hashes.

So let’s say that my goal with /b was basically to have exactly the same content as /a but with one slight difference, which was I was going to take a block of content about a subsection of the topic and place that at the top. So A has the section about whiteboard pens at the top, but B puts the section about whiteboard pens toward the bottom, and they put the section about whiteboards themselves up at the top. Well, it’s the same content, same search intent behind it. I’m doing the same thing.

Well, guess what? You can use the hash in the URL. So it’s /a#b, and that will jump someone (this is also called a fragment URL) to that specific section on the page. You can see this at, for example, Moz.com/about/jobs. I think if you plug in #listings, it will take you right to the job listings. Instead of reading about what it’s like to work here, you can just get directly to the list of jobs themselves. Now, Google considers that all one URL. So they’re not going to rank them differently. They don’t get indexed differently. They’re essentially canonicalized to the same URL.

NOT RECOMMENDED

I do not recommend…

5. Blocking Google from crawling one URL but not the other version.

Because guess what? Even if you use robots.txt and you block Googlebot’s spider and you send them away and they can’t reach it because you said robots.txt disallow /b, Google will not know that /b and /a have the same content on them. How could they?

They can’t crawl it. So they can’t see anything that’s here. It’s invisible to them. Therefore, they’ll have no idea that any ranking signals, any links that happen to point there, any engagement signals, any content signals, whatever ranking signals that might have helped A rank better, they can’t see them. If you canonicalize in one of these ways, now you’re telling Google, yes, B is the same as A, combine their forces, give me all the rankings ability.

6. I would also not recommend blocking indexation.

So you might say, “Ah, well Rand, I’ll use the meta robots noindex tag, so that way Google can crawl it, they can see that the content is the same, but I won’t allow them to index it.” Guess what? Same problem. They can see that the content is the same, but unless Google is smart enough to automatically canonicalize (which I would not trust them on; always trust yourself first), you are essentially, again, preventing them from combining the ranking signals of B into A, and that’s something you really want.

7. I would not recommend using the 302, the 307, or any other 30x other than the 301.

This is the guy that you want. It is a permanent redirect. It is the most likely to be successful in canonicalization. Google has said, “We often treat 301s and 302s similarly,” but the exception to that rule is that a 301 is probably better for canonicalization. Guess what we’re trying to do? Canonicalize!

8. Don’t 40x the non-canonical version.

So don’t take /b and be like, “Oh, okay, that’s not the version we want anymore. We’ll 404 it.” Don’t 404 it when you could 301. If you send it over here with a 301, or you use the rel=canonical in your header, you take all the signals and you point them to A. You lose them if you 404 /b. Now, all the signals from B are gone. That’s a sad and terrible thing. You don’t want to do that either.

The only time I might do this is if the page is very new or it was just an error. You don’t think it has any ranking signals, and you’ve got a bunch of other problems. You don’t want to deal with having to maintain the URL and the redirect long term. Fine. But if this was a real URL and real people visited it and real people linked to it, guess what? You need to redirect it because you want to save those signals.

When to canonicalize URLs

Last but not least, when should we canonicalize URLs versus not?

I. If the content is extremely similar or exactly duplicate.

Well, if it is the case that the content is either extremely similar or exactly duplicate on two different URLs, two or more URLs, you should always collapse and canonicalize those to a single one.

II. If the content is serving the same (or nearly the same) searcher intent (even if the KW targets vary somewhat).

Maybe the content is not duplicate: you have two pages about whiteboard pens and whiteboards that are completely unique, meaning the phrasing and the sentence structures are entirely different. Even so, that does not mean that you shouldn’t canonicalize.

For example, this Whiteboard Friday about using the rel=canonical, about canonicalization is going to replace an old version from 2009. We are going to take that old version and we are going to use the rel=canonical. Why are we going to use the rel=canonical? So that you can still access the old one if for some reason you want to see the version that we originally came out with in 2009. But we definitely don’t want people visiting that one, and we want to tell Google, “Hey, the most up-to-date one, the new one, the best one is this new version that you’re watching right now.” I know this is slightly meta, but that is a perfectly reasonable use.

What I’m trying to aim at is searcher intent. So if the content is serving the same or nearly the same searcher intent, even if the keyword targeting is slightly different, you want to canonicalize those multiple versions. Google is going to do a much better job of ranking a single piece of content that has lots of good ranking signals for many, many keywords that are related to it, rather than splitting up your link equity and your other ranking signal equity across many, many pages that all target slightly different variations. Plus, it’s a pain in the butt to come up with all that different content. You would be best served by the very best content in one place.

III. If you’re republishing or refreshing or updating old content.

Like the Whiteboard Friday example I just used, you should use the rel=canonical in most cases. There are some exceptions. If you want to maintain the old version, but you’d like the old version’s ranking signals to come to the new version, you can take the content from the old version and republish it at /a-old. Then publish the new version at /a and have that version be the canonical one, with the old version living at the /a-old URL you’ve just created. So for republishing, refreshing, or updating old content, canonicalization is generally the way to go, and you can preserve the old version if you want.

IV. If content, a product, an event, etc. is no longer available and there’s a near best match on another URL.

If you have content that is expiring, a piece of content, a product, an event, something like that that’s going away, it’s no longer available and there’s a next best version, the version that you think is most likely to solve the searcher’s problems and that they’re probably looking for anyway, you can canonicalize in that case, usually with a 301 rather than with a rel=canonical, because you don’t want someone visiting the old page where nothing is available. You want both searchers and engines to get redirected to the new version, so good idea to essentially 301 at that point.

Okay, folks. Look forward to your questions about rel=canonicals, canonical URLs, and canonicalization in general in SEO. And we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com



Moz Blog

Posted in IM News | Comments Off

Structuring URLs for Easy Data Gathering and Maximum Efficiency

Posted by Dom-Woodman

Imagine you work for an e-commerce company.

Wouldn’t it be useful to know the total organic sessions and conversions to all of your products? Every week?

If you have access to some analytics for an e-commerce company, try and generate that report now. Give it 5 minutes.

Done?

Or did that quick question turn out to be deceptively complicated? Did you fall into a rabbit hole of scraping and estimations?

Not being able to easily answer that question — and others like it — is costing you thousands every year.

Let’s jump back a step

Every online business, whether it’s a property portal or an e-commerce store, will likely have spent hours and hours agonizing over decisions about how their website should look, feel, and be constructed.

The biggest decision is usually this: What will we build our website with? And from there, there are hundreds of decisions, all the way down to what categories should we have on our blog?

Each of these decisions will generate future costs and opportunities, shaping how the business operates.

Somewhere in this process, a URL structure will be decided on. Hopefully it will be logical, but the context in which it’s created is different from how it ends up being used.

As a business grows, the desire for more information and better analytics grows. We hire data analysts and pay agencies thousands of dollars to go out, gather this data, and wrangle it into a useful format so that smart business decisions can be made.

It’s too late. You’ve already wasted £1000s a year.

It’s already too late; by this point, you’ve already created hours and hours of extra work for the people who have to analyze your data and thousands will be wasted.

All because no one structured the URLs with data gathering in mind.

How about an example?

Let’s go back to the problem we talked about at the start, but go through the whole story. An e-commerce company goes to an agency and asks them to get total organic sessions to all of their product pages. They want to measure performance over time.

Now this company was very diligent when they made their site. They’d read Moz and hired an SEO agency when they designed their website and so they’d read this piece of advice: products need to sit at the root. (E.g. mysite.com/white-t-shirt.)

Apparently a lot of websites read this piece of advice, because with minimal searching you can find plenty of sites whose ranking product pages do sit at the root: Appleyard Flowers, Game, Tesco Direct.

At one level it makes sense: a product might be in multiple categories (LCD & 42” TVs, for example), so you want to avoid duplicate content. Plus, if you changed the categories, you wouldn’t want to have to redirect all the products.

But from a data gathering point of view, this is awful. Why? There is now no way in Google Analytics to select all the products unless we had the foresight to set up something earlier, like a custom dimension or content grouping. There is nothing that separates the product URLs from any other URL we might have at the root.

How could our hypothetical data analyst get the data at this point?

They might have to crawl all the pages on the site so they can pick them out with an HTML footprint (a particular piece of HTML on a page that identifies the template), or get an internal list from whoever owns the data in the organization. Once they’ve got all the product URLs, they’ll then have to match this data to the Google Analytics in Excel, probably with a VLOOKUP or, if the data set is too large, a database.

Shoot. This is starting to sound quite expensive.

And of course, if you want to do this analysis regularly, that list will constantly change. The range of products being sold will change. So it will need to be a scheduled scrape or automated report. If we go the scraping route, we could do this, but crawling regularly isn’t possible with Screaming Frog. Now we’re either spending regular time on Screaming Frog or paying for a cloud crawler that you can schedule. If we go the other route, we could have a dev build us an internal automated report we can go to once we can get the resource internally.

Wow, now this is really expensive: a couple days’ worth of dev time, or a recurring job for your SEO consultant or data analyst each week.

This could’ve been a couple of clicks on a default report.

If we have the foresight to put all the products in a folder called /products/, this entire lengthy process becomes one step:

Load the landing pages report in Google Analytics and filter for URLs beginning with /products/.
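The same one-step selection works anywhere else you handle URLs, too. As a quick illustration (my own sketch, not from the original article), here’s the equivalent filter over a list of landing pages in JavaScript:

```javascript
// Illustrative sketch: one "starts with /products/" check selects the whole template.
const landingPages = [
  'https://example.com/products/white-t-shirt',
  'https://example.com/blog/sports/this-is-an-article-name/',
  'https://example.com/white-t-shirt', // a root-level product: invisible to this filter
];

const productPages = landingPages.filter(
  (page) => new URL(page).pathname.startsWith('/products/')
);

console.log(productPages); // ['https://example.com/products/white-t-shirt']
```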

Congratulations — you’ve just cut a couple days off your agency fee, saved valuable dev time, or gained the ability to fire your second data analyst because your first is now so damn efficient (sorry, second analysts).

As a data analyst or SEO consultant, you continually bump into these kinds of issues, which suck up time and turn quick tasks into endless chores.

What is unique about a URL?

For most analytics services, the URL is the main piece of information you can use to identify a page. Google Analytics, Google Search Console, and log files often have access to nothing but the URL, and in some cases that’s all you’ll ever get; you can never change this.

The vast majority of site analyses require working with templates and generalizing across groups of similar pages, and you need to be able to identify those templates by URL.

It’s crucial.

There’s a Jeff Bezos saying that’s appropriate here:

“There are two types of decisions. Type 1 decisions are not reversible, and you have to be very careful making them. Type 2 decisions are like walking through a door — if you don’t like the decision, you can always go back.”

Setting URLs is very much a Type 1 decision. As anyone in SEO knows, you really don’t want to be constantly changing URLs; it causes a lot of problems, so when they’re being set up we need to take our time.

How should you set up your URLs?

How do you pick good URL patterns?

First, let’s define a good pattern. A good pattern is something which we can use to easily select a template of URLs, ideally using contains rather than any complicated regex.

This usually means we’re talking about adding folders because they’re easiest to find with just a contains filter, i.e. /products/, /blogs/, etc.

We also want to keep things human-readable when possible, so we need to bear that in mind when choosing our folders.

So where should we add folders to our URLs?

I always ask the following two questions:

  • Will I need to group the pages in this template together?
    • If a set of pages needs grouping I need to put them in the same folder, so we can identify this by URL.
  • Are there crucial sub-groupings for this set of pages? If there are, are they mutually exclusive and how often might they change?
    • If there are common groupings I may want to make, then I should consider putting this in the URL, unless those data groupings are liable to change.

Let’s look at a couple examples.

Firstly, back to our product example: let’s suppose we’re setting up product URLs for a fashion e-commerce store.

Will I need to group the products together? Yes, almost certainly. There clearly needs to be a way of grouping them in the URL. We should put them in a /products/ folder.

Within this template, how might I need to group these URLs together? The most plausible grouping for products is the product category. Let’s take a black midi dress.

What about putting “little black dress” or “midi” as a category? Well, are they mutually exclusive? Our dress could fit in the “little black dress” category and the “midi dress” category, so that’s probably not something we should add as a folder in the URL.

What about moving up a level and using “dress” as a category? Now that is far more suitable, if we could reasonably split all our products into:

  • Dresses
  • Tops
  • Skirts
  • Trousers
  • Jeans

And if we were happy with having jeans and trousers separate, then this might indeed be an excellent fit that would allow us to easily measure the performance of each top-level category. These categories also seem relatively unlikely to change and, as long as we’re happy having this type of hierarchy at the top (as opposed to, say, “season”), it makes a lot of sense.

What are some common URL patterns people should use?

Product pages

We’ve banged on about this enough and gone through the example above. Stick your products in a /products/ folder.

Articles

Applying the same rules we talked about to articles, two things jump out. The first is top-level categorization.

For example, adding in the following folders would allow you to easily measure the top-level performance of articles:

  • Travel
  • Sports
  • News

You should, of course, be keeping them all in a /blog/ or /guides/ etc. folder too, because you won’t want to group just by category.

Here’s an example of all 3:

  • A bad blog article URL: example.com/this-is-an-article-name/
  • A better blog article URL: example.com/blog/this-is-an-article-name/
  • An even better blog article URL: example.com/blog/sports/this-is-an-article-name/

The second, which obeys all our rules, is author groupings, which may be well-suited for editorial sites with a large number of authors that they want performance stats on.

Location grouping

Many types of websites often have category pages per location. For example:

  • Cars for sale in Manchester – /for-sale/vehicles/manchester
  • Cars for sale in Birmingham – /for-sale/vehicles/birmingham

However, there are many different levels of location granularity. For example, here are four different URLs, each a more specific location than the one above it (sorry to all our non-UK readers; just run with me here).

  • Cars for sale in Suffolk – /for-sale/vehicles/suffolk
  • Cars for sale in Ipswich – /for-sale/vehicles/ipswich
  • Cars for sale in Ipswich center – /for-sale/vehicles/ipswich-center
  • Cars for sale on Lancaster road – /for-sale/vehicles/lancaster-road

Obviously every site will have different levels of location granularity, but a grouping that’s often missing is the level of location granularity in the URL itself. For example:

  • Cars for sale in Suffolk – /for-sale/vehicles/county/suffolk
  • Cars for sale in Ipswich – /for-sale/vehicles/town/ipswich
  • Cars for sale in Ipswich center – /for-sale/vehicles/area/ipswich-center
  • Cars for sale on Lancaster road – /for-sale/vehicles/street/lancaster-road

This could even just be numbers (although this is less ideal because it breaks our second rule):

  • Cars for sale in Suffolk – /for-sale/vehicles/04/suffolk
  • Cars for sale in Ipswich – /for-sale/vehicles/03/ipswich
  • Cars for sale in Ipswich center – /for-sale/vehicles/02/ipswich-center
  • Cars for sale on Lancaster road – /for-sale/vehicles/01/lancaster-road

This makes it very easy to assess and measure the performance of each layer so you can understand if it’s necessary, or if perhaps you’ve aggregated too much.
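Once the granularity level is in the path, recovering it is a one-line split. A quick sketch, using the hypothetical URL scheme from the examples above:

```javascript
// Illustrative sketch: recover the granularity level and location from the path.
function locationLevel(path) {
  var parts = path.split('/').filter(Boolean); // ['for-sale', 'vehicles', 'town', 'ipswich']
  return { level: parts[2], location: parts[3] };
}

console.log(locationLevel('/for-sale/vehicles/town/ipswich'));
// { level: 'town', location: 'ipswich' }
```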

What other good (or bad) examples of this has the community come across? Let’s hear it!



Moz Blog

Posted in IM News | Comments Off

SearchCap: Google AMP URLs, Google Maps traffic & Super Bowl search ads

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Google AMP URLs, Google Maps traffic & Super Bowl search ads appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM News | Comments Off

SearchCap: Google & Hillary Clinton, SEO URLs & AdWords explained

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Google & Hillary Clinton, SEO URLs & AdWords explained appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM News | Comments Off
