Tag Archive | "Technical"

How to Strategically Think About Technical SEO – Whiteboard Friday

Posted by BenjaminEstes

We’ve all agreed that technical SEO is integral, and many of us know at least a little bit about the subject if we’re not already practitioners. But have you considered that the way you think about technical SEO could be hindering or helping your success? Today, Ben Estes from Distilled shares the agency’s tried-and-true framework for tackling technical SEO quandaries strategically.

Video Transcription

Hi. Welcome to another Whiteboard Friday. My name is Ben, and I’m a principal consultant at a company called Distilled. Today I’d like to talk to you about how we think about technical SEO at Distilled. Now, technical SEO is something that a lot of people know a lot of stuff about.

You accumulate knowledge over time from a lot of different sources, and that’s where a lot of the value that we deliver comes from. But not everyone can think about technical SEO from a strategic perspective, and that’s the skill that I think we should talk more about. 

Framing the problem

Let’s start by framing the problem. So look at these charts. Now, I would argue that most people’s mental model of technical SEO matches this first chart.

So in this chart, the solid black line is the actual traffic that you’re getting, whereas the dotted line is the hypothetical traffic you could be getting if all of the technical problems on your site were resolved. So some people see this and say, “Well, you know, if I can just keep fixing technical things, I can keep getting more traffic to my site.”

That’s one way of looking at it, but I would argue that it’s not the best way of looking at it, because really there are only so many technical things that can go wrong with your site. There’s a finite number of problems. It’s not an opportunity so much as an issue that needs to be resolved. So what I try and encourage my clients and colleagues to do is think about it in this way.

So it’s the same chart and the same situation. Here’s the actual traffic that you’re getting and the hypothetical traffic you could be getting. But really what’s happening is your technical problems are keeping you from realizing the most potential traffic that you could be capturing. In other words, there are technical issues preventing us from capturing all the traffic that we could. Now, once you’ve framed the problem in this way, how do you solve it?

So some people just say, “Well, I’ve got this big problem. I need to understand all the things that could be wrong with this site. I’m just going to dive in. I’m going to go through page by page, and I’ll finish when either I run out of pages or, more realistically, I run out of time or I run out of the client’s budget.” So what if there’s a better way to actually solve that problem and know that it’s been solved?

Well, that’s what this framework that I’m going to present to you is about. The way that we would recommend doing that is by taking the big problem, the overall problem of technical SEO and breaking it down into subproblems and breaking those down again until you have problems that are so small that they are trivially solvable. Now, I’m going to explain to you exactly how we accomplish that, and it’s going to be a little bit abstract.

The approach

So if you want something concrete to follow along with, I’d recommend checking out the blog post at this URL. That’s dis.tl/tech-audit. Okay. So when you have a big problem that you’re trying to break down, many people’s first attempt winds up looking something like this Venn diagram. So we take one problem, break it down into three subproblems, but there’s some sort of overlap between those problems.

Once there’s overlap, you lose a lot of confidence. Are you duplicating effort across these different areas? Or did you miss something because these two things are kind of the same? Everything just gets a little hazy very quickly. So to get past that, what I’ve used at Distilled is this consulting concept called MECE.

Mutually exclusive and collectively exhaustive

MECE stands for mutually exclusive and collectively exhaustive. That’s a lot of fancy words, so I’ll show you pictorially what I mean. So instead of having a Venn diagram like this, what if each of the problems was completely independent? Now they still cover the same area. There’s just no overlap between them, and that’s what MECE means.

Because there is no overlap between them, they are mutually exclusive. Because they cover all of the original problem, they’re collectively exhaustive. So what does this mean in technical SEO specifically? Now remember the problem that we’re dealing with is that there are technical issues preventing us from capturing traffic that we would otherwise be able to. So what are the three ways that that could happen? 

  1. Maybe our content isn’t being indexed. There’s a technical reason our content isn’t being indexed. 
  2. Our content doesn’t rank as well as it could, and therefore we’re losing this traffic. 
  3. There is a technical reason our content isn’t being presented as well as it could be in the SERPs.

That third category covers things like rich snippets and review stars, which can increase click-through rate. These categories seem kind of trivial, but actually all of the technical problems that you can find on your site contribute to one or more of these three categories. So again, that was pretty abstract. So let’s talk about an example of how that actually plays out. This is actually the first technical check in the audit from that blog post.

An example

So, for instance, we’re starting by considering there is a technical reason our content isn’t being indexed. Well, what are all the ways that that could happen? One of the ways is that URLs are not discoverable by crawlers, and, again, that is a whole thing in itself that can be broken down further.

So maybe it’s that our XML sitemaps aren’t uploaded to Google Search Console. Of course, this isn’t a guarantee that we have a problem. But if there’s a problem down here, there’s a pretty good chance that that trickles back up to a problem up here that we’re really concerned about. The beauty of this isn’t just that it winds up helping us create a checklist so that we know all of the technical issues we ought to be looking at.



But it also helps us convey exactly what the meaning is of our findings and why people should care about them. So this is the template that I encourage my colleagues to use at Distilled. “We are seeing ________. This is a problem because ________. You should care about that because ________.” The way this works is Mad Lib style, except we work inside out.

So we start with this point here. We are seeing that our XML sitemaps aren’t uploaded to Google Search Console. This is a problem because maybe URLs are not discoverable by crawlers. We should care about that because there is a technical reason our content isn’t being indexed, and that right there is exactly the message that you deliver to your client.
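The inside-out template can be sketched in a few lines of Python. This is only an illustration, not Distilled's actual tooling: the `Issue` class and `finding_message` function are hypothetical names, and the three statements are taken from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Issue:
    """One node in the problem breakdown (hypothetical structure)."""
    statement: str
    parent: Optional["Issue"] = None


def finding_message(check: Issue) -> str:
    """Fill the template by walking from a failed check up two levels."""
    if check.parent is None or check.parent.parent is None:
        raise ValueError("a check needs a cause and a top-level category above it")
    cause = check.parent
    category = cause.parent
    return (
        f"We are seeing that {check.statement}. "
        f"This is a problem because {cause.statement}. "
        f"We should care about that because {category.statement}."
    )


# The three levels from the example, linked child to parent.
category = Issue("there is a technical reason our content isn't being indexed")
cause = Issue("URLs are not discoverable by crawlers", parent=category)
check = Issue("our XML sitemaps aren't uploaded to Google Search Console", parent=cause)

message = finding_message(check)
print(message)
```

Because every check has exactly one parent, each finding maps to one subproblem and one top-level category, which is the mutually exclusive half of the framework.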

So again, this is exactly the framework that we use for our technical audits at Distilled. It’s given us a lot more confidence. It’s given us a lot more insight into how long this process should take for our analysts and consultants, and it’s also got us better outcomes, particularly because it’s helped us communicate better about what we found. Thank you very much. I would love it if more people used this, and feel free to reach out to me personally if you have any thoughts or questions.

Thank you.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in IM News

Evergreen Googlebot with Chromium rendering engine: What technical SEOs need to know

Googlebot now supports many more features and will make it easier for developers to ensure their sites work with Googlebot.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing


The One-Hour Guide to SEO: Technical SEO – Whiteboard Friday

Posted by randfish

We’ve arrived at one of the meatiest SEO topics in our series: technical SEO. In this fifth part of the One-Hour Guide to SEO, Rand covers essential technical topics from crawlability to internal link structure to subfolders and far more. Watch on for a firmer grasp of technical SEO fundamentals!

Video Transcription

Howdy, Moz fans, and welcome back to our special One-Hour Guide to SEO Whiteboard Friday series. This is Part V – Technical SEO. I want to be totally upfront. Technical SEO is a vast and deep discipline like any of the things we’ve been talking about in this One-Hour Guide.

There is no way in the next 10 minutes that I can give you everything that you’ll ever need to know about technical SEO, but we can cover many of the big, important, structural fundamentals. So that’s what we’re going to tackle today. You will come out of this having at least a good idea of what you need to be thinking about, and then you can go explore more resources from Moz and many other wonderful websites in the SEO world that can help you along these paths.

1. Every page on the website is unique & uniquely valuable

First off, every page on a website should be two things — unique, unique from all the other pages on that website, and uniquely valuable, meaning it provides some value that a user, a searcher would actually desire and want. Sometimes the degree to which it’s uniquely valuable may not be enough, and we’ll need to do some intelligent things.

So, for example, suppose we’ve got a page about X, Y, and Z versus pages that are sort of, “Oh, this is a little bit of a combination of X and Y that you can get through searching and then filtering this way. Oh, here’s another copy of that XY, but it’s a slightly different version. Here’s one with YZ. This is a page that has almost nothing on it, but we sort of need it to exist for this weird reason that has nothing to do with search, and no one would ever want to find it through search engines.”

Okay, when you encounter these types of pages as opposed to these unique and uniquely valuable ones, you want to think about: Should I be canonicalizing those, meaning point this one back to this one for search engine purposes? Maybe YZ just isn’t different enough from Z for it to be a separate page in Google’s eyes and in searchers’ eyes. So I’m going to use something called the rel=canonical tag to point this YZ page back to Z.
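To make that concrete, here is a small sketch that reads the rel=canonical link out of a page's HTML using Python's standard library. The `CanonicalFinder` class and the allmystuff.com URLs in the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


# Hypothetical YZ page that points search engines back to the Z page.
yz_page = """
<html>
  <head>
    <title>YZ</title>
    <link rel="canonical" href="https://allmystuff.com/z">
  </head>
  <body>Mostly the same content as Z.</body>
</html>
"""

finder = CanonicalFinder()
finder.feed(yz_page)
canonical_url = finder.canonical
print(canonical_url)
```

A check like this is handy in an audit: a near-duplicate page whose canonical is missing, or points at itself, is a candidate for the treatment described above.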

Maybe I want to remove these pages. Oh, this is totally non-valuable to anyone. 404 it. Get it out of here. Maybe I want to block bots from accessing this section of our site. Maybe these are search results that make sense if you’ve performed this query on our site, but they don’t make any sense to be indexed in Google. I’ll keep Google out of it using the robots.txt file or the meta robots or other things.
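As a rough sketch of the robots.txt option, Python's standard `urllib.robotparser` can show how a Disallow rule keeps crawlers out of internal search results. The rules and example.com URLs below are assumptions for illustration, not a recommended configuration for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the internal search results section.
robots_txt = """
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Internal search results are blocked; a normal content page is not.
search_allowed = parser.can_fetch("Googlebot", "https://example.com/search?q=storage")
product_allowed = parser.can_fetch("Googlebot", "https://example.com/seattle/storage")

print(search_allowed, product_allowed)
```

Note that robots.txt controls crawling, while the meta robots tag is read from a page the crawler has already fetched, so the two are not interchangeable.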

2. Pages are accessible to crawlers, load fast, and can be fully parsed in a text-based browser

Second, pages should be accessible to crawlers. They should load fast, as fast as you possibly can. There’s a ton of resources about optimizing images and optimizing server response times and optimizing first paint and first meaningful paint and all these different things that go into speed.

Speed is good for two reasons. The first is technical SEO: Google can crawl your pages faster, and when people speed up their pages, they often find that Google crawls more of them and crawls them more frequently, which is a wonderful thing. The second is that pages that load fast make users happier. When you make users happier, you make it more likely that they will link and amplify and share and come back and keep loading and not click the back button, all these positive things and avoiding all these negative things.

They should be able to be fully parsed in essentially a text browser, meaning that if you have a relatively unsophisticated browser that is not doing a great job of processing JavaScript or post-loading of script events or other types of content, Flash and stuff like that, it should be the case that a spider should be able to visit that page and still see all of the meaningful content in text form that you want to present.

Google still is not processing every image at the “I’m going to analyze everything that’s in this image and extract the text from it” level, nor are they doing that with video, nor are they doing that with many kinds of JavaScript and other scripts. So I would urge you, as would many other SEOs, to make sure you can load everything into these pages in HTML, in text. Barry Adams, a famous SEO, says that JavaScript is evil, which may be taking it a little bit far, but we catch his meaning.

3. Thin content, duplicate content, spider traps/infinite loops are eliminated

Thin content and duplicate content — thin content meaning content that doesn’t provide meaningfully useful, differentiated value, and duplicate content meaning it’s exactly the same as something else — spider traps and infinite loops, like calendaring systems, these should generally speaking be eliminated. If you have those duplicate versions and they exist for some reason, for example maybe you have a printer-friendly version of an article and the regular version of the article and the mobile version of the article, okay, there should probably be some canonicalization going on there, the rel=canonical tag being used to say this is the original version and here’s the mobile friendly version and those kinds of things.

If you have search results in the search results, Google generally prefers that you don’t do that. If you have slight variations, Google would prefer that you canonicalize those, especially if the filters on them are not meaningfully and usefully different for searchers. 

4. Pages with valuable content are accessible through a shallow, thorough internal links structure

Number four, pages with valuable content on them should be accessible through just a few clicks, in a shallow but thorough internal link structure.

Now this is an idealized version. You’re probably rarely going to encounter exactly this. But let’s say I’m on my homepage and my homepage has 100 links to unique pages on it. That gets me to 100 pages. One hundred more links per page gets me to 10,000 pages, and 100 more gets me to 1 million.

So that’s only three clicks from homepage to one million pages. You might say, “Well, Rand, that’s a little bit of a perfect pyramid structure.” I agree. Fair enough. Still, three to four clicks to any page on any website of nearly any size, unless we’re talking about a site with hundreds of millions of pages or more, should be the general rule. I should be able to follow that path through either a sitemap or a good internal link structure.
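The arithmetic behind that idealized pyramid is easy to check. This sketch assumes every page links to the same number of unique pages, which real sites rarely do; the function names are illustrative.

```python
def pages_reachable(links_per_page: int, clicks: int) -> int:
    """Pages at exactly `clicks` clicks from the homepage in a perfect pyramid."""
    return links_per_page ** clicks


def clicks_needed(total_pages: int, links_per_page: int) -> int:
    """Smallest click depth whose deepest level holds at least `total_pages` pages."""
    clicks = 0
    reachable = 1  # the homepage itself
    while reachable < total_pages:
        reachable *= links_per_page
        clicks += 1
    return clicks


print(pages_reachable(100, 1))          # 100
print(pages_reachable(100, 2))          # 10000
print(pages_reachable(100, 3))          # 1000000
print(clicks_needed(1_000_000, 100))    # 3
print(clicks_needed(100_000_000, 100))  # 4
```

With 100 links per page, even 100 million pages fit within four clicks, which is why three to four clicks is a reasonable rule for nearly any site.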

If you have a complex structure and you need to use a sitemap, that’s fine. Google is fine with you using an HTML page-level sitemap. Or alternatively, you can just have a good link structure internally that gets everyone easily, within a few clicks, to every page on your site. You don’t want to have these holes that require, “Oh, yeah, if you wanted to reach that page, you could, but you’d have to go to our blog and then you’d have to click back to result 9, and then you’d have to click to result 18 and then to result 27, and then you can find it.”

No, that’s not ideal. That’s too many clicks to force people to make to get to a page that’s just a little ways back in your structure. 

5. Pages should be optimized to display cleanly and clearly on any device, even at slow connection speeds

Five, I think this is obvious, but for many reasons, including the fact that Google considers mobile friendliness in its ranking systems, you want to have a page that loads clearly and cleanly on any device, even at slow connection speeds, optimized for both mobile and desktop, optimized for 4G and also optimized for 2G and no G.

6. Permanent redirects should use the 301 status code, dead pages the 404, temporarily unavailable the 503, and all okay should use the 200 status code

Permanent redirects. So this page was here. Now it’s over here. This old content, we’ve created a new version of it. Okay, old content, what do we do with you? Well, we might leave you there if we think you’re valuable, but we may redirect you. If you’re redirecting old stuff for any reason, it should generally use the 301 status code.

If you have a dead page, it should use the 404 status code. You could maybe sometimes use 410, permanently removed, as well. Temporarily unavailable, like we’re having some downtime this weekend while we do some maintenance, 503 is what you want. Everything is okay, everything is great, that’s a 200. All of your pages that have meaningful content on them should have a 200 code.

Anything beyond these status codes, and maybe the 410, generally speaking should be avoided. There are some very occasional, rare, edge use cases. But if a crawler finds status codes other than these, for example Moz, which crawls your website and runs this technical audit every week, or other software like it, such as Screaming Frog or Ryte or DeepCrawl, it will say, “Hey, this looks problematic to us. You should probably do something about this.”
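A minimal sketch of that kind of check, assuming you already have crawl results as a mapping of URL to status code. The sample crawl data and the `flag_unexpected` helper are hypothetical; real audit tools report far more detail.

```python
from http import HTTPStatus

# The codes recommended above, keyed by the situation they describe.
RECOMMENDED = {
    "everything is okay": HTTPStatus.OK,                         # 200
    "permanent redirect": HTTPStatus.MOVED_PERMANENTLY,          # 301
    "dead page": HTTPStatus.NOT_FOUND,                           # 404
    "permanently removed": HTTPStatus.GONE,                      # 410
    "temporarily unavailable": HTTPStatus.SERVICE_UNAVAILABLE,   # 503
}
EXPECTED_CODES = {int(status) for status in RECOMMENDED.values()}


def flag_unexpected(crawl_results: dict) -> dict:
    """Return only the URLs whose status code is outside the recommended set."""
    return {
        url: code
        for url, code in crawl_results.items()
        if code not in EXPECTED_CODES
    }


# Hypothetical crawl output: URL -> status code.
crawl = {
    "/": 200,
    "/old-guide": 301,
    "/summer-sale": 302,   # temporary redirect, worth a second look
    "/deleted-post": 404,
    "/broken-script": 500,
}

flagged = flag_unexpected(crawl)
print(flagged)
```

Anything flagged is not automatically wrong, since there are rare edge cases, but it deserves a look.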

7. Use HTTPS (and make your site secure)

When you are building a website that you want to rank in search engines, it is very wise to use a security certificate and to have HTTPS rather than HTTP, the non-secure version. Those should also be canonicalized. There should never be a time when HTTP is the version that loads by preference. Google also gives a small reward to pages that use HTTPS, or a penalty to those that don’t (I’m not even sure it’s that small anymore; it might be fairly significant at this point). 
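One way to picture HTTP-to-HTTPS canonicalization is as a rule that maps every HTTP URL to its HTTPS twin, which the server would then answer with a 301. The sketch below only computes the redirect target; the function name is illustrative and how you wire it into a server is left open.

```python
from typing import Optional
from urllib.parse import urlsplit


def https_redirect_target(url: str) -> Optional[str]:
    """Return the HTTPS version of an HTTP URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    # Keep host, path, query string, and fragment; change only the scheme.
    return parts._replace(scheme="https").geturl()


target = https_redirect_target("http://allmystuff.com/seattle?sort=price")
already_secure = https_redirect_target("https://allmystuff.com/seattle")

print(target)
print(already_secure)
```

Preserving the path and query string matters: redirecting every HTTP URL to the homepage would throw away the page-level signals you are trying to consolidate.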

8. One domain > several, subfolders > subdomains, relevant folders > long, hyphenated URLs

In general, well, I don’t even want to say in general. It is nearly universal, with a few edge cases — if you’re a very advanced SEO, you might be able to ignore a little bit of this — but it is generally the case that you want one domain, not several. Allmystuff.com, not allmyseattlestuff.com, allmyportlandstuff.com, and allmylastuff.com.

Allmystuff.com is preferable for many, many technical reasons and also because the challenge of ranking multiple websites is so significant compared to the challenge of ranking one. 

You want subfolders, not subdomains, meaning I want allmystuff.com/seattle, /la, and /portland, not seattle.allmystuff.com.

Why is this? Google’s representatives have sometimes said that it doesn’t really matter and I should do whatever is easy for me. I have so many cases over the years, case studies of folks who moved from a subdomain to a subfolder and saw their rankings increase overnight. Credit to Google’s reps.

I’m sure they’re getting their information from somewhere. But very frankly, in the real world, it just works all the time to put it in a subfolder. I have never seen a problem being in the subfolder versus the subdomain, where there are so many problems and there are so many issues that I would strongly, strongly urge you against it. I think 95% of professional SEOs, who have ever had a case like this, would do likewise.

Relevant folders should be used rather than long, hyphenated URLs. This is one where we agree with Google. Google generally says, hey, if you have allmystuff.com/seattle/storagefacilities/top10places, that is far better than /seattle-storage-facilities-top-10-places. It’s just the case that Google is good at folder structure analysis and organization, and users like it as well and good breadcrumbs come from there.

There’s a bunch of benefits. Generally using this folder structure is preferred to very, very long URLs, especially if you have multiple pages in those folders. 

9. Use breadcrumbs wisely on larger/deeper-structured sites

Last, but not least, or at least the last thing we’ll talk about in this technical SEO discussion, is using breadcrumbs wisely. Breadcrumbs are good for both technical and on-page SEO.

Google generally learns some things from the structure of your website from using breadcrumbs. They also give you this nice benefit in the search results, where they show your URL in this friendly way, especially on mobile, mobile more so than desktop. They’ll show home > seattle > storage facilities. Great, looks beautiful. Works nicely for users. It helps Google as well.
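When a site uses relevant folders, a breadcrumb trail can be derived straight from the URL path. This is a simplified sketch that assumes folder names map cleanly to labels, which many real sites would replace with proper page titles; the function name and URL are illustrative.

```python
from urllib.parse import urlsplit


def breadcrumb_trail(url: str) -> list:
    """Build (label, href) pairs for each folder level in the URL path."""
    segments = [segment for segment in urlsplit(url).path.split("/") if segment]
    crumbs = [("home", "/")]
    href = ""
    for segment in segments:
        href += "/" + segment
        crumbs.append((segment.replace("-", " "), href))
    return crumbs


trail = breadcrumb_trail("https://allmystuff.com/seattle/storage-facilities")
display = " > ".join(label for label, _ in trail)

print(display)  # home > seattle > storage facilities
```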

So there are plenty more in-depth resources that we can go into on many of these topics and others around technical SEO, but this is a good starting point. From here, we will take you to Part VI, our last one, on link building next week. Take care.

Video transcription by Speechpad.com


Moz Blog


Rewriting the Beginner’s Guide to SEO, Chapter 5: Technical Optimization

Posted by BritneyMuller

After a short break, we’re back to share our working draft of Chapter 5 of the Beginner’s Guide to SEO with you! This one was a whopper, and we’re really looking forward to your input. Giving beginner SEOs a solid grasp of just what technical optimization for SEO is and why it matters — without overwhelming them or scaring them off the subject — is a tall order indeed. We’d love to hear what you think: did we miss anything you think is important for beginners to know? Leave us your feedback in the comments!

And in case you’re curious, check back on our outline, Chapter One, Chapter Two, Chapter Three, and Chapter Four to see what we’ve covered so far.


Chapter 5: Technical Optimization

Basic technical knowledge will help you optimize your site for search engines and establish credibility with developers.

Now that you’ve crafted valuable content on the foundation of solid keyword research, it’s important to make sure it’s not only readable by humans, but by search engines too!

You don’t need to have a deep technical understanding of these concepts, but it is important to grasp what these technical assets do so that you can speak intelligently about them with developers. Speaking your developers’ language is important because you will likely need them to carry out some of your optimizations. They’re unlikely to prioritize your asks if they can’t understand your request or see its importance. When you establish credibility and trust with your devs, you can begin to tear away the red tape that often blocks crucial work from getting done.

Pro tip: SEOs need cross-team support to be effective

It’s vital to have a healthy relationship with your developers so that you can successfully tackle SEO challenges from both sides. Don’t wait until a technical issue causes negative SEO ramifications to involve a developer. Instead, join forces for the planning stage with the goal of avoiding the issues altogether. If you don’t, it can cost you in time and money later.

Beyond cross-team support, understanding technical optimization for SEO is essential if you want to ensure that your web pages are structured for both humans and crawlers. To that end, we’ve divided this chapter into three sections:

  1. How websites work
  2. How search engines understand websites
  3. How users interact with websites

Since the technical structure of a site can have a massive impact on its performance, it’s crucial for everyone to understand these principles. It might also be a good idea to share this part of the guide with your programmers, content writers, and designers so that all parties involved in a site’s construction are on the same page.

1. How websites work

If search engine optimization is the process of optimizing a website for search, SEOs need at least a basic understanding of the thing they’re optimizing!

Below, we outline the website’s journey from domain name purchase all the way to its fully rendered state in a browser. An important component of the website’s journey is the critical rendering path, which is the process of a browser turning a website’s code into a viewable page.

Understanding this process is important for SEOs for a few reasons:

  • The steps in this webpage assembly process can affect page load times, and speed is not only important for keeping users on your site, but it’s also one of Google’s ranking factors.
  • Google renders certain resources, like JavaScript, on a “second pass.” Google will look at the page without JavaScript first, then a few days to a few weeks later, it will render JavaScript, meaning SEO-critical elements that are added to the page using JavaScript might not get indexed.

Imagine that the website loading process is your commute to work. You get ready at home, gather your things to bring to the office, and then take the fastest route from your home to your work. It would be silly to put on just one of your shoes, take a longer route to work, drop your things off at the office, then immediately return home to get your other shoe, right? That’s sort of what inefficient websites do. This chapter will teach you how to diagnose where your website might be inefficient, what you can do to streamline, and the positive ramifications on your rankings and user experience that can result from that streamlining.

Before a website can be accessed, it needs to be set up!

  1. Domain name is purchased. Domain names like moz.com are purchased from a domain name registrar such as GoDaddy or HostGator. These registrars are just organizations that manage the reservations of domain names.
  2. Domain name is linked to IP address. The Internet doesn’t understand names like “moz.com” as website addresses without the help of the Domain Name System (DNS). The Internet uses a series of numbers called an Internet protocol (IP) address (ex: 127.0.0.1), but we want to use names like moz.com because they’re easier for humans to remember. We need DNS to link those human-readable names with machine-readable numbers.
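As a toy illustration of that name-to-number link, the sketch below uses a plain dictionary in place of real DNS. The record for moz.com is made up (192.0.2.10 comes from an address range reserved for documentation), and real DNS involves many servers rather than one lookup table.

```python
import ipaddress

# Hypothetical record table standing in for DNS; not moz.com's real address.
TOY_DNS_RECORDS = {"moz.com": "192.0.2.10"}


def resolve(domain: str) -> ipaddress.IPv4Address:
    """Look up a domain name in the toy table and return its IP address."""
    return ipaddress.ip_address(TOY_DNS_RECORDS[domain.lower()])


address = resolve("moz.com")

# An IPv4 address really is just a number; the dotted form is for readability.
as_number = int(ipaddress.ip_address("127.0.0.1"))

print(address)    # 192.0.2.10
print(as_number)  # 2130706433
```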

How a website gets from server to browser

  1. User requests domain. Now that the name is linked to an IP address via DNS, people can request a website by typing the domain name directly into their browser or by clicking on a link to the website.
  2. Browser makes requests. That request for a web page prompts the browser to make a DNS lookup request to convert the domain name to its IP address. The browser then makes a request to the server for the code your web page is constructed with, such as HTML, CSS, and JavaScript.
  3. Server sends resources. Once the server receives the request for the website, it sends the website files to be assembled in the searcher’s browser.
  4. Browser assembles the web page. The browser has now received the resources from the server, but it still needs to put it all together and render the web page so that the user can see it in their browser. As the browser parses and organizes all the web page’s resources, it’s creating a Document Object Model (DOM). The DOM is what you can see when you right click + “inspect element” on a web page in your Chrome browser (learn how to inspect elements in other browsers).
  5. Browser makes final requests. The browser will only show a web page after all the page’s necessary code is downloaded, parsed, and executed, so at this point, if the browser needs any additional code in order to show your website, it will make an additional request from your server.
  6. Website appears in browser. Whew! After all that, your website has now been transformed (rendered) from code to what you see in your browser.
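To get a feel for the assembly step, here is a rough sketch that turns a string of HTML into an indented outline, loosely mirroring the nesting a browser builds into the DOM. It uses Python's standard `html.parser`, is far simpler than a real browser, and the `OutlineBuilder` class is an illustrative name.

```python
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Prints each element and text node indented by its nesting depth."""

    # Elements that never have a closing tag, so they must not add depth.
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append("  " * self.depth + repr(text))


builder = OutlineBuilder()
builder.feed("<html><body><h1>Hello</h1><p>World</p></body></html>")
outline = builder.lines
print("\n".join(outline))
```

The outline is roughly what you see as nested, collapsible nodes when you inspect an element in the browser.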

Pro tip: Talk to your developers about async!

Something you can bring up with your developers is shortening the critical rendering path by setting scripts to “async” when they’re not needed to render content above the fold, which can make your web pages load faster. Async tells the DOM that it can continue to be assembled while the browser is fetching the scripts needed to display your web page. If the DOM has to pause assembly every time the browser fetches a script (called “render-blocking scripts”), it can substantially slow down your page load.

It would be like going out to eat with your friends and having to pause the conversation every time one of you went up to the counter to order, only resuming once they got back. With async, you and your friends can continue to chat even when one of you is ordering. You might also want to bring up other optimizations that devs can implement to shorten the critical rendering path, such as removing unnecessary scripts entirely, like old tracking scripts.
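If you want something concrete to bring to that conversation, a quick scan for external scripts that lack `async` or `defer` is a reasonable start. This sketch is a simplification: it ignores inline scripts and script position, skips `type="module"` scripts because those are deferred by default, and the file names are made up.

```python
from html.parser import HTMLParser


class BlockingScriptFinder(HTMLParser):
    """Collects external scripts that have neither async nor defer set."""

    def __init__(self):
        super().__init__()
        self.blocking = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)  # boolean attributes appear with value None
        is_external = "src" in attributes
        is_deferred = (
            "async" in attributes
            or "defer" in attributes
            or attributes.get("type") == "module"
        )
        if is_external and not is_deferred:
            self.blocking.append(attributes["src"])


# Hypothetical page head with one old tracking script left in place.
page = """
<head>
  <script src="/old-tracker.js"></script>
  <script async src="/ads.js"></script>
  <script defer src="/app.js"></script>
</head>
"""

finder = BlockingScriptFinder()
finder.feed(page)
blocking_scripts = finder.blocking
print(blocking_scripts)
```

Each script on the resulting list is a question for your developers: does it need to block rendering, can it be async, or can it be removed entirely?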

Now that you know how a website appears in a browser, we’re going to focus on what a website is made of — in other words, the code (programming languages) used to construct those web pages.

The three most common are:

  • HTML – What a website says (titles, body content, etc.)
  • CSS – How a website looks (color, fonts, etc.)
  • JavaScript – How it behaves (interactive, dynamic, etc.)

HTML: What a website says

HTML stands for hypertext markup language, and it serves as the backbone of a website. Elements like headings, paragraphs, lists, and content are all defined in the HTML.

Here’s an example of a webpage, and what its corresponding HTML looks like:

HTML is important for SEOs to know because it’s what lives “under the hood” of any page they create or work on. While your CMS likely doesn’t require you to write your pages in HTML (ex: selecting “hyperlink” will allow you to create a link without you having to type in “a href=”), it is what you’re modifying every time you do something to a web page such as adding content, changing the anchor text of internal links, and so on. Google crawls these HTML elements to determine how relevant your document is to a particular query. In other words, what’s in your HTML plays a huge role in how your web page ranks in Google organic search!

CSS: How a website looks

CSS stands for cascading style sheets, and this is what causes your web pages to take on certain fonts, colors, and layouts. HTML was created to describe content, rather than to style it, so when CSS entered the scene, it was a game-changer. With CSS, web pages could be “beautified” without requiring manual coding of styles into the HTML of every page — a cumbersome process, especially for large sites.

It wasn’t until 2014 that Google’s indexing system began to render web pages more like an actual browser, as opposed to a text-only browser. A black-hat SEO practice that tried to capitalize on Google’s older indexing system was hiding text and links via CSS for the purpose of manipulating search engine rankings. This “hidden text and links” practice is a violation of Google’s quality guidelines.

Components of CSS that SEOs, in particular, should care about:

  • Since style directives can live in external stylesheet files (CSS files) instead of your page’s HTML, it makes your page less code-heavy, reducing file transfer size and making load times faster.
  • Browsers still have to download resources like your CSS file, so compressing them can make your web pages load faster, and page speed is a ranking factor.
  • Having your pages be more content-heavy than code-heavy can lead to better indexing of your site’s content.
  • Using CSS to hide links and content can get your website manually penalized and removed from Google’s index.
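As a rough illustration of the compression point above, the sketch below gzips a made-up stylesheet and compares the byte counts. The CSS rules are invented for the example, and real savings depend on your files and on how your server is configured.

```python
import gzip

# A hypothetical, repetitive stylesheet standing in for a real CSS file.
css = "\n".join(
    f".card-{i} {{ color: #333333; margin: 0 auto; padding: 16px; }}"
    for i in range(200)
).encode("utf-8")

compressed = gzip.compress(css)

print(f"original:   {len(css)} bytes")
print(f"compressed: {len(compressed)} bytes")
```

In practice the web server or CDN usually applies this compression on the fly, so this is only a way to see the effect, not something you would run per request.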

JavaScript: How a website behaves

In the earlier days of the Internet, web pages were built with HTML. When CSS came along, webpage content had the ability to take on some style. When the programming language JavaScript entered the scene, websites could now not only have structure and style, but they could be dynamic.

JavaScript has opened up a lot of opportunities for non-static web page creation. When someone attempts to access a page that is enhanced with this programming language, that user’s browser will execute the JavaScript against the static HTML that the server returned, resulting in a web page that comes to life with some sort of interactivity.

You’ve definitely seen JavaScript in action — you just may not have known it! That’s because JavaScript can do almost anything to a page. It could create a pop up, for example, or it could request third-party resources like ads to display on your page.

JavaScript can pose some problems for SEO, though, since search engines don’t view JavaScript the same way human visitors do. That’s because of client-side versus server-side rendering. Most JavaScript is executed in a client’s browser. With server-side rendering, on the other hand, the files are executed at the server and the server sends them to the browser in their fully rendered state.

SEO-critical page elements such as text, links, and tags that are loaded on the client’s side with JavaScript, rather than represented in your HTML, are invisible from your page’s code until they are rendered. This means that search engine crawlers won’t see what’s in your JavaScript — at least not initially.

Google says that, as long as you’re not blocking Googlebot from crawling your JavaScript files, they’re generally able to render and understand your web pages just like a browser can, which means that Googlebot should see the same things as a user viewing a site in their browser. However, due to this “second wave of indexing” for client-side JavaScript, Google can miss certain elements that are only available once JavaScript is executed.

There are also some other things that could go wrong during Googlebot’s process of rendering your web pages, which can prevent Google from understanding what’s contained in your JavaScript:

  • You’ve blocked Googlebot from JavaScript resources (ex: with robots.txt, like we learned about in Chapter 2)
  • Your server can’t handle all the requests to crawl your content
  • The JavaScript is too complex or outdated for Googlebot to understand
  • JavaScript doesn’t “lazy load” content into the page until after the crawler has finished with the page and moved on.
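The first item in that list can be checked with a few lines of Python. This is a sketch, assuming a made-up robots.txt and made-up resource URLs. Note that Python's `urllib.robotparser` does simple prefix matching and does not replicate every detail of how Google interprets robots.txt, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a JavaScript directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/js/
"""

def blocked_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the resource URLs that the given user agent may not fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not rules.can_fetch(user_agent, url)]

# Hypothetical resources a page might load.
resources = [
    "https://www.example.com/assets/js/app.js",
    "https://www.example.com/assets/css/site.css",
]

print(blocked_resources(ROBOTS_TXT, resources))
```

Here the script is reported as blocked while the stylesheet is not, which is the situation that can stop Googlebot from rendering the page the way a visitor sees it.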

Needless to say, while JavaScript does open a lot of possibilities for web page creation, it can also have some serious ramifications for your SEO if you’re not careful. Thankfully, there is a way to check whether Google sees the same thing as your visitors. To see a page the way Googlebot sees it, use Google Search Console’s “Fetch and Render” tool. From your site’s Google Search Console dashboard, select “Crawl” from the left navigation, then “Fetch as Google.”

From this page, enter the URL you want to check (or leave blank if you want to check your homepage) and click the “Fetch and Render” button. You also have the option to test either the desktop or mobile version.

In return, you’ll get a side-by-side view of how Googlebot saw your page versus how a visitor to your website would have seen the page. Below, Google will also show you a list of any resources they may not have been able to get for the URL you entered.

Understanding the way websites work lays a great foundation for what we’ll talk about next, which is technical optimizations to help Google understand the pages on your website better.

2. How search engines understand websites

Search engines have gotten incredibly sophisticated, but they can’t (yet) find and interpret web pages quite like a human can. The following sections outline ways you can better deliver content to search engines.

Help search engines understand your content by structuring it with Schema

Imagine being a search engine crawler scanning down a 10,000-word article about how to bake a cake. How do you identify the author, recipe, ingredients, or steps required to bake a cake? This is where schema (Schema.org) markup comes in. It allows you to spoon-feed search engines more specific classifications for what type of information is on your page.

Schema is a way to label or organize your content so that search engines have a better understanding of what certain elements on your web pages are. This code provides structure to your data, which is why schema is often referred to as “structured data.” The process of structuring your data is often referred to as “markup” because you are marking up your content with organizational code.

JSON-LD is Google’s preferred schema markup (announced in May ‘16), which Bing also supports. To view a full list of the thousands of available schema markups, visit Schema.org or view the Google Developers Introduction to Structured Data for additional information on how to implement structured data. After you implement the structured data that best suits your web pages, you can test your markup with Google’s Structured Data Testing Tool.
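For the cake example above, JSON-LD markup might be generated along these lines. This is a minimal sketch: the recipe details and the author name are invented, and only a handful of Schema.org `Recipe` properties are shown. Check Google's structured data documentation for the properties a given rich result actually requires.

```python
import json

# Hypothetical recipe data; the property names come from Schema.org's Recipe type.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Simple Chocolate Cake",
    "author": {"@type": "Person", "name": "Jane Baker"},
    "recipeIngredient": ["2 cups flour", "1 cup sugar", "3/4 cup cocoa powder"],
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "Mix the dry ingredients."},
        {"@type": "HowToStep", "text": "Bake at 350 F for 30 minutes."},
    ],
}

def to_json_ld_script(data):
    """Wrap structured data in the script tag that carries JSON-LD."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"

print(to_json_ld_script(recipe))
```

The printed block is what you would place in the page's HTML and then paste into a testing tool to validate.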

In addition to helping bots like Google understand what a particular piece of content is about, schema markup can also enable special features to accompany your pages in the SERPs. These special features are referred to as “rich snippets,” and you’ve probably seen them in action. They’re things like:

  • Top Stories carousel
  • Review stars
  • Sitelinks search boxes
  • Recipes

Remember, using structured data can help enable a rich snippet to be present, but does not guarantee it. Other types of rich snippets will likely be added in the future as the use of schema markup increases.

Some last words of advice for schema success:

  • You can use multiple types of schema markup on a page. However, if you mark up one element, like a product for example, and there are other products listed on the page, you must also mark up those products.
  • Don’t mark up content that is not visible to visitors and follow Google’s Quality Guidelines. For example, if you add review structured markup to a page, make sure those reviews are actually visible on that page.
  • If you have duplicate pages, Google asks that you mark up each duplicate page with your structured markup, not just the canonical version.
  • Provide original and updated (if applicable) content on your structured data pages.
  • Structured markup should be an accurate reflection of your page.
  • Try to use the most specific type of schema markup for your content.
  • Marked-up reviews should not be written by the business. They should be genuine unpaid business reviews from actual customers.

Tell search engines about your preferred pages with canonicalization

When Google crawls the same content on different web pages, it sometimes doesn’t know which page to index in search results. This is why the rel=”canonical” tag was invented: to help search engines better index the preferred version of content and not all its duplicates.

The rel=”canonical” tag allows you to tell search engines where the original, master version of a piece of content is located. You’re essentially saying, “Hey search engine! Don’t index this; index this source page instead.” So, if you want to republish a piece of content, whether exactly or slightly modified, but don’t want to risk creating duplicate content, the canonical tag is here to save the day.

Proper canonicalization ensures that every unique piece of content on your website has only one URL. To prevent search engines from indexing multiple versions of a single page, Google recommends having a self-referencing canonical tag on every page on your site. Without a canonical tag telling Google which version of your web page is the preferred one, http://www.example.com could get indexed separately from http://example.com, creating duplicates.
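One way to spot-check this is to read the canonical tag out of a page and compare it with the URL you requested. The sketch below assumes you already have the HTML as a string. The sample markup and the helper names are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attributes.get("href"))

def find_canonicals(html):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals

def is_self_referencing(page_url, html):
    """True only when the page has exactly one canonical and it points at itself."""
    return find_canonicals(html) == [page_url]

# Hypothetical page source served at both the www and non-www URL.
SAMPLE_HTML = """<html><head>
<link rel="canonical" href="http://www.example.com/">
</head><body>Hello</body></html>"""

print(is_self_referencing("http://www.example.com/", SAMPLE_HTML))
print(is_self_referencing("http://example.com/", SAMPLE_HTML))
```

The second check fails, which is the intended outcome in this case: the non-www URL points to the www version as the preferred one rather than to itself.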

“Avoid duplicate content” is an Internet truism, and for good reason! Google wants to reward sites with unique, valuable content — not content that’s taken from other sources and repeated across multiple pages. Because engines want to provide the best searcher experience, they will rarely show multiple versions of the same content, opting instead to show only the canonicalized version, or if a canonical tag does not exist, whichever version they deem most likely to be the original.

Pro tip: Distinguishing between content filtering & content penalties
There is no such thing as a duplicate content penalty. However, you should try to keep duplicate content from causing indexing issues by using the rel=”canonical” tag when possible. When duplicates of a page exist, Google will choose a canonical and filter the others out of search results. That doesn’t mean you’ve been penalized. It just means that Google only wants to show one version of your content.

It’s also very common for websites to have multiple duplicate pages due to sort and filter options. For example, on an e-commerce site, you might have what’s called a faceted navigation that allows visitors to narrow down products to find exactly what they’re looking for, such as a “sort by” feature that reorders results on the product category page from lowest to highest price. This could create a URL that looks something like this: example.com/mens-shirts?sort=price_ascending. Add in more sort/filter options like color, size, material, brand, etc. and just think about all the variations of your main product category page this would create!
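To get a feel for how these variations collapse back to one page, here is a small sketch that strips sort and filter parameters from a URL. The list of facet parameters is an assumption for the example, and whether a given filtered page should point back to the main category page is a decision for your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed facet parameters; replace with the ones your site actually uses.
FACET_PARAMS = {"sort", "color", "size", "material", "brand"}

def canonical_url(url):
    """Drop sort/filter parameters and the fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FACET_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonical_url("https://example.com/mens-shirts?sort=price_ascending"))
print(canonical_url("https://example.com/mens-shirts?color=blue&size=m&page=2"))
```

The first call returns the plain category URL, and the second keeps `page=2` because it is not in the assumed facet list.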

To learn more about different types of duplicate content, this post by Dr. Pete helps distill the different nuances.

3. How users interact with websites

In Chapter 1, we said that despite SEO standing for search engine optimization, SEO is as much about people as it is about search engines themselves. That’s because search engines exist to serve searchers. This goal helps explain why Google’s algorithm rewards websites that provide the best possible experiences for searchers, and why some websites, despite having qualities like robust backlink profiles, might not perform well in search.

When we understand what makes their web browsing experience optimal, we can create those experiences for maximum search performance.

Ensuring a positive experience for your mobile visitors

Given that well over half of all web traffic today comes from mobile, it’s safe to say that your website should be accessible and easy to navigate for mobile visitors. In April 2015, Google rolled out an update to its algorithm that would promote mobile-friendly pages over non-mobile-friendly pages. So how can you ensure that your website is mobile friendly? Although there are three main ways to configure your website for mobile, Google recommends responsive web design.

Responsive design

Responsive websites are designed to fit the screen of whatever type of device your visitors are using. You can use CSS to make the web page “respond” to the device size. This is ideal because it prevents visitors from having to double-tap or pinch-and-zoom in order to view the content on your pages. Not sure if your web pages are mobile friendly? You can use Google’s mobile-friendly test to check!

AMP

AMP stands for Accelerated Mobile Pages, and it is used to deliver content to mobile visitors at speeds much greater than with non-AMP delivery. AMP is able to deliver content so fast because it delivers content from its cache servers (not the original site) and uses a special AMP version of HTML and JavaScript. Learn more about AMP.

Mobile-first indexing

As of 2018, Google started switching websites over to mobile-first indexing. That change sparked some confusion between mobile-friendliness and mobile-first, so it’s helpful to disambiguate. With mobile-first indexing, Google crawls and indexes the mobile version of your web pages. Making your website compatible to mobile screens is good for users and your performance in search, but mobile-first indexing happens independently of mobile-friendliness.

This has raised some concerns for websites that lack parity between mobile and desktop versions, such as showing different content, navigation, links, etc. on their mobile view. A mobile site with different links, for example, will alter the way in which Googlebot (mobile) crawls your site and sends link equity to your other pages.

Breaking up long content for easier digestion

When sites have very long pages, they have the option of breaking them up into multiple parts of a whole. This is called pagination and it’s similar to pages in a book. In order to avoid giving the visitor too much all at once, you can break up your single page into multiple parts. This can be great for visitors, especially on e-commerce sites where there are a lot of product results in a category, but there are some steps you should take to help Google understand the relationship between your paginated pages: mark them up with rel=”next” and rel=”prev.”

You can read more about pagination in Google’s official documentation, but the main takeaways are that:

  • The first page in a sequence should only have rel=”next” markup
  • The last page in a sequence should only have rel=”prev” markup
  • Pages that have both a preceding and following page should have both rel=”next” and rel=”prev”
  • Since each page in the sequence is unique, don’t canonicalize them to the first page in the sequence. Only use a canonical tag to point to a “view all” version of your content, if you have one.
  • When Google sees a paginated sequence, it will typically consolidate the pages’ linking properties and send searchers to the first page

Pro tip: rel=”next/prev” should still have anchor text and live within an <a> link
This helps Google ensure that they pick up the rel=”next/prev”.
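The first three rules in the list above boil down to a small piece of logic. The sketch below works out which rel values a page in a sequence should carry and which URL each should point at. The `?page=N` URL pattern is an assumption, and how you render the result in your templates is up to you.

```python
def pagination_rels(base_url, page, last_page):
    """Return the rel values for one page of a sequence, mapped to target URLs."""

    def page_url(number):
        # Assumed URL pattern: page 1 is the bare URL, later pages use ?page=N.
        return base_url if number == 1 else f"{base_url}?page={number}"

    rels = {}
    if page > 1:
        rels["prev"] = page_url(page - 1)
    if page < last_page:
        rels["next"] = page_url(page + 1)
    return rels

for page in (1, 2, 3):
    print(page, pagination_rels("https://example.com/shirts", page, last_page=3))
```

The first page only gets "next", the last page only gets "prev", and the middle page gets both.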

Improving page speed to mitigate visitor frustration

Google wants to serve content that loads lightning-fast for searchers. We’ve come to expect fast-loading results, and when we don’t get them, we’ll quickly bounce back to the SERP in search of a better, faster page. This is why page speed is a crucial aspect of on-site SEO. We can improve the speed of our web pages by taking advantage of tools like the ones we’ve mentioned below. Click on the links to learn more about each.

Images are one of the main culprits of slow pages!

As discussed in Chapter 4, images are one of the number-one reasons for slow-loading web pages! In addition to image compression, optimizing image alt text, choosing the right image format, and submitting image sitemaps, there are other technical ways to optimize the speed and way in which images are shown to your users. Some primary ways to improve image delivery are as follows:

SRCSET: How to deliver the best image size for each device

The SRCSET attribute allows you to have multiple versions of your image and then specify which version should be used in different situations. This piece of code is added to the <img> tag (where your image is located in the HTML) to provide unique images for specific-sized devices.

This is like the concept of responsive design that we discussed earlier, except for images!

This doesn’t just speed up your image load time, it’s also a unique way to enhance your on-page user experience by providing different and optimal images to different device types.

Pro tip: There are more than just three image size versions!
It’s a common misconception that you just need a desktop, tablet, and mobile-sized version of your image. There are a huge variety of screen sizes and resolutions. Learn more about SRCSET.
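Here is a minimal sketch of what generating that markup could look like. The file naming scheme (`cake-480w.jpg` and so on) and the default `sizes` value are assumptions for the example. The `480w`-style width descriptors are the part that comes from the SRCSET syntax itself.

```python
def build_img_tag(base_name, widths, alt, sizes="100vw", ext="jpg"):
    """Build an <img> tag offering one file per available width."""
    widths = sorted(widths)
    # Each candidate is "<file> <width>w"; the browser picks the best fit.
    srcset = ", ".join(f"{base_name}-{w}w.{ext} {w}w" for w in widths)
    # The smallest image doubles as the fallback for browsers without srcset support.
    fallback = f"{base_name}-{widths[0]}w.{ext}"
    return f'<img src="{fallback}" srcset="{srcset}" sizes="{sizes}" alt="{alt}">'

print(build_img_tag("cake", [1200, 480, 800], alt="Chocolate cake"))
```

Adding another width to the list is all it takes to offer one more version, which is why there is no reason to stop at three.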

Show visitors image loading is in progress with lazy loading

Lazy loading occurs when you go to a webpage and, instead of seeing a blank white space for where an image will be, a blurry lightweight version of the image or a colored box in its place appears while the surrounding text loads. After a few seconds, the image clearly loads in full resolution. The popular blogging platform Medium does this really well.

The low resolution version is initially loaded, and then the full high resolution version. This also helps to optimize your critical rendering path! So while all of your other page resources are being downloaded, you’re showing a low-resolution teaser image that helps tell users that things are happening/being loaded. For more information on how you should lazy load your images, check out Google’s Lazy Loading Guidance.

Improve speed by condensing and bundling your files

Page speed audits will often make recommendations such as “minify resource,” but what does that actually mean? Minification condenses a code file by removing things like line breaks and spaces, as well as abbreviating code variable names wherever possible.

“Bundling” is another common term you’ll hear in reference to improving page speed. The process of bundling combines a bunch of the same coding language files into one single file. For example, a bunch of JavaScript files could be put into one larger file to reduce the number of JavaScript files a browser has to request.

By both minifying and bundling the files needed to construct your web page, you’ll speed up your website and reduce the number of your HTTP (file) requests.
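To show the idea rather than a production tool, here is a deliberately naive sketch that minifies and bundles CSS. It only strips comments and whitespace, does not rename anything, and can change the meaning of unusual selectors, so in a real build you would rely on an established minifier.

```python
import re

def minify_css(css):
    """Naive minifier: remove comments and unnecessary whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop /* comments */
    css = re.sub(r"\s+", " ", css)                   # collapse runs of whitespace
    css = re.sub(r"\s*([{}:;,])\s*", r"\1", css)     # trim around punctuation
    return css.strip()

def bundle(files):
    """Combine several stylesheets into one minified string (one file, one request)."""
    return "".join(minify_css(source) for source in files)

# Hypothetical stylesheets.
base_css = "body {\n  color: red;\n}\n"
header_css = "/* header */\nh1 { margin: 0 }"

print(bundle([base_css, header_css]))
```

Two files that would have needed two requests become a single, shorter string.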

Improving the experience for international audiences

Websites that target audiences from multiple countries should familiarize themselves with international SEO best practices in order to serve up the most relevant experiences. Without these optimizations, international visitors might have difficulty finding the version of your site that caters to them.

There are two main ways a website can be internationalized:

  • Language
    Sites that target speakers of multiple languages are considered multilingual websites. These sites should add something called an hreflang tag to show Google that your page has copy for another language. Learn more about hreflang.
  • Country
    Sites that target audiences in multiple countries are called multi-regional websites and they should choose a URL structure that makes it easy to target their domain or pages to specific countries. This can include the use of a country code top level domain (ccTLD) such as “.ca” for Canada, or a generic top-level domain (gTLD) with a country-specific subfolder such as “example.com/ca” for Canada. Learn more about locale-specific URLs.
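As a sketch of what hreflang markup looks like in practice, the snippet below builds the tags for a page that exists in several language and country versions. The URLs and the set of locales are made up. The `x-default` value is the one used to mark a fallback page for everyone else.

```python
def hreflang_tags(alternates, default_url=None):
    """Build one <link rel="alternate"> tag per language/country version."""
    tags = [
        f'<link rel="alternate" hreflang="{code}" href="{url}">'
        for code, url in sorted(alternates.items())
    ]
    if default_url:
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{default_url}">')
    return tags

# Hypothetical versions of the same page.
versions = {
    "en-ca": "https://example.com/ca/en/",
    "fr-ca": "https://example.com/ca/fr/",
    "en-us": "https://example.com/us/",
}

for tag in hreflang_tags(versions, default_url="https://example.com/"):
    print(tag)
```

Each version of the page should carry the full set of tags, including one that points at itself.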

You’ve researched, you’ve written, and you’ve optimized your website for search engines and user experience. The next piece of the SEO puzzle is a big one: establishing authority so that your pages will rank highly in search results.

Moz Blog

An 8-Point Checklist for Debugging Strange Technical SEO Problems

Posted by Dom-Woodman

Occasionally, a problem will land on your desk that’s a little out of the ordinary. Something where you don’t have an easy answer. You go to your brain and your brain returns nothing.

These problems can’t be solved with a little bit of keyword research and basic technical configuration. These are the types of technical SEO problems where the rabbit hole goes deep.

The very nature of these situations defies a checklist, but it’s useful to have one for the same reason we have them on planes: even the best of us can and will forget things, and a checklist will provide you with places to dig.


Fancy some examples of strange SEO problems? Here are four examples to mull over while you read. We’ll answer them at the end.

1. Why wasn’t Google showing 5-star markup on product pages?

  • The pages had server-rendered product markup and they also had Feefo product markup, including ratings being attached client-side.
  • The Feefo ratings snippet was successfully rendered in Fetch & Render, plus the mobile-friendly tool.
  • When you put the rendered DOM into the structured data testing tool, both pieces of structured data appeared without errors.

2. Why wouldn’t Bing display 5-star markup on review pages, when Google would?

  • The review pages of client & competitors all had rating rich snippets on Google.
  • All the competitors had rating rich snippets on Bing; however, the client did not.
  • The review pages had correctly validating ratings schema on Google’s structured data testing tool, but did not on Bing.

3. Why were pages getting indexed with a no-index tag?

  • Pages with a server-side-rendered no-index tag in the head were being indexed by Google across a large template for a client.

4. Why did any page on a website return a 302 about 20–50% of the time, but only for crawlers?

  • A website was randomly throwing 302 errors.
  • This never happened in the browser and only in crawlers.
  • User agent made no difference; location or cookies also made no difference.

Finally, a quick note. It’s entirely possible that some of this checklist won’t apply to every scenario. That’s totally fine. It’s meant to be a process for everything you could check, not everything you should check.

The pre-checklist check

Does it actually matter?

Does this problem only affect a tiny amount of traffic? Is it only on a handful of pages and you already have a big list of other actions that will help the website? You probably need to just drop it.

I know, I hate it too. I also want to be right and dig these things out. But in six months’ time, when you’ve solved twenty complex SEO rabbit holes and your website has stayed flat because you didn’t re-write the title tags, you’re still going to get fired.

But hopefully that’s not the case, in which case, onwards!

Where are you seeing the problem?

We don’t want to waste a lot of time. Have you heard this wonderful saying?: “If you hear hooves, it’s probably not a zebra.”

The process we’re about to go through is fairly involved and it’s entirely up to your discretion if you want to go ahead. Just make sure you’re not overlooking something obvious that would solve your problem. Here are some common problems I’ve come across that were mostly horses.

  1. You’re underperforming from where you should be.
    1. When a site is under-performing, people love looking for excuses. Weird Google nonsense can be quite a handy thing to blame. In reality, it’s typically some combination of a poor site, higher competition, and a failing brand. Horse.
  2. You’ve suffered a sudden traffic drop.
    1. Something has certainly happened, but this is probably not the checklist for you. There are plenty of common-sense checklists for this. I’ve written about diagnosing traffic drops recently — check that out first.
  3. The wrong page is ranking for the wrong query.
    1. In my experience (which should probably preface this entire post), this is usually a basic problem where a site has poor targeting or a lot of cannibalization. Probably a horse.

Factors that make it more likely you’ve got a more complex problem, one that requires you to don your debugging shoes:

  • A website that has a lot of client-side JavaScript.
  • Bigger, older websites with more legacy.
  • Your problem is related to a new Google property or feature where there is less community knowledge.

1. Start by picking some example pages.

Pick a couple of example pages to work with — ones that exhibit whatever problem you’re seeing. No, this won’t be representative, but we’ll come back to that in a bit.

Of course, if it only affects a tiny number of pages then it might actually be representative, in which case we’re good. It definitely matters, right? You didn’t just skip the step above? OK, cool, let’s move on.

2. Can Google crawl the page once?

First we’re checking whether Googlebot has access to the page, which we’ll define as a 200 status code.

We’ll check in four different ways to expose any common issues:

  1. Robots.txt: Open up Search Console and check in the robots.txt validator.
  2. User agent: Open Dev Tools and verify that you can open the URL with both Googlebot and Googlebot Mobile.
    1. To get the user agent switcher, open Dev Tools.
    2. Check the console drawer is open (the toggle is the Escape key)
    3. Hit the … and open “Network conditions”
    4. Here, select your user agent!

  3. IP Address: Verify that you can access the page with the mobile testing tool. (This will come from one of the IPs used by Google; any checks you do from your computer won’t.)
  4. Country: The mobile testing tool will visit from US IPs, from what I’ve seen, so we get two birds with one stone. But Googlebot will occasionally crawl from non-American IPs, so it’s also worth using a VPN to double-check whether you can access the site from any other relevant countries.
    1. I’ve used HideMyAss for this before, but whatever VPN you have will work fine.

We should now have an idea whether or not Googlebot is struggling to fetch the page once.
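If you’d rather script the user agent check than click through Dev Tools, something like the following sketch would do it. It only changes the user agent string, so your request still comes from your own IP and not one of Google’s, and `urlopen` follows redirects, so you see the final status code rather than any 301 or 302 along the way. The function names are my own.

```python
import urllib.error
import urllib.request

# Googlebot's desktop user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_request(url, user_agent=GOOGLEBOT_UA):
    """Prepare a GET request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return the final HTTP status code for the URL (after redirects)."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; the code is what we want.
        return error.code
```

Calling `fetch_status("https://www.example.com/")` with your own URL and comparing the result against the same call with a normal browser user agent gives a quick first answer on whether the user agent matters.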

Have we found any problems yet?

If we can re-create a failed crawl with one of the simple checks above, then Googlebot is probably failing consistently to fetch our page, and it’s typically for one of those basic reasons.

But it might not be. Many problems are inconsistent because of the nature of technology. ;)

3. Are we telling Google two different things?

Next up: Google can find the page, but are we confusing it by telling it two different things?

This is most commonly seen, in my experience, because someone has messed up the indexing directives.

By “indexing directives,” I’m referring to any tag that defines the correct index status or page in the index which should rank. Here’s a non-exhaustive list:

  • No-index
  • Canonical
  • Mobile alternate tags
  • AMP alternate tags

An example of providing mixed messages would be:

  • No-indexing page A
  • Page B canonicals to page A

Or:

  • Page A has a canonical in a header to A with a parameter
  • Page A has a canonical in the body to A without a parameter

If we’re providing mixed messages, then it’s not clear how Google will respond. It’s a great way to start seeing strange results.

Good places to check for the indexing directives listed above are:

  • Sitemap
    • Example: Mobile alternate tags can sit in a sitemap
  • HTTP headers
    • Example: Canonical and meta robots can be set in headers.
  • HTML head
    • This is where you’re probably looking. You’ll need this one for a comparison.
  • JavaScript-rendered vs hard-coded directives
    • You might be setting one thing in the page source and then rendering another with JavaScript, i.e. you would see something different in the HTML source from the rendered DOM.
  • Google Search Console settings
    • There are Search Console settings for ignoring parameters and country localization that can clash with indexing tags on the page.
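Once you’ve gathered the directives from each of those places, comparing them is mechanical. Here’s a small sketch, assuming you’ve already collected the values by hand or with your crawler of choice. The source names and URLs are made up.

```python
def find_conflicts(signals):
    """Group each directive's values by where they were found; keep the disagreements.

    `signals` maps a source name (e.g. "html_head") to the directives seen there.
    """
    seen = {}
    for source, directives in signals.items():
        for name, value in directives.items():
            seen.setdefault(name, {}).setdefault(value, []).append(source)
    # A directive conflicts when different sources report different values for it.
    return {name: values for name, values in seen.items() if len(values) > 1}

# Hypothetical findings for one page.
signals = {
    "http_header": {"canonical": "https://example.com/a?ref=1"},
    "html_head": {"canonical": "https://example.com/a", "noindex": False},
    "rendered_dom": {"canonical": "https://example.com/a", "noindex": False},
}

print(find_conflicts(signals))
```

In this example the header and the HTML disagree about the canonical, which is the second kind of mixed message described above.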

A quick aside on rendered DOM

This page has a lot of mentions of the rendered DOM on it (18, if you’re curious). Since we’ve just had our first, here’s a quick recap about what that is.

When you load a webpage, the first request is the HTML. This is what you see in the HTML source (right-click on a webpage and click View Source).

This is before JavaScript has done anything to the page. This didn’t use to be such a big deal, but now so many websites rely heavily on JavaScript that most people quite reasonably won’t trust the initial HTML.

Rendered DOM is the technical term for a page, when all the JavaScript has been rendered and all the page alterations made. You can see this in Dev Tools.

In Chrome you can get that by right clicking and hitting inspect element (or Ctrl + Shift + I). The Elements tab will show the DOM as it’s being rendered. When it stops flickering and changing, then you’ve got the rendered DOM!

4. Can Google crawl the page consistently?

To see what Google is seeing, we’re going to need to get log files. At this point, we can check to see how it is accessing the page.

Aside: Working with logs is an entire post in and of itself. I’ve written a guide to log analysis with BigQuery, and I’d also really recommend trying out Screaming Frog Log Analyzer, which has done a great job of handling a lot of the complexity around logs.

When we’re looking at crawling there are three useful checks we can do:

  1. Status codes: Plot the status codes over time. Is Google seeing different status codes than you when you check URLs?
  2. Resources: Is Google downloading all the resources of the page?
    1. Is it downloading all your site-specific JavaScript and CSS files that it would need to generate the page?
  3. Page size follow-up: Take the max and min of all your pages and resources and diff them. If you see a difference, then Google might be failing to fully download all the resources or pages. (Hat tip to @ohgm, where I first heard this neat tip).
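If your logs are small enough to handle locally, the status code and page size checks can be sketched in a few lines of Python. This assumes the common “combined” access log format and identifies Googlebot by user agent alone (which can be spoofed). The sample lines are invented.

```python
import re
from collections import Counter, defaultdict

# Assumes the "combined" access log format.
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Yield parsed log lines whose user agent mentions Googlebot."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and "Googlebot" in match["agent"]:
            yield match

def status_by_day(lines):
    """Count the status codes Googlebot received on each day."""
    counts = defaultdict(Counter)
    for hit in googlebot_hits(lines):
        counts[hit["day"]][hit["status"]] += 1
    return counts

def size_spread(lines):
    """Return (min, max) bytes per path for 200 responses whose size varied."""
    sizes = defaultdict(list)
    for hit in googlebot_hits(lines):
        if hit["status"] == "200" and hit["size"] != "-":
            sizes[hit["path"]].append(int(hit["size"]))
    return {path: (min(s), max(s)) for path, s in sizes.items() if min(s) != max(s)}

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Invented sample lines.
LOG_LINES = [
    f'66.249.66.1 - - [10/Oct/2018:13:55:36 +0000] "GET /shirts HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2018:14:02:10 +0000] "GET /shirts HTTP/1.1" 302 0 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [11/Oct/2018:01:00:05 +0000] "GET /shirts HTTP/1.1" 200 2048 "-" "{GOOGLEBOT}"',
    '203.0.113.9 - - [11/Oct/2018:09:12:44 +0000] "GET /shirts HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

print(dict(status_by_day(LOG_LINES)))
print(size_spread(LOG_LINES))
```

A path that shows up in `size_spread` with a big gap between its smallest and largest response is worth a closer look, since Google may not be getting the full page every time.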

Have we found any problems yet?

If Google isn’t getting 200s consistently in our log files, but we can access the page fine when we try, then there are clearly still some differences between Googlebot and ourselves. What might those differences be?

  1. It will crawl more than us
  2. It is obviously a bot, rather than a human pretending to be a bot
  3. It will crawl at different times of day

This means that:

  • If our website is doing clever bot blocking, it might be able to differentiate between us and Googlebot.
  • Because Googlebot will put more stress on our web servers, it might behave differently. When websites have a lot of bots or visitors visiting at once, they might take certain actions to help keep the website online. They might turn on more computers to power the website (this is called scaling), they might also attempt to rate-limit users who are requesting lots of pages, or serve reduced versions of pages.
  • Servers run tasks periodically; for example, a listings website might run a daily task at 01:00 to clean up all its old listings, which might affect server performance.

Working out what’s happening with these periodic effects is going to be fiddly; you’re probably going to need to talk to a back-end developer.

Depending on your skill level, you might not know exactly where to lead the discussion. A useful structure for a discussion is often to talk about how a request passes through your technology stack and then look at the edge cases we discussed above.

  • What happens to the servers under heavy load?
  • When do important scheduled tasks happen?

Two useful pieces of information to enter this conversation with:

  1. Depending on the regularity of the problem in the logs, it is often worth trying to re-create the problem by attempting to crawl the website with a crawler at the same speed/intensity that Google is using to see if you can find/cause the same issues. This won’t always be possible depending on the size of the site, but for some sites it will be. Being able to consistently re-create a problem is the best way to get it solved.
  2. If you can’t, however, then try to provide the exact periods of time where Googlebot was seeing the problems. This will give the developer the best chance of tying the issue to other logs to let them debug what was happening.

If Google can crawl the page consistently, then we move onto our next step.

5. Does Google see what I can see on a one-off basis?

We know Google is crawling the page correctly. The next step is to try and work out what Google is seeing on the page. If you’ve got a JavaScript-heavy website you’ve probably banged your head against this problem before, but even if you don’t, this can still sometimes be an issue.

We follow the same pattern as before. First, we try to re-create it once. The following tools will let us do that:

  • Fetch & Render
    • Shows: Rendered DOM in an image, but only returns the page source HTML for you to read.
  • Mobile-friendly test
    • Shows: Rendered DOM and returns rendered DOM for you to read.
    • Not only does this show you rendered DOM, but it will also track any console errors.

Is there a difference between Fetch & Render, the mobile-friendly testing tool, and Googlebot? Not really, with the exception of timeouts (which is why we have our later steps!). Here’s the full analysis of the difference between them, if you’re interested.

Once we have the output from these, we compare them to what we ordinarily see in our browser. I’d recommend using a tool like Diff Checker to compare the two.

Have we found any problems yet?

If we encounter meaningful differences at this point, then in my experience it’s typically either from JavaScript or cookies.

Why?

We can isolate each of these by:

  • Loading the page with no cookies. This can be done simply by loading the page with a fresh incognito session and comparing the rendered DOM here against the rendered DOM in our ordinary browser.
  • Using the mobile-friendly testing tool to see the page with Chrome 41 and comparing it against the rendered DOM we normally see with Inspect Element.

Yet again we can compare them using something like Diff Checker, which will allow us to spot any differences. You might want to use an HTML formatter to help line them up better.
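
If you’d rather script the comparison than paste into a web tool, Python’s standard difflib module can produce the same kind of line-by-line diff. A minimal sketch, assuming you have already saved both versions of the rendered DOM as text; the two tiny HTML strings here are made up for illustration.

```python
import difflib

def diff_html(ours, googles):
    """Return unified-diff lines between two HTML strings, compared line by line."""
    return list(difflib.unified_diff(
        ours.splitlines(), googles.splitlines(),
        fromfile="browser", tofile="google", lineterm="",
    ))

# Hypothetical rendered DOMs: the stock message is missing in Google's version.
browser_dom = "<head>\n<title>Shoes</title>\n</head>\n<body>\n<p>In stock</p>\n</body>"
google_dom = "<head>\n<title>Shoes</title>\n</head>\n<body>\n</body>"

for line in diff_html(browser_dom, google_dom):
    print(line)
```

Lines starting with "-" exist only in the browser version, and lines starting with "+" exist only in Google’s version. Running both inputs through an HTML formatter first keeps the diff from drowning in whitespace differences.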

We can also see the JavaScript errors thrown using the Mobile-Friendly Testing Tool, which may prove particularly useful if you’re confident in your JavaScript.

If, using this knowledge and these tools, we can recreate the bug, then we have something that can be replicated and it’s easier for us to hand off to a developer as a bug that will get fixed.

If we’re seeing everything is correct here, we move on to the next step.

6. What is Google actually seeing?

It’s possible that what Google is seeing is different from what we recreate using the tools in the previous step. Why? A few main reasons:

  • Overloaded servers can have all sorts of strange behaviors. For example, they might be returning 200 codes, but perhaps with a default page.
  • JavaScript is rendered separately from pages being crawled and Googlebot may spend less time rendering JavaScript than a testing tool.
  • There is often a lot of caching in the creation of web pages and this can cause issues.

We’ve gotten this far without talking about time! Pages don’t get crawled instantly, and crawled pages don’t get indexed instantly.

Quick sidebar: What is caching?

Caching is often a problem if you get to this stage. Unlike JS, it’s not talked about as much in our community, so it’s worth some more explanation in case you’re not familiar. Caching is storing something so it’s available more quickly next time.

When you request a webpage, a lot of calculations happen to generate that page. If you then refreshed the page when it was done, it would be incredibly wasteful to just re-run all those same calculations. Instead, servers will often save the output and serve you the output without re-running them. Saving the output is called caching.
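
To make that concrete, here is a toy sketch of caching using Python’s built-in functools.lru_cache. It is not how any particular web server does it; render_page and the counter are hypothetical stand-ins for the expensive page-generation work.

```python
import functools

stats = {"renders": 0}  # how many times the expensive work actually ran

@functools.lru_cache(maxsize=128)
def render_page(path):
    """Stand-in for the expensive calculations that build a page."""
    stats["renders"] += 1
    return f"<html><body>Listings for {path}</body></html>"

first = render_page("/listings")
second = render_page("/listings")  # same input: served from the cache
print(stats["renders"])
```

The second request returns the saved output without re-running the function, so the counter stays at 1. The flip side is that the cache keeps serving that saved output even if the underlying data has changed, until something clears it.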

Why do we need to know this? Well, we’re already well out into the weeds at this point and so it’s possible that a cache is misconfigured and the wrong information is being returned to users.

There aren’t many good beginner resources on caching which go into more depth. However, I found this article on caching basics to be one of the more friendly ones. It covers some of the basic types of caching quite well.

How can we see what Google is actually working with?

  • Google’s cache
    • Shows: Source code
    • While this won’t show you the rendered DOM, it is showing you the raw HTML Googlebot actually saw when visiting the page. You’ll need to check this with JS disabled; otherwise, on opening it, your browser will run all the JS on the cached version.
  • Site searches for specific content
    • Shows: A tiny snippet of rendered content.
    • By searching for a specific phrase on a page, e.g. inurl:example.com/url “only JS rendered text”, you can see if Google has managed to index a specific snippet of content. Of course, it only works for visible text and misses a lot of the content, but it’s better than nothing!
    • Better yet, do the same thing with a rank tracker, to see if it changes over time.
  • Storing the actual rendered DOM
    • Shows: Rendered DOM
    • Alex from DeepCrawl has written about saving the rendered DOM from Googlebot. The TL;DR version: Google will render JS and post to endpoints, so we can get it to submit the JS-rendered version of a page that it sees. We can then save that, examine it, and see what went wrong.

Have we found any problems yet?

Again, once we’ve found the problem, it’s time to go and talk to a developer. The advice for this conversation is identical to the last one — everything I said there still applies.

The other knowledge you should go into this conversation armed with: how Google works and where it can struggle. While your developer will know the technical ins and outs of your website and how it’s built, they might not know much about how Google works. Together, this can help you reach the answer more quickly.

The obvious sources for this are resources or presentations given by Google themselves. Of the various resources that have come out, I’ve found these two to be some of the more useful ones for giving insight into first principles:

But there is often a difference between statements Google will make and what the SEO community sees in practice. All the SEO experiments people tirelessly perform in our industry can also help shed some insight. There are far too many to list here, but here are two good examples:

7. Could Google be aggregating your website across others?

If we’ve reached this point, we’re pretty happy that our website is running smoothly. But not all problems can be solved just on your website; sometimes you’ve got to look to the wider landscape and the SERPs around it.

Most commonly, what I’m looking for here is:

  • Similar/duplicate content to the pages that have the problem.
    • This could be intentional duplicate content (e.g. syndicating content) or unintentional (competitors’ scraping or accidentally indexed sites).

Either way, they’re nearly always found by doing exact searches in Google. I.e. taking a relatively specific piece of content from your page and searching for it in quotes.

Have you found any problems yet?

If you find a number of other exact copies, then it’s possible they might be causing issues.

The best description I’ve come up with for “have you found a problem here?” is: do you think Google is aggregating together similar pages and only showing one? And if it is, is it picking the wrong page?

This doesn’t just have to be on traditional Google search. You might find a version of it on Google Jobs, Google News, etc.

To give an example, if you are a reseller, you might find content isn’t ranking because there’s another, more authoritative reseller who consistently posts the same listings first.

Sometimes you’ll see this consistently and straightaway, while other times the aggregation might be changing over time. In that case, you’ll need a rank tracker for whatever Google property you’re working on to see it.

Jon Earnshaw from Pi Datametrics gave an excellent talk on the latter (around suspicious SERP flux) which is well worth watching.

Once you’ve found the problem, you’ll probably need to experiment to find out how to get around it, but the easiest factors to play with are usually:

  • De-duplication of content
  • Speed of discovery (you can often improve by putting up a 24-hour RSS feed of all the new content that appears)
  • Lowering syndication

8. A roundup of some other likely suspects

If you’ve gotten this far, then we’re sure that:

  • Google can consistently crawl our pages as intended.
  • We’re sending Google consistent signals about the status of our page.
  • Google is consistently rendering our pages as we expect.
  • Google is picking the correct page out of any duplicates that might exist on the web.

And your problem still isn’t solved?

And it is important?

Well, shoot.

Feel free to hire us…?

As much as I’d love for this article to list every SEO problem ever, that’s not really practical, so to finish off this article let’s go through two more common gotchas and principles that didn’t really fit in elsewhere before the answers to those four problems we listed at the beginning.

Invalid/poorly constructed HTML

You and Googlebot might be seeing the same HTML, but it might be invalid or wrong. Googlebot (and any crawler, for that matter) has to provide workarounds when the HTML specification isn’t followed, and those can sometimes cause strange behavior.

The easiest way to spot it is either by eye-balling the rendered DOM tools or using an HTML validator.

The W3C validator is very useful, but will throw up a lot of errors/warnings you won’t care about. The closest I can give to a one-line summary of which ones are useful is to:

  • Look for errors
  • Ignore anything to do with attributes (won’t always apply, but is often true).

The classic example of this is breaking the head.

An iframe isn’t allowed in the head code, so Chrome will end the head and start the body. Unfortunately, it takes the title and canonical with it, because they fall after it — so Google can’t read them. The head code should have ended in a different place.

Oliver Mason wrote a good post that explains an even more subtle version of this in breaking the head quietly.

When in doubt, diff

Never underestimate the power of trying to compare two things line by line with a diff from something like Diff Checker. It won’t apply to everything, but when it does it’s powerful.

For example, if Google has suddenly stopped showing your featured markup, try to diff your page against a historical version either in your QA environment or from the Wayback Machine.


Answers to our original 4 questions

Time to answer those questions. These are all problems we’ve had clients bring to us at Distilled.

1. Why wasn’t Google showing 5-star markup on product pages?

Google was seeing both the server-rendered markup and the client-side-rendered markup; however, the server-rendered side was taking precedence.

Removing the server-rendered markup meant the 5-star markup began appearing.

2. Why wouldn’t Bing display 5-star markup on review pages, when Google would?

The problem came from the references to schema.org.

        <div itemscope="" itemtype="https://schema.org/Movie">
          <h1 itemprop="name">Avatar</h1>
          <span>Director: <span itemprop="director">James Cameron</span> (born August 16, 1954)</span>
          <span itemprop="genre">Science fiction</span>
          <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
        </div>

We diffed our markup against our competitors and the only difference was we’d referenced the HTTPS version of schema.org in our itemtype, which caused Bing to not support it.

C’mon, Bing.

3. Why were pages getting indexed with a no-index tag?

The answer for this was in this post. This was a case of breaking the head.

The developers had installed some ad-tech in the head and inserted a non-standard tag, i.e. not:

  • <title>
  • <style>
  • <base>
  • <link>
  • <meta>
  • <script>
  • <noscript>

This caused the head to end prematurely and the no-index tag was left in the body where it wasn’t read.
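
A rough check for this can be scripted. The sketch below uses Python’s standard html.parser and the list of tags above to find the first tag written inside the head that doesn’t belong there, plus any head tags stranded after it. It is an approximation: it reads the source in order rather than building the DOM the way a browser does, and the sample page (including the ad pixel URL) is hypothetical.

```python
from html.parser import HTMLParser

# Tags the list above treats as valid inside <head>.
HEAD_TAGS = {"title", "style", "base", "link", "meta", "script", "noscript"}

class HeadChecker(HTMLParser):
    """Record the first tag written inside <head> that would end it early."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.breaker = None      # first disallowed tag seen in <head>
        self.after_breaker = []  # head tags that come after it

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and self.breaker is None and tag not in HEAD_TAGS:
            self.breaker = tag
        elif self.in_head and self.breaker is not None and tag in HEAD_TAGS:
            self.after_breaker.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

# Hypothetical page: an iframe dropped into the head above the robots tag.
page = """<html><head>
<title>Example</title>
<iframe src="https://ads.example.com/pixel"></iframe>
<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/">
</head><body></body></html>"""

checker = HeadChecker()
checker.feed(page)
print(checker.breaker, checker.after_breaker)
```

For this sample, the iframe is reported as the breaker, and the meta and link tags after it are the ones that would end up in the body.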

4. Why did any page on a website return a 302 about 20–50% of the time, but only for crawlers?

This took some time to figure out. The client had an old legacy website that had two servers, one for the blog and one for the rest of the site. This issue started occurring shortly after a migration of the blog from a subdomain (blog.client.com) to a subdirectory (client.com/blog/…).

At surface level everything was fine; if a user requested any individual page, it all looked good. A crawl of all the blog URLs to check they’d redirected was fine.

But we noticed a sharp increase of errors being flagged in Search Console, and during a routine site-wide crawl, many pages that were fine when checked manually were causing redirect loops.

We checked using Fetch and Render, but once again, the pages were fine.

Eventually, it turned out that when a non-blog page was requested very quickly after a blog page (which, realistically, only a crawler is fast enough to achieve), the request for the non-blog page would be sent to the blog server.

These would then be caught by a long-forgotten redirect rule, which 302-redirected deleted blog posts (or other duff URLs) to the root. This, in turn, was caught by a blanket HTTP to HTTPS 301 redirect rule, which would be requested from the blog server again, perpetuating the loop.

For example, requesting https://www.client.com/blog/ followed quickly enough by https://www.client.com/category/ would result in:

  • 302 to http://www.client.com – This was the rule that redirected deleted blog posts to the root
  • 301 to https://www.client.com – This was the blanket HTTPS redirect
  • 302 to http://www.client.com – The blog server doesn’t know about the HTTPS non-blog homepage and it redirects back to the HTTP version. Rinse and repeat.
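
The chain above can be modeled without touching a live server. This is a simplified sketch: the rules dictionary is a hypothetical stand-in for what the blog server returned for each URL, and follow_redirects is an illustrative helper, not a real crawler.

```python
def follow_redirects(start, rules, max_hops=10):
    """Follow (status, location) rules from `start`.

    Returns the hops taken and True if a URL was visited twice (a loop).
    """
    hops, seen, url = [], set(), start
    while url in rules and len(hops) < max_hops:
        if url in seen:
            return hops, True
        seen.add(url)
        status, location = rules[url]
        hops.append((status, url, location))
        url = location
    return hops, False

# What the blog server answered when it wrongly received non-blog requests.
rules = {
    "https://www.client.com/category/": (302, "http://www.client.com"),
    "http://www.client.com": (301, "https://www.client.com"),
    "https://www.client.com": (302, "http://www.client.com"),
}

hops, looped = follow_redirects("https://www.client.com/category/", rules)
for status, source, target in hops:
    print(status, source, "->", target)
print("loop detected:", looped)
```

Writing the chain down as data like this is also a handy way to explain the bug to a developer, because each hop maps to one redirect rule.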

This caused the periodic 302 errors and it meant we could work with their devs to fix the problem.

What are the best brainteasers you’ve had?

Let’s hear them, people. What problems have you run into? Let us know in the comments.

Also credit to @RobinLord8, @TomAnthonySEO, @THCapper, @samnemzer, and @sergeystefoglo_ for help with this piece.



Moz Blog

Posted in IM News

Google Creates a Technical Guide for Moving to the Cloud

Google has created a guide in the form of a website for companies that are considering a move to their cloud called Google Cloud Platform for Data Center Professionals.

“We recognize that a migration of any size can be a challenging project, so today we’re happy to announce the first part of a new resource to help our customers as they migrate,” said Peter-Mark Verwoerd, a Solutions Architect at Google who previously worked for Amazon Web Services. “This is a guide for customers who are looking to move to Google Cloud Platform (GCP) and are coming from non-cloud environments.”

The guide focuses on the basics of running IT — Compute, Networking, Storage, and Management. “We’ve tried to write this from the point of view of someone with minimal cloud experience, so we hope you find this guide a useful starting point,” said Verwoerd.

The post Google Creates a Technical Guide for Moving to the Cloud appeared first on WebProNews.


WebProNews


How to Find and Fix 14 Technical SEO Problems That Can Be Damaging Your Site Now

Posted by Joe.Robison

Who doesn’t love working on low-hanging fruit SEO problems that can dramatically improve your site?

Across all businesses and industries, the low-effort, high-reward projects should jump to the top of the list of things to implement. And it’s nowhere more relevant than tackling technical SEO issues on your site.

Let’s focus on easy-to-identify, straightforward-to-fix problems. Most of these issues can be uncovered in an afternoon, and it’s possible they can solve months’ worth of traffic problems. While there may not be groundbreaking, complex issues that will fix SEO once and for all, there are easy things to check right now. If your site already checks out for all of these, then you can go home today and start decrypting RankBrain tomorrow.


Real quick: The definition of technical SEO is a bit fuzzy. Does it include everything that happens on a site except for content production? Or is it just limited to code and really technical items?

I’ll define technical SEO here as the more technical aspects of a site: problems that the average marketer wouldn’t identify and that take a bit of experience to uncover. Technical SEO problems are also generally, but not always, site-wide problems rather than specific page issues. Their fixes can help improve your site as a whole, rather than just isolated pages.

You’d think that, with all the information out there on the web, many of these would be common knowledge. I’m sure my car mechanic thought the same thing when I busted my engine because I forgot to put oil in it for months. Simple oversights can destroy your machine.


The target audience for this post is beginning to intermediate SEOs and site owners that haven’t inspected their technical SEO for a while, or are doing it for the first time. If just one of these 14 technical SEO problems below is harming your site, I think you’d consider this a valuable read.

This is not a complete technical SEO audit checklist, but a summary of some of the most common and damaging technical SEO problems that you can fix now. I highlighted these based on my own real-world experience analyzing dozens of client and internal websites. Some of these issues I thought I’d never run into… until I did.

This is not a replacement for a full audit, but looking at these right now can actually save you thousands of dollars in lost sales, or worse.

1. Check indexation immediately

Have you ever heard (or asked) the question: “Why aren’t we ranking for our brand name?”

To the website owner, it’s a head-scratcher. To the seasoned SEO, it’s an eye-roll.

Can you get organic traffic to your site if it doesn’t show up in Google search? No.

I love it when complex problems are simplified at a higher level. Sergey Stefoglo at Distilled wrote an article that broke down the complex process of a technical SEO audit into two buckets: indexing and ranking.

The concept is that, instead of going crazy with a 239-point checklist with varying priorities, you sit back and ask the first question: Are the pages on our site indexing?

You can get those answers pretty quickly with a quick site search directly in Google.

What to do: Type site:{yoursitename.com} into Google search and you’ll immediately see roughly how many pages on your site are indexed.


What to ask:

  • Is that approximately the number of pages that we’d expect to be indexing?
  • Are we seeing pages in the index that we don’t want?
  • Are we missing pages in the index that we want to rank?

What to do next:

  • Go deeper and check different buckets of pages on your site, such as product pages and blog posts
  • Check subdomains to make sure they’re indexing (or not)
  • Check old versions of your site to see if they’re mistakenly being indexed instead of redirected
  • Look out for spam in case your site was hacked, going deep into the search result to look for anything uncommon (like pharmaceutical or gambling SEO site-hacking spam)
  • Figure out exactly what’s causing indexing problems.

2. Robots.txt

Perhaps the single most damaging character in all of SEO is a simple “/” improperly placed in the robots.txt file.

Everybody knows to check the robots.txt, right? Unfortunately not.

One of the biggest offenders of ruining your site’s organic traffic is a well-meaning developer who forgot to change the robots.txt file after redeveloping your website.

You would think this would be solved by now, but I’m still repeatedly running into random sites that have their entire site blocked because of this one problem.

What to do: Go to yoursitename.com/robots.txt and make sure it doesn’t show “User-agent: * Disallow: /”.

Here’s a fancy screenshot:

[Screenshot: a robots.txt file containing “User-agent: *” and “Disallow: /”]

And this is what it looks like in Google’s index:

[Screenshot: the blocked site as it appears in Google’s search results]

What to do next:

  • If you see “Disallow: /”, immediately talk to your developer. There could be a good reason it’s set up that way, or it may be an oversight.
  • If you have a complex robots.txt file, like many ecommerce sites, you should review it line-by-line with your developer to make sure it’s correct.
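
If you have a lot of sites to check, the same test can be scripted with Python’s standard urllib.robotparser. This is a minimal sketch that works on robots.txt text you have already downloaded; blocks_everything is a hypothetical helper name, and it only answers the narrow question of whether the site root is blocked.

```python
from urllib.robotparser import RobotFileParser

def blocks_everything(robots_txt, user_agent="*"):
    """Return True if this robots.txt text disallows the site root for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")

print(blocks_everything("User-agent: *\nDisallow: /"))
print(blocks_everything("User-agent: *\nDisallow: /admin/"))
```

The first call reports the site as blocked; the second does not, because only the /admin/ folder is disallowed. Python’s parser may not match Google’s interpretation in every edge case, so treat a surprising result as a prompt to read the file by hand.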

3. Meta robots NOINDEX

NOINDEX can be even more damaging than a misconfigured robots.txt at times. A mistakenly configured robots.txt won’t pull your pages out of Google’s index if they’re already there, but a NOINDEX directive will remove all pages with this configuration.

Most commonly, the NOINDEX is set up when a website is in its development phase. Since so many web development projects are running behind schedule and pushed to live at the last hour, this is where the mistake can happen.

A good developer will make sure this is removed from your live site, but you must verify that’s the case.

What to do:

  • Manually do a spot-check by viewing the source code of your page, and looking for one of these:
    [Image: examples of meta robots tags containing NOINDEX]
  • 90% of the time you’ll want it to be either “INDEX, FOLLOW” or nothing at all. If you see one of the above, you need to take action.
  • It’s best to use a tool like Screaming Frog to scan all the pages on your site at once
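
For a quick scripted spot-check, something like the sketch below will do. It uses Python’s standard html.parser on HTML you have already fetched and only looks at the meta robots tag; it does not check HTTP headers or bot-specific meta tags. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())

def is_noindexed(html):
    """Return True if any meta robots tag on the page contains noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in directive for directive in finder.directives)

print(is_noindexed('<head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head>'))
print(is_noindexed('<head><meta name="robots" content="index, follow"></head>'))
```

The comparison is case-insensitive, since the tag is often written in capitals.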

What to do next:

  • If your site is constantly being updated and improved by your development team, set a reminder to check this weekly or after every new site upgrade
  • Even better, schedule site audits with an SEO auditor software tool, like the Moz Pro Site Crawl

4. One version per URL: URL Canonicalization

The average user doesn’t really care if your home page shows up as all of these separately:

But the search engines do, and this configuration can dilute link equity and make your work harder.

Google will generally decide which version to index, but they may index a mixed assortment of your URL versions, which can cause confusion and complexity.

Moz’s canonicalization guide sums it up perfectly:

“For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up.”

It’s likely that no one but an SEO would flag this as something to fix, but it can be an easy fix that has a huge impact on your site.

What to do:

  • Manually enter in multiple versions of your home page in the browser to see if they all resolve to the same URL
  • Look also for HTTP vs HTTPS versions of your URLs — only one should exist
  • If they don’t, you’ll want to work with your developer to set up 301 redirects to fix this
  • Use the “site:” operator in Google search to find out which versions of your pages are actually indexing
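
To avoid forgetting a combination when testing by hand, you can generate the list of home page versions first. A small sketch, assuming the usual suspects are HTTP vs HTTPS, www vs non-www, and a trailing /index.html (that last one is my assumption; your platform may use a different default document). homepage_variants is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

def homepage_variants(canonical):
    """List common alternate forms of a home page URL that should redirect to `canonical`."""
    parts = urlsplit(canonical)
    bare = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    variants = set()
    for scheme in ("http", "https"):
        for host in (bare, "www." + bare):
            for path in ("/", "/index.html"):
                variants.add(urlunsplit((scheme, host, path, "", "")))
    variants.discard(canonical)  # the preferred version itself is not a variant
    return sorted(variants)

for url in homepage_variants("https://www.example.com/"):
    print(url)
```

Each URL printed should end up at the one version you want indexed. Any that load a page of their own are candidates for a 301.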

What to do next:

  • Scan your whole site at once with a scalable tool like Screaming Frog to find all pages faster
  • Set up a schedule to monitor your URL canonicalization on a weekly or monthly basis

5. Rel=canonical

Although the rel=canonical tag is closely related to the canonicalization mentioned above, it deserves its own section because it’s used for more than resolving slightly different versions of the same URL.

It’s also useful for preventing page duplication when you have similar content across different pages — often an issue with ecommerce sites and managing categories and filters.

I think the best example of using this properly is how Shopify’s platform uses rel=canonical URLs to manage their product URLs as they relate to categories. When a product is a part of multiple categories, there are as many URLs as there are categories that product is a part of.

For example, Boll & Branch is on the Shopify platform, and on their Cable Knit Blanket product page we see that from the navigation menu, the user is taken to https://www.bollandbranch.com/collections/baby-blankets/products/cable-knit-baby-blanket.

But looking at the rel=canonical, we see it’s configured to point to the main URL:

<link rel="canonical" href="https://www.bollandbranch.com/products/cable-knit-baby-blanket" />

And this is the default across all Shopify sites.

Every ecommerce and CMS platform comes with a different default setting on how they handle and implement the rel=canonical tag, so definitely look at the specifics for your platform.

What to do:

  • Spot-check important pages to see if they’re using the rel=canonical tag
  • Use site-scanning software to list out all the URLs on your site and determine if there are duplicate page problems that can be solved with a rel=canonical tag
  • Read more on the different use cases for canonical tags and when best to use them
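
A spot-check like this is easy to script. The sketch below pulls the canonical URL out of a page’s HTML with Python’s standard html.parser; canonical_of is a hypothetical helper, and returning nothing when a page declares more than one canonical is my own design choice, since conflicting tags deserve a manual look.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attrs.get("href"):
            self.canonicals.append(attrs["href"])

def canonical_of(html):
    """Return the page's canonical URL, or None if it has zero or several."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None

page = '<head><link rel="canonical" href="https://www.bollandbranch.com/products/cable-knit-baby-blanket" /></head>'
print(canonical_of(page))
```

Comparing the returned URL against the URL you requested tells you whether the page points to itself or to another version.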

6. Text in images

Text in images — it’s such a simple concept, but out in the wild many, many sites are hiding important content behind images.

Yes, Google can somewhat understand text on images, but it’s not nearly as sophisticated as we would hope in 2017. The best practice for SEO is to avoid embedding important text in images.

Google’s Gary Illyes confirmed that it’s unlikely Google’s crawler can recognize text well:


CognitiveSEO ran a great test on Google’s ability to extract text from images, and there’s evidence of some stunning accuracy from Google’s technology:

[Image from the CognitiveSEO test. Source: Cognitive SEO]

Yet, the conclusion from the test is that image-to-text extraction technology is not being used for ranking search queries:

[Image from the CognitiveSEO test. Source: Cognitive SEO]

The conclusion from CognitiveSEO is that “this search was proof that the search engine does not, in fact, extract text from images to use it in its search queries. At least not as a general rule.”

And although H1 tags are not as important as they once were, it’s still an on-site SEO best practice to display them prominently.

This is actually most important for large sites with many, many pages such as massive ecommerce sites. It’s most important for these sites because they can realistically rank their product or category pages with just a simple keyword-targeted main headline and a string of text.

What to do:

  • Manually inspect the most important pages on your site, checking if you’re hiding important text in your images
  • At scale, use an SEO site crawler to scan all the pages on your site. Look for whether H1 and H2 tags are being found on pages across your site. Also look for the word count as an indication.
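
If you want a rough version of that crawler check for a handful of pages, the sketch below counts headings, images, and visible words in HTML you have already fetched. It uses Python’s standard html.parser; the class name and the sample pages are hypothetical, and what counts as “too few words” is left for you to decide.

```python
from html.parser import HTMLParser

class TextAudit(HTMLParser):
    """Count H1/H2 headings, images, and visible words on a page."""

    def __init__(self):
        super().__init__()
        self.headings = 0
        self.images = 0
        self.words = 0
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self.headings += 1
        elif tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())

# Hypothetical page where the headline and copy live inside images.
image_only = TextAudit()
image_only.feed('<body><img src="hero.jpg" alt=""><img src="headline.png" alt=""></body>')
print(image_only.headings, image_only.images, image_only.words)
```

A page with images but no headings and almost no words is a good candidate for a manual look.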

What to do next:

  • Create a guide for content managers and developers so that they know the best practice in your organization is to not hide text behind images
  • Collaborate with your design and development team to get the same design look that you had with text embedded in images, but using CSS instead for image overlays

7. Broken backlinks

If not properly overseen by a professional SEO, a website migration or relaunch project can spew out countless broken backlinks from other websites. This is a golden opportunity for recovering link equity.

Some of the top pages on your site may have become 404 pages after a migration, so the backlinks pointing back to these 404 pages are effectively broken.

Two types of tools are great for finding broken backlinks — Google Search Console, and a backlink checker such as Moz, Majestic, or Ahrefs.

In Search Console, you’ll want to review your top 404 errors and it will prioritize the top errors by broken backlinks:

[Screenshot: top 404 errors in Search Console]

What to do:

  • After identifying your top pages with backlinks that are dead, 301 redirect these to the best pages
  • Also look for links that are broken because the linking site typed in your URL wrong or messed up the link code on their end; this is another rich source of link opportunities

What to do next:

  • Use other tools such as Mention or Google Alerts to keep an eye on unlinked mentions that you can reach out to for an extra link
  • Set up a recurring site crawl or manual check to look out for new broken links

8. HTTPS is less optional

What was once only necessary for ecommerce sites is now becoming more of a necessity for all sites.

Google just recently announced that they would start marking any non-HTTPS site as non-secure if the site accepts passwords or credit cards:

“To help users browse the web safely, Chrome indicates connection security with an icon in the address bar. Historically, Chrome has not explicitly labelled HTTP connections as non-secure. Beginning in January 2017 (Chrome 56), we’ll mark HTTP pages that collect passwords or credit cards as non-secure, as part of a long-term plan to mark all HTTP sites as non-secure.”

What’s even more shocking is Google’s plan to label all HTTP URLs as non-secure:

“Eventually, we plan to label all HTTP pages as non-secure, and change the HTTP security indicator to the red triangle that we use for broken HTTPS.”


Going even further, it’s not out of the realm of possibility that Google will start giving HTTPS sites even more of an algorithmic ranking benefit over HTTP.

It’s also not unfathomable that “not secure” warnings will start showing up for sites directly in the search results, before a user even clicks through to the site. Google currently displays this for hacked sites, so there’s a precedent set.

This goes beyond just SEO, as this overlaps heavily with web development, IT, and conversion rate optimization.

What to do:

  • If your site currently has HTTPS deployed, run your site through Screaming Frog to see how the pages are resolving
  • Ensure that all pages are resolving to the HTTPS version of the site (same as URL canonicalization mentioned earlier)

What to do next:

  • If your site is not on HTTPS, start mapping out the transition, as Google has made it clear how important it is to them
  • Properly manage a transition to HTTPS by enlisting an SEO migration strategy so as not to lose rankings

9. 301 & 302 redirects

Redirects are an amazing tool in an SEO’s arsenal for managing and controlling dead pages, for consolidating multiple pages, and for making website migrations work without a hitch.

301 redirects are permanent and 302 redirects are temporary. The best practice is to always use 301 redirects when permanently redirecting a page.

301 redirects can be confusing for those new to SEO trying to properly use them:

  • Should you use them for all 404 errors? (Not always.)
  • Should you use them instead of the rel=canonical tag? (Sometimes, not always.)
  • Should you redirect all the old URLs from your previous site to the home page? (Almost never, it’s a terrible idea.)

They’re a lifesaver when used properly, but a pain when you have no idea what to do with them.

With great power comes great responsibility, and it’s vitally important to have someone on your team who really understands how to properly strategize the usage and implementation of 301 redirects across your whole site. I’ve seen sites lose up to 60% of their revenue for months, just because these were not properly implemented during a site relaunch.

Despite some statements released recently about 302 redirects being as efficient at passing authority as 301s, relying on 302s for permanent moves is not advised. Recent studies have tested this and shown that 301s are the gold standard. Mike King’s striking example shows that the power of 301s over 302s remains:

What to do:

  • Do a full review of all the URLs on your site and look at a high level
  • If using 302 redirects incorrectly for permanent redirects, change these to 301 redirects
  • Don’t go redirect-crazy on all 404 errors — use them for pages receiving links or traffic only to minimize your redirects list
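
Once you have an export of your redirects, a first pass over it can be automated. This sketch assumes the export has been reduced to (source, status, target) tuples, which is my own simplification, and it only flags the two patterns discussed above: 302s and old URLs dumped on the home page. Whether each flag is a real problem is still a judgment call.

```python
def audit_redirects(redirects, home="/"):
    """Flag 302s and redirects that send old URLs to the home page.

    `redirects` is a list of (source_path, status, target_path) tuples.
    """
    issues = []
    for source, status, target in redirects:
        if status == 302:
            issues.append((source, "temporary redirect; use a 301 if the move is permanent"))
        if target == home and source != home:
            issues.append((source, "redirects to the home page; prefer the closest matching page"))
    return issues

# Hypothetical redirect export.
sample = [
    ("/old-shoes", 302, "/shoes"),
    ("/old-blog/post", 301, "/"),
    ("/old-about", 301, "/about"),
]
for source, problem in audit_redirects(sample):
    print(source, "-", problem)
```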

What to do next:

  • If using 302 redirects, discuss with your development team why your site is using them
  • Build out a guide for your organization on the importance of using 301s over 302s
  • Review the redirects implementation from your last major site redesign or migration; there are often tons of errors
  • Never redirect all the pages from an old site to the home page unless there’s a really good reason
  • Include redirect checking in your monthly or weekly site scan process
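
As a rough illustration of what a recurring redirect check might look for, here is a minimal Python sketch. It assumes you already have a crawl export shaped as a mapping from source URL to (status code, target URL); that structure, the function name `audit_redirects`, and the example.com URLs are hypothetical and not tied to any particular tool.

```python
from urllib.parse import urlsplit

TEMPORARY = "temporary 302: confirm it should not be a 301"
CHAIN = "redirect chain: target redirects again"
HOME = "deep page redirected to the home page"


def audit_redirects(redirects):
    """Flag risky entries in a redirect map.

    `redirects` maps a source URL to a (status_code, target_url) tuple.
    This shape is an assumption made for the example.
    """
    issues = []
    for source, (status, target) in redirects.items():
        if status == 302:
            issues.append((source, TEMPORARY))
        if target in redirects:
            # The target is itself redirected, so crawlers need extra hops.
            issues.append((source, CHAIN))
        source_path = urlsplit(source).path
        target_path = urlsplit(target).path
        if target_path in ("", "/") and source_path not in ("", "/"):
            issues.append((source, HOME))
    return issues


example = {
    "https://example.com/old-shoes": (302, "https://example.com/shoes"),
    "https://example.com/shoes": (301, "https://example.com/footwear"),
    "https://example.com/about-us": (301, "https://example.com/"),
}
for url, problem in audit_redirects(example):
    print(url, "->", problem)
```

A flagged entry is only a prompt for review; a 302 or a home-page redirect can be legitimate when there is a good reason for it.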

10. Meta refresh

I thought meta refreshes were gone for good and would never be a problem, until they were. I ran into a client using them on their brand-new, modern site when migrating from an old platform, and I quickly recommended that we turn these off and use 301 redirects instead.

The meta refresh is a client-side (as opposed to server-side) redirect and is not recommended by Google or professional SEOs.

If implemented, it would look like this:

[Screenshot: example of a meta refresh tag]

Source: Wikipedia

It’s a fairly simple one to check — either you have it or you don’t, and by and large there’s no debate that you shouldn’t be using these.

Google’s John Mu said:

“I would strongly recommend not using meta refresh-type or JavaScript redirects like that if you have changed your URLs. Instead of using those kinds of redirects, try to have your server do a normal 301 redirect. Search engines might recognize the JavaScript or meta refresh-type redirects, but that’s not something I would count on — a clear 301 redirect is always much better.”

And Moz’s own redirection guide states:

“They are most commonly associated with a five-second countdown with the text ‘If you are not redirected in five seconds, click here.’ Meta refreshes do pass some link juice, but are not recommended as an SEO tactic due to poor usability and the loss of link juice passed.”

What to do:

What to do next:

  • Communicate to your developers the importance of using 301 redirects as a standard and never using meta refreshes unless there’s a really good reason
  • Schedule a monthly check to monitor redirect type usage
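
For the monthly check, a small script can tell you whether a page's HTML contains a meta refresh at all. The following is a minimal sketch using only Python's standard library; the function name `find_meta_refreshes` and the sample markup are made up for illustration, and it only inspects HTML you have already downloaded.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the content attribute of every <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() == "refresh":
            self.refreshes.append(attr_map.get("content", ""))


def find_meta_refreshes(html):
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes


sample = (
    '<html><head><meta http-equiv="Refresh" '
    'content="5; url=https://example.com/new"></head></html>'
)
print(find_meta_refreshes(sample))
```

An empty list means no meta refresh was found in that document; it says nothing about JavaScript redirects, which need a separate check.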

11. XML sitemaps

XML sitemaps help Google and other search engine spiders crawl and understand your site. Most often they have the biggest impact for large and complex sites that need to give extra direction to the crawlers.

Google’s Search Console Help Guide is quite clear on the purpose and helpfulness of XML sitemaps:

“If your site’s pages are properly linked, our web crawlers can usually discover most of your site. Even so, a sitemap can improve the crawling of your site, particularly if your site meets one of the following criteria:

- Your site is really large.

- Your site has a large archive of content pages that are isolated or not well linked to each other.

- Your site is new and has few external links to it.”

A few of the biggest problems I’ve seen with XML sitemaps while working on clients’ sites:

  • Not creating it in the first place
  • Not including the location of the sitemap in the robots.txt
  • Allowing multiple versions of the sitemap to exist
  • Allowing old versions of the sitemap to exist
  • Not keeping Search Console updated with the freshest copy
  • Not using sitemap indexes for large sites

What to do:

  • Use the above list to check that your site doesn’t have any of these problems
  • Check the number of URLs submitted and indexed from your sitemap within Search Console to get an idea of the quality of your sitemap and URLs

What to do next:

  • Monitor indexation of URLs submitted in XML sitemap frequently from within Search Console
  • If your site grows more complex, investigate ways to use XML sitemaps and sitemap indexes to your advantage, as Google limits each sitemap to 50MB (uncompressed) and 50,000 URLs
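
To make the sitemap index idea concrete, here is a minimal Python sketch that splits a URL list into chunks that respect the 50,000-URL limit and builds an index pointing at each chunk. The function names and the example.com file locations are assumptions for illustration; the sketch does not check file size, so very long URLs could still push a file over the size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into sitemap-sized chunks."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Return the XML for one sitemap file as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return the XML for a sitemap index that lists each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")


# Hypothetical site with 120,000 pages: three sitemap files plus one index.
pages = [f"https://example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(pages)
index_xml = build_sitemap_index(
    [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
)
print(len(chunks), "sitemap files")
```

The index URL is then the single location you would reference in robots.txt and submit in Search Console.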

12. Unnatural word count & page size

I recently ran into this issue while reviewing a site: Most pages on the site didn’t have more than a few hundred words, but in a scan of the site using Screaming Frog, it showed nearly every page having 6,000–9,000 words:

[Screenshot: Screaming Frog crawl showing word counts of 6,000–9,000 per page]

It made no sense. But upon viewing the source code, I saw that there was some Terms and Conditions text that was meant to be displayed on only a single page but was embedded on every page of the site with a “display: none;” CSS style.

This can slow down the load speed of your page and could possibly trigger some penalty issues if seen as intentional cloaking.

In addition to word count, there can be other code bloat on the page, such as inline JavaScript and CSS. Although fixing these problems would fall under the purview of the development team, you shouldn’t rely on the developers to be proactive in identifying these types of issues.

What to do:

  • Scan your site and compare calculated word count and page size with what you expect
  • Review the source code of your pages and recommend areas to reduce bloat
  • Ensure that there’s no hidden text that can trip algorithmic penalties

What to do next:

  • There could be a good reason for hidden text in the source code from a developer’s perspective, but it can cause speed and other SEO issues if not fixed.
  • Review page size and word count across all URLs on your site periodically to keep tabs on any issues
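
If you want to compare visible and hidden word counts yourself, the following Python sketch shows one way to approximate it. It only recognizes text hidden with an inline `display: none` style, as in the case described above; text hidden through CSS classes or external stylesheets would need a rendering step. The class and function names are invented for this example.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
HIDDEN_STYLE = re.compile(r"display\s*:\s*none", re.IGNORECASE)


class WordCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.visible_words = 0
        self.hidden_words = 0
        self._hidden_depth = 0  # > 0 while inside an inline-hidden element
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        if tag in ("script", "style"):
            self._skip_depth += 1
        style = dict(attrs).get("style") or ""
        if self._hidden_depth or HIDDEN_STYLE.search(style):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        if self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # code is not copy
        count = len(data.split())
        if self._hidden_depth:
            self.hidden_words += count
        else:
            self.visible_words += count


def count_words(html):
    """Return (visible_words, hidden_words) for an HTML string."""
    counter = WordCounter()
    counter.feed(html)
    return counter.visible_words, counter.hidden_words


page = ('<p>Two words</p><div style="display: none;">'
        '<p>Terms and conditions text</p></div><p>End</p>')
print(count_words(page))
```

A hidden count that dwarfs the visible count across many URLs is the pattern worth raising with your developers.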

13. Speed

You’ve heard it a million times, but speed is key — and definitely falls under the purview of technical SEO.

Google has clearly stated that speed is a small part of the algorithm:

“Like us, our users place a lot of value in speed — that’s why we’ve decided to take site speed into account in our search rankings. We use a variety of sources to determine the speed of a site relative to other sites.”

Even with this clear SEO directive, and obvious UX and CRO benefits, speed is at the bottom of the priority list for many site managers. With mobile search clearly cemented as just as important as desktop search, speed is even more important and can no longer be ignored.

In his awesome Technical SEO Renaissance post, Mike King said speed is the most important thing to focus on in 2017 for SEO:

“I feel like Google believes they are in a good place with links and content so they will continue to push for speed and mobile-friendliness. So the best technical SEO tactic right now is making your site faster.”

Moz’s page speed guide is a great resource for identifying and fixing speed issues on your site.

What to do:

  • Audit your site speed and page speed using SEO auditing tools
  • Unless you’re operating a smaller site, you’ll want to work closely with your developer on this one. Make your site as fast as possible.
  • Continuously push for resources to focus on site speed across your organization.

14. Internal linking structure

Your internal linking structure can have a huge impact on your site’s crawlability from search spiders.

Where does it fall on your list of priorities? It depends. If you’re optimizing a massive site with isolated pages that don’t fall within a clean site architecture a few clicks from the home page, you’ll need to put a lot of effort into it. If you’re managing a simple site on a standard platform like WordPress, it’s not going to be at the top of your list.

You want to think about these things when building out your internal linking plan:

  • Scalable internal linking with plugins
  • Using optimized anchor text without over-optimizing
  • How internal linking relates to your main site navigation

I built out this map of a fictional site to demonstrate how different pages on a site can connect to each other through both navigational site links and internal links:

Website navigation with internal links diagram.

Source: Green Flag Digital

Even with a rock-solid site architecture, putting a focus on internal links can push some sites higher up the search rankings.

What to do:

  • Test out manually how you can move around your site by clicking on in-content, editorial-type links on your blog posts, product pages, and important site pages. Note where you see opportunity.
  • Use site auditor tools to find and organize the pages on your site by internal link count. Are your most important pages receiving sufficient internal links?

What to do next:

  • Even if you build out the perfect site architecture, there’s more opportunity for internal link flow — so always keep internal linking in mind when producing new pages
  • Train content creators and page publishers on the importance of internal linking and how to implement links effectively.
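
The internal link count check can be sketched in a few lines of Python once you have a crawl export. The example below assumes the export is a simple mapping from each page to the pages it links to; that shape, the function names, and the sample paths are hypothetical.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def underlinked(link_graph, important_pages, minimum=2):
    """Return the important pages with fewer than `minimum` inbound links."""
    counts = inbound_link_counts(link_graph)
    return sorted(page for page in important_pages if counts[page] < minimum)


graph = {
    "/": ["/blog", "/products"],
    "/blog": ["/products", "/blog/post-1"],
    "/blog/post-1": ["/products"],
    "/products": ["/"],
}
print(underlinked(graph, ["/products", "/blog/post-1"]))
```

The threshold of two inbound links is arbitrary; what counts as sufficient depends on the size and structure of your site.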

Conclusion

Here’s a newsflash for site owners: It’s very likely that your developer is not monitoring and fixing your technical SEO problems, and doesn’t really care about traffic to your site or fixing your SEO issues. So if you don’t have an SEO helping you with technical issues, don’t assume your developer is handling it. They have enough on their plate and they’re not incentivized to fix SEO problems.

I’ve run into many technical SEO issues during and after website migrations when not properly managed with SEO in mind. I’m compelled to highlight the disasters that can happen if this isn’t looked after closely by an expert. Case studies of site migrations gone terribly wrong are a topic for another day, but I implore you to take technical SEO seriously for the benefit of your company.

Hopefully this post has helped clarify some of the most important technical SEO issues that may be harming your site today and how to start fixing them. For those who have never taken a look at the technical side of things, some of these really are easy fixes and can have a hugely positive impact on your site.


Moz Blog

The Technical SEO Renaissance: The Whys and Hows of SEO’s Forgotten Role in the Mechanics of the Web

Posted by iPullRank

Web technologies and their adoption are advancing at a frenetic pace. Content is a game that every type of team and agency plays, so we’re all competing for a piece of that pie. Meanwhile, technical SEO is more complicated and more important than ever before and much of the SEO discussion has shied away from its growing technical components in favor of content marketing.

As a result, SEO is going through a renaissance wherein the technical components are coming back to the forefront and we need to be prepared. At the same time, a number of thought leaders have made statements that modern SEO is not technical. These statements misrepresent the opportunities and problems that have sprouted on the backs of newer technologies. They also contribute to an ever-growing technical knowledge gap within SEO as a marketing field and make it difficult for many SEOs to solve our new problems.

That resulting knowledge gap that’s been growing for the past couple of years influenced me to, for the first time, “tour” a presentation. I’d been giving my Technical SEO Renaissance talk in one form or another since January because I thought it was important to stoke a conversation around the fact that things have shifted and many organizations and websites may be behind the curve if they don’t account for these shifts. A number of things have happened that prove I’ve been on the right track since I began giving this presentation, so I figured it’s worth bringing it here to continue the discussion. Shall we?

An abridged history of SEO (according to me)

It’s interesting to think that the technical SEO has become a dying breed in recent years. There was a time when it was a prerequisite.

Image via PCMag

Personally, I started working on the web in 1995 as a high school intern at Microsoft. My title, like everyone else who worked on the web then, was “webmaster.” This was well before the web profession splintered into myriad disciplines. There was no Front End vs. Backend. There was no DevOps or UX person. You were just a Webmaster.

Back then, before Yahoo, AltaVista, Lycos, Excite, and WebCrawler entered their heyday, we discovered the web by clicking linkrolls; using Gopher, Usenet, and IRC; reading magazines; and via email. Around the same time, IE and Netscape were engaged in the Browser Wars and you had more than one client-side scripting language to choose from. Frames were the rage.

Then the search engines showed up. Truthfully, at this time, I didn’t really think about how search engines worked. I just knew Lycos gave me what I believed to be the most trustworthy results to my queries. At that point, I had no idea that there was this underworld of people manipulating these portals into doing their bidding.

Enter SEO.

Image via Fox

SEO was born of a cross-section of these webmasters, the subset of computer scientists that understood the otherwise esoteric field of information retrieval and those “Get Rich Quick on the Internet” folks. These Internet puppeteers were essentially magicians who traded tips and tricks in the almost dark corners of the web. They were basically nerds wringing dollars out of search engines through keyword stuffing, content spinning, and cloaking.

Then Google showed up to the party.

Image via droidforums.net

Early Google updates started the cat-and-mouse game that would shorten some perpetual vacations. To condense the last 15 years of search engine history into a short paragraph, Google changed the game from being about content pollution and link manipulation through a series of updates starting with Florida and more recently Panda and Penguin. After subsequent refinements of Panda and Penguin, the face of the SEO industry changed pretty dramatically. Many of the most arrogant “I can rank anything” SEOs turned white hat, started software companies, or cut their losses and did something else. That’s not to say that hacks and spam links don’t still work, because they certainly often do. Rather, Google’s sophistication finally discouraged a lot of people who no longer have the stomach for the roller coaster.

Simultaneously, people started to come into SEO from different disciplines. Well, people always came into SEO from very different professional histories, but it started to attract a lot more actual “marketing” people. This makes a lot of sense because SEO as an industry has shifted heavily into a content marketing focus. After all, we’ve got to get those links somehow, right?

Image via Entrepreneur

Naturally, this begat a lot of marketers marketing to marketers about marketing who made statements like “Modern SEO Requires Almost No Technical Expertise.”

Or one of my favorites, that may have attracted even more ire: “SEO is Makeup.”

Image via Search Engine Land

While I, naturally, disagree with these statements, I understand why these folks would contribute these ideas in their thought leadership. Irrespective of the fact that I’ve worked with both gentlemen in the past in some capacity and know their predispositions towards content, the core point they’re making is that many modern Content Management Systems do account for many of our time-honored SEO best practices. Google is pretty good at understanding what you’re talking about in your content. Ultimately, your organization’s focus needs to be on making something meaningful for your user base so you can deliver competitive marketing.

If you remember the last time I tried to make the case for a paradigm shift in the SEO space, you’d be right in thinking that I agree with that idea fundamentally. However, not at the cost of ignoring the fact that the technical landscape has changed. Technical SEO is the price of admission. Or, to quote Adam Audette, “SEO should be invisible,” not makeup.

Changes in web technology are causing a technical renaissance

In SEO, we often criticize developers for always wanting to deploy the new shiny thing. Moving forward, it’s important that we understand the new shiny things so we can be more effective in optimizing them.

SEO has always had a healthy fear of JavaScript, and with good reason. Despite the fact that search engines have had the technology to crawl the web the same way we see it in a browser for at least 10 years, it has always been a crapshoot as to whether that content actually gets crawled and, more importantly, indexed.

When we’d initially examined the idea of headless browsing in 2011, the collective response was that the computational expense prohibited it at scale. But it seems that even if that is the case, Google believes enough of the web is rendered using JavaScript that it’s a worthy investment.

Over time more and more folks would examine this idea; ultimately, a comment from this ex-Googler on Hacker News would indicate that this has long been something Google understood needed conquering:

This was actually my primary role at Google from 2006 to 2010.

One of my first test cases was a certain date range of the Wall Street Journal’s archives of their Chinese language pages, where all of the actual text was in a JavaScript string literal, and before my changes, Google thought all of these pages had identical content… just the navigation boilerplate. Since the WSJ didn’t do this for its English language pages, my best guess is that they weren’t trying to hide content from search engines, but rather trying to work around some old browser bug that incorrectly rendered (or made ugly) Chinese text, but somehow rendering text via JavaScript avoided the bug.

The really interesting parts were (1) trying to make sure that rendering was deterministic (so that identical pages always looked identical to Google for duplicate elimination purposes), (2) detecting when we deviated significantly from real browser behavior (so we didn’t generate too many nonsense URLs for the crawler or too many bogus redirects), and (3) making the emulated browser look a bit like IE and Firefox (and later Chrome) at the same time, so we didn’t get tons of pages that said “come back using IE” or “please download Firefox”.

I ended up modifying SpiderMonkey’s bytecode dispatch to help detect when the simulated browser had gone off into the weeds and was likely generating nonsense.

I went through a lot of trouble figuring out the order that different JavaScript events were fired off in IE, Firefox, and Chrome. It turns out that some pages actually fire off events in different orders between a freshly loaded page and a page if you hit the refresh button. (This is when I learned about holding down shift while hitting the browser’s reload button to make it act like it was a fresh page fetch.)

At some point, some SEO figured out that random() was always returning 0.5. I’m not sure if anyone figured out that JavaScript always saw the date as sometime in the Summer of 2006, but I presume that has changed. I hope they now set the random seed and the date using a keyed cryptographic hash of all of the loaded javascript and page text, so it’s deterministic but very difficult to game. (You can make the date deterministic for a month and dates of different pages jump forward at different times by adding an HMAC of page content (mod number of seconds in a month) to the current time, rounding down that time to a month boundary, and then subtracting back the value you added earlier. This prevents excessive index churn from switching all dates at once, and yet gives each page a unique date.)
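
The date trick in that closing parenthetical is easier to follow as arithmetic. Here is a minimal Python sketch of the steps as the comment describes them; it is only an illustration, not Google’s implementation. The fixed 30-day period standing in for “a month,” the key, and the function name `page_date` are all assumptions.

```python
import hashlib
import hmac

PERIOD = 30 * 24 * 60 * 60  # assumption: a "month" is a fixed 30-day window


def page_date(content: bytes, key: bytes, now: int, period: int = PERIOD) -> int:
    """Return a per-page timestamp that stays fixed for one period at a time."""
    digest = hmac.new(key, content, hashlib.sha256).digest()
    # Per-page offset derived from a keyed hash of the page content.
    offset = int.from_bytes(digest, "big") % period
    # Add the offset, round down to a period boundary, subtract the offset.
    return ((now + offset) // period) * period - offset


now = 1_700_000_000
stamp = page_date(b"<html>page one</html>", b"secret-key", now)
print(now - stamp, "seconds behind the real clock")
```

Because each page gets its own offset, each page’s date jumps forward at a different moment, which is the churn-avoidance property the comment describes.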

Now, consider these JavaScript usage statistics across the web from BuiltWith:

JavaScript is obviously here to stay. Most of the web is using it to render content in some form or another. This means there’s potential for search quality to plummet over time if Google couldn’t make sense of what content is on pages rendered with JavaScript.

Additionally, Google’s own JavaScript MVW framework, AngularJS, has seen pretty strong adoption as of late. When I attended Google’s I/O conference a few months ago, the recent advancements of Progressive Web Apps and Firebase were being harped upon due to the speed and flexibility they bring to the web. You can only expect that developers will make a stronger push.

Image via Builtwith

Sadly, despite BuiltVisible’s fantastic contributions to the subject, there hasn’t been enough discussion around Progressive Web Apps, Single-Page Applications, and JavaScript frameworks in the SEO space. Instead, there are arguments about 301s vs 302s. Perhaps the latest spike in adoption and the proliferation of PWAs, SPAs, and JS frameworks across different verticals will change that. At iPullRank, we’ve worked with a number of companies who have made the switch to Angular; there’s a lot worth discussing on this specific topic.

Additionally, Facebook’s contribution to the JavaScript MVW frameworks, React, is being adopted for very similar benefits of speed and flexibility in the development process.

However, regarding SEO, the key difference between Angular and React is that, from the beginning, React had a renderToString function built in which allows the content to render properly from the server side. This makes the question of indexation of React pages rather trivial.

AngularJS 1.x, on the other hand, has birthed an SEO best practice wherein you pre-render pages using a headless browser-driven snapshot appliance such as Prerender.io, Brombone, etc. This is somewhat ironic, as it’s Google’s own product. More on that later.

View Source is dead

As a result of the adoption of these JavaScript frameworks, using View Source to examine the code of a website is an obsolete practice. What you’re seeing in View Source is not the computed Document Object Model (DOM). Rather, you’re seeing the code before it’s processed by the browser. The lack of understanding around why you might need to view a page’s code differently is another instance where a more detailed understanding of the technical components of how the web works makes you more effective.

Depending on how the page is coded, you may see variables in the place of actual content, or you may not see the completed DOM tree that’s there once the page has loaded completely. This is the fundamental reason why, as soon as an SEO hears that there’s JavaScript on the page, the recommendation is to make sure all content is visible without JavaScript.

To illustrate the point further, consider this View Source view of Seamless.com. If you look for the meta description or the rel-canonical on this page, you’ll find variables in the place of the actual copy:

If instead you look at the code in the Elements section of Chrome DevTools or Inspect Element in other browsers, you’ll find the fully executed DOM. You’ll see the variables are now filled in with copy. The URL for the rel-canonical is on the page, as is the meta description:

Since search engines are crawling this way, you may be missing out on the complete story of what’s going on if you default to just using View Source to examine the code of the site.
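
One way to spot this in bulk is to scan the raw HTML (what View Source shows) for template placeholders that were never filled in. The Python sketch below assumes AngularJS-style `{{ ... }}` interpolation markers; other frameworks use different syntax, and the function name and sample markup are invented for illustration.

```python
import re
from html.parser import HTMLParser

PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


class HeadTagAudit(HTMLParser):
    """Records meta description and canonical values that still hold variables."""

    def __init__(self):
        super().__init__()
        self.unrendered = []

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr_map.get("name", "").lower() == "description":
            self._check("meta description", attr_map.get("content", ""))
        elif tag == "link" and attr_map.get("rel", "").lower() == "canonical":
            self._check("rel-canonical", attr_map.get("href", ""))

    def _check(self, label, value):
        if PLACEHOLDER.search(value):
            self.unrendered.append((label, value))


def find_unrendered_head_tags(raw_html):
    audit = HeadTagAudit()
    audit.feed(raw_html)
    return audit.unrendered


raw = ('<head><meta name="description" content="{{meta.description}}">'
       '<link rel="canonical" href="{{canonicalUrl}}"></head>')
print(find_unrendered_head_tags(raw))
```

Any hit means the raw response and the rendered DOM disagree for that tag, so the page is worth inspecting in DevTools.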

HTTP/2 is on the way

One of Google’s largest points of emphasis is page speed. An understanding of how networking impacts page speed is definitely a must-have to be an effective SEO.

Before HTTP/2 was announced, the HyperText Transfer Protocol specification had not been updated in a very long time. In fact, we’ve been using HTTP/1.1 since 1999. HTTP/2 is a large departure from HTTP/1.1, and I encourage you to read up on it, as it will make a dramatic contribution to the speed of the web.

Image via Slideshare

Quickly though, one of the biggest differences is that HTTP/2 will make use of one TCP (Transmission Control Protocol) connection per origin and “multiplex” the stream. If you’ve ever taken a look at the issues that Google PageSpeed Insights highlights, you’ll notice that one of the primary things that always comes up is limiting the number of HTTP requests. This is what multiplexing helps eliminate; HTTP/2 opens up one connection to each server, pushing assets across it at the same time, often making determinations of required resources based on the initial resource. With browsers requiring Transport Layer Security (TLS) to leverage HTTP/2, it’s very likely that Google will make some sort of push in the near future to get websites to adopt it. After all, speed and security have been common threads throughout everything in the past five years.

Image via Builtwith

As of late, more hosting providers have been highlighting the fact that they are making HTTP/2 available, which is probably why there’s been a significant jump in its usage this year. The beauty of HTTP/2 is that most browsers already support it and you don’t have to do much to enable it unless your site is not secure.

Image via CanIUse.com

Definitely keep HTTP/2 on your radar, as it may be the culmination of what Google has been pushing for.

SEO tools are lagging behind search engines

When I think critically about this, SEO tools have always lagged behind the capabilities of search engines. That’s to be expected, though, because SEO tools are built by smaller teams and the most important things must be prioritized. A lack of technical understanding may lead you to believe the information from the tools you use when it is inaccurate.

When you review some of Google’s own documentation, you’ll find that some of my favorite tools are not in line with Google’s specifications. For instance, Google allows you to specify hreflang, rel-canonical, and x-robots in HTTP headers. There’s a huge lack of consistency in SEO tools’ ability to check for those directives.

It’s possible that you’ve performed an audit of a site and found it difficult to determine why a page has fallen out of the index. It very well could be because a developer was following Google’s documentation and specifying a directive in an HTTP header, but your SEO tool did not surface it. In fact, it’s generally better to set these at the HTTP header level than to add bytes to your download time by filling up every page’s <head> with them.
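
If your tool doesn’t surface header-level directives, you can extract them from the response headers yourself. This Python sketch works on a plain dictionary of headers you have already captured; the function names and example.com URLs are hypothetical, and the parsing is deliberately simple (for instance, it does not handle user-agent-scoped values such as `googlebot: noindex`).

```python
import re

LINK_ENTRY = re.compile(r"<([^>]*)>([^<]*)")


def parse_link_header(value):
    """Split a Link header into (url, params) pairs."""
    entries = []
    for url, raw_params in LINK_ENTRY.findall(value):
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                key, _, val = part.partition("=")
                params[key.strip().lower()] = val.strip().strip('",').strip()
        entries.append((url, params))
    return entries


def header_directives(headers):
    """Pull canonical, hreflang, and robots directives out of HTTP headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    links = parse_link_header(normalized.get("link", ""))
    canonical = [url for url, p in links if p.get("rel", "").lower() == "canonical"]
    hreflang = {
        p["hreflang"]: url
        for url, p in links
        if p.get("rel", "").lower() == "alternate" and "hreflang" in p
    }
    robots = [
        d.strip().lower()
        for d in normalized.get("x-robots-tag", "").split(",")
        if d.strip()
    ]
    return {
        "canonical": canonical[0] if canonical else None,
        "hreflang": hreflang,
        "robots": robots,
    }


response_headers = {
    "Link": '<https://example.com/page>; rel="canonical", '
            '<https://example.com/es/page>; rel="alternate"; hreflang="es"',
    "X-Robots-Tag": "noindex, nofollow",
}
print(header_directives(response_headers))
```

Running this against pages that have dropped out of the index is a quick way to rule a header-level noindex in or out.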

Google is crawling headless, despite the computational expense, because they recognize that so much of the web is being transformed by JavaScript. Recently, Screaming Frog made the shift to render the entire page using JS:

To my knowledge, none of the other crawling tools are doing this yet. I do recognize the fact that it would be considerably more expensive for all SEO tools to make this shift because cloud server usage is time-based and it takes significantly more time to render a page in a browser than to just download the main HTML file. How much time?

A ton more time, actually. I just wrote a simple script that just loads the HTML using both cURL and HorsemanJS. cURL took an average of 5.25 milliseconds to download the HTML of the Yahoo homepage. HorsemanJS, on the other hand, took an average of 25,839.25 milliseconds or roughly 26 seconds to render the page. It’s the difference between crawling 686,000 URLs an hour and 138.
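
For anyone who wants to check that arithmetic, here is a small Python sketch. It assumes a single worker fetching URLs one after another with no other overhead, which is a simplification; the function name is made up.

```python
MS_PER_HOUR = 60 * 60 * 1000


def urls_per_hour(ms_per_url):
    """Whole URLs one sequential worker can process in an hour."""
    return int(MS_PER_HOUR // ms_per_url)


print(urls_per_hour(5.25))       # cURL average: 685714, roughly 686,000
print(urls_per_hour(25_839.25))  # HorsemanJS average: 139
print(urls_per_hour(26_000))     # rounded to 26 seconds: 138
```

Whether you use the measured average or the rounded 26 seconds, the headless crawl comes out close to 5,000 times slower per URL.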

Ideally, SEO tools would extract the technologies in use on the site or perform some sort of DIFF operation on a few pages and then offer the option to crawl headless if it’s deemed worthwhile.
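
A DIFF check like that could be as simple as comparing the text in the raw HTML with the text in the rendered DOM for a handful of sample pages. The Python sketch below assumes you have already captured both versions as strings; the tag stripping is crude, and the 0.9 similarity threshold and function names are arbitrary choices for illustration.

```python
import difflib
import re

TAG = re.compile(r"<[^>]+>")


def visible_words(html):
    """Very rough text extraction: drop tags, split on whitespace."""
    return TAG.sub(" ", html).split()


def needs_headless_crawl(raw_html, rendered_html, threshold=0.9):
    """True when the rendered page differs enough to justify rendering."""
    matcher = difflib.SequenceMatcher(
        None, visible_words(raw_html), visible_words(rendered_html)
    )
    return matcher.ratio() < threshold


raw_page = '<div id="app"></div>'
rendered_page = '<div id="app"><h1>Menu</h1><p>Order pizza online</p></div>'
print(needs_headless_crawl(raw_page, rendered_page))
```

Sampling a few templates this way keeps the expensive headless crawl for the sites that actually need it.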

Finally, Google’s specs on mobile also say that you can use client-side redirects. I’m not aware of a tool that tracks this. Now, I’m not saying leveraging JavaScript redirects for mobile is the way you should do it. Rather that Google allows it, so we should be able to inspect it easily.

Luckily, until SEO tools catch up, Chrome DevTools does handle a lot of these things. For instance, the HTTP Request and Response headers section will show you x-robots, hreflang, and rel-canonical HTTP headers.

You can also use DevTools’ GeoLocation Emulator to view the web as though you are in a different location. For those of you who have fond memories of the nearEquals query parameter, this is another way you can get a sense of where you rank in precise locations.

Chrome DevTools also allows you to plug in your Android device and control it from your browser. There’s any number of use cases for this from an SEO perspective, but Simo Ahava wrote a great instructional post on how you can use it to debug your mobile analytics setup. You can do the same on iOS devices in Safari if you have a Mac.

What truly are rankings in 2016?

Rankings are a funny thing and, truthfully, have been for some time now. I, myself, was resistant to the idea of averaged rankings when Google rolled them out in Webmaster Tools/Search Console, but average rankings actually make a lot more sense than what we look at in standard ranking tools. Let me explain.

SEO tools pull rankings based on a situation that doesn’t actually exist in the real world. The machines that scrape Google are meant to be clean and otherwise agnostic unless you explicitly specify a location. Effectively, these tools look to understand how rankings would look to users searching for the first time with no context or history with Google. Ranking software emulates a user who is logging onto the web for the first time ever and the first thing they think to do is search for “4ft fishing rod.” Then they continually search for a series of other related and/or unrelated queries without ever actually clicking on a result. Granted, some software may do other things to try and emulate that user, but either way they collect data that is not necessarily reflective of what real users see. And finally, with so many people tracking many of the same keywords so frequently, you have to wonder how much these tools inflate search volume.

The bottom line is that we are ignoring true user context, especially in the mobile arena.

Rankings tools that allow you to track mobile rankings usually let you define one context or they will simply specify “mobile phone” as an option. Cindy Krum’s research indicates that SERP features and rankings will be different based on the combination of user agent, phone make and model, browser, and even the content on their phone.

Rankings tools also ignore the user’s reality of choice. We’re in an era where there are simply so many elements that comprise the SERP, that #1 is simply NOT #1. In some cases, #1 is the 8th choice on the page and far below the fold.

With AdWords having a 4th ad slot, organic being pushed far below the fold, and users not being sure of the difference between organic and paid, being #1 in organic doesn’t mean what it used to. So when we look at rankings reports that tell us we’re number one, we’re often deluding ourselves as to what outcome that will drive. When we report that to clients, we’re not focusing on actionability or user context. Rather, we are focusing entirely on vanity.

Of course, rankings are not a business goal; they’re a measure of potential or opportunity. No matter how much we talk about how they shouldn’t be the main KPI, rankings are still something that SEOs point at to show they’re moving the needle. Therefore we should consider thinking of organic rankings as being relative to the SERP features that surround them.

In other words, I’d like to see rankings include both the standard organic 1–10 ranking as well as the absolute position with regard to Paid, local packs, and featured snippets. Anything else is ignoring the impact of the choices that are overwhelmingly available to the user.
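
To show what reporting both numbers might look like, here is a minimal Python sketch. The SERP is modeled as an ordered list of elements with a type and, for organic results, a domain; that data shape, the function name, and the sample SERP are invented for this example and do not reflect any rank tracker’s format.

```python
def rank_positions(serp, domain):
    """Return (organic_rank, absolute_position) for a domain, or None."""
    organic_rank = 0
    for absolute_position, element in enumerate(serp, start=1):
        if element["type"] == "organic":
            organic_rank += 1
            if element["domain"] == domain:
                return organic_rank, absolute_position
    return None


# Hypothetical SERP: four ads, a three-listing local pack, then organic results.
serp = (
    [{"type": "ad"}] * 4
    + [{"type": "local"}] * 3
    + [
        {"type": "organic", "domain": "example.com"},
        {"type": "organic", "domain": "example.org"},
    ]
)
print(rank_positions(serp, "example.com"))
```

In this made-up SERP, the #1 organic result is the eighth choice on the page, which is the gap a standard 1–10 report hides.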

Recently, we’ve seen some upgrades to this effect with Moz making a big change to how they are surfacing features of rankings and I know a number of other tools have highlighted the organic features as well. Who will be the first to highlight the Integrated Search context? After all, many users don’t know the difference.

What is cloaking in 2016?

Cloaking is officially defined as showing search engines something different from what the user sees. What does that mean when Google allows adaptive and responsive sites and crawls both headless and text-based? What does that mean when Googlebot respects 304 response codes?

Under adaptive and responsive models, it’s often the case that more or less content is shown for different contexts. This is rare for responsive, as it’s meant to reposition and size content by definition, but some implementations may instead reduce content components to make the viewing context work.

In the case when a site responds to screen resolution by changing what content is shown and more content is shown beyond the resolution that Googlebot renders, how do they distinguish that from cloaking?

Similarly, the 304 response code is a way to indicate to the client that the content has not been modified since the last time it visited; therefore, there’s no reason to download it again.

Googlebot adheres to this response code to keep from being a bandwidth hog. So what’s to stop a webmaster from getting one version of the page indexed, changing it, and then returning a 304?
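For reference, this is roughly what an honest server does with a conditional request. It is a minimal sketch, not any particular server’s implementation: `statusForConditionalGet` is a hypothetical helper, and it only looks at the `If-Modified-Since` header (real servers usually consider `ETag` / `If-None-Match` as well).

```javascript
// Decide whether a conditional GET can be answered with 304 Not Modified.
// ifModifiedSince: raw If-Modified-Since request header value, or undefined.
// lastModified: a Date for when the page content last changed.
// Hypothetical helper for illustration only.
function statusForConditionalGet(ifModifiedSince, lastModified) {
  if (!ifModifiedSince) return 200; // unconditional request: send the full page
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return 200; // unparseable header: ignore it
  // HTTP dates have one-second resolution, so compare whole seconds.
  const modifiedSeconds = Math.floor(lastModified.getTime() / 1000);
  const sinceSeconds = Math.floor(since / 1000);
  return modifiedSeconds <= sinceSeconds ? 304 : 200;
}
```

The scenario described above amounts to a server that skips this comparison and answers 304 even though `lastModified` has moved on.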

I don’t know that there are definitive answers to those questions at this point. However, based on what I’m seeing in the wild, these have proven to be opportunities for technical SEOs that are still dedicated to testing and learning.

Crawling

Accessibility of content as a fundamental component that SEOs must examine has not changed. What has changed is the type of analytical effort that needs to go into it. It’s been established that Google’s crawling capabilities have improved dramatically, and people like Eric Wu have done a great job of surfacing the granular detail of those capabilities with experiments like JSCrawlability.com.

Similarly, I wanted to try an experiment to see how Googlebot behaves once it loads a page. Using LuckyOrange, I attempted to capture a video of Googlebot once it gets to the page:

I installed the LuckyOrange script on a page that hadn’t been indexed yet and set it up so that it only fires if the user agent contains “googlebot.” Once I was set up, I then invoked Fetch and Render from Search Console. I’d hoped to see mouse scrolling or an attempt at a form fill. Instead, the cursor never moved and Googlebot was only on the page for a few seconds. Later on, I saw another hit from Googlebot to that URL and then the page appeared in the index shortly thereafter. There was no record of the second visit in LuckyOrange.

While I’d like to do more extensive testing on a bigger site to validate this finding, my hypothesis from this anecdotal experience is that Googlebot will come to the site and make a determination of whether a page/site needs to be crawled using the headless crawler. Based on that, they’ll come back to the site using the right crawler for the job.

I encourage you to give it a try as well. You don’t have to use LuckyOrange — you could use HotJar or anything else like it — but here’s my code for LuckyOrange:

jQuery(function() {
    window.__lo_site_id = XXXX; // replace XXXX with your LuckyOrange site ID
    // Only load the recorder when the user agent identifies itself as Googlebot
    if (navigator.userAgent.toLowerCase().indexOf('googlebot') > -1)
    {
        var wa = document.createElement('script');
        wa.type = 'text/javascript';
        wa.async = true;
        // location.protocol includes the trailing colon, e.g. 'https:'
        wa.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://cdn') + '.luckyorange.com/w.js';
        var s = document.getElementsByTagName('script')[0];
        s.parentNode.insertBefore(wa,s);
        // Tag it with Googlebot
        window._loq = window._loq || [];
        window._loq.push(["tag", "Googlebot"]);
    }
});

The moral of the story, however, is that what Google sees, how often they see it, and so on are still primary questions that we need to answer as SEOs. While it’s not sexy, log file analysis is an absolutely necessary exercise, especially for large-site SEO projects — perhaps now more than ever, due to the complexities of sites. I’d encourage you to listen to everything Marshall Simmonds says in general, but especially on this subject.

To that end, Google’s Crawl Stats in Search Console are utterly useless. These charts tell me what, exactly? Great, thanks Google, you crawled a bunch of pages at some point in February. Cool!

There are any number of log file analysis tools out there, from Kibana in the ELK stack to other tools such as Logz.io. However, the Screaming Frog team has made leaps and bounds in this arena with the recent release of their Log File Analyzer.

Of note with this tool is how easily it handles millions of records, which I hope is an indication of things to come with their Spider tool as well. Irrespective of who makes the tool, the insights that it helps you unlock are incredibly valuable in terms of what’s actually happening.

We had a client last year that was adamant that their losses in organic were not the result of the Penguin update. They believed that it might be due to turning off other traditional and digital campaigns that may have contributed to search volume, or perhaps seasonality or some other factor. Pulling the log files, I was able to layer all of the data from when all of their campaigns were running and show that it was none of those things; rather, Googlebot activity dropped tremendously right after the Penguin update and at the same time as their organic search traffic. The log files made it definitively obvious.

It follows conventionally held SEO wisdom that Googlebot crawls based on the pages that have the highest quality and/or quantity of links pointing to them. In layering the number of social shares, links, and Googlebot visits for our latest clients, we’re finding that there’s more correlation between social shares and crawl activity than between links and crawl activity. In the data below, the section of the site with the most links actually gets crawled the least!

These are important insights that you may just be guessing at without taking the time to dig into your log files.
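If you want to try this without a dedicated tool, a rough first pass is only a few lines. The sketch below assumes your server writes the common “combined” access log format and treats the first path segment as the site section; `LOG_LINE` and `googlebotHitsBySection` are hypothetical names. It also trusts the user agent string, which can be spoofed, so a serious analysis would verify Googlebot’s IPs separately.

```javascript
// Matches one line of the "combined" access log format (an assumption about
// your server's configuration). Captures: ip, timestamp, method, path,
// status, user agent.
const LOG_LINE = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"$/;

// Counts Googlebot requests per day and per top-level site section.
// Keys look like "12/Feb/2016 /blog".
function googlebotHitsBySection(lines) {
  const counts = {};
  for (const line of lines) {
    const m = LOG_LINE.exec(line);
    if (!m) continue; // skip lines that are not in the expected format
    const [, , timestamp, , path, , userAgent] = m;
    if (!/googlebot/i.test(userAgent)) continue;
    const day = timestamp.split(':')[0];
    const section = '/' + (path.split('?')[0].split('/')[1] || '');
    const key = day + ' ' + section;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

Layering these daily counts against links and social shares per section is the kind of comparison described above.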

How log files help you understand AngularJS

Like any other web page or application, every request results in a record in the logs. But depending on how the server is set up, there are a ton of lessons that can come out of it with regard to AngularJS setups, especially if you’re pre-rendering using one of the snapshot technologies.

For one of our clients, we found that oftentimes when the snapshot system needed to refresh its cache, it took too long and timed out. Googlebot understands these as 5XX errors.

This behavior leads to those pages falling out of the index, and over time we saw pages jump back and forth between ranking very highly and disappearing altogether, or another page on the site taking its place.
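A quick way to spot this pattern is to compute how often Googlebot received a 5XX response for each URL. This is a sketch under assumptions: the records are taken to be already parsed from your logs into `{ path, status, userAgent }` objects, and `googlebotErrorRates` is a hypothetical helper, not part of any log tool.

```javascript
// Given parsed log records, returns the share of Googlebot requests per path
// that ended in a 5XX response, sorted with the worst offenders first.
// The record shape { path, status, userAgent } is an assumption.
function googlebotErrorRates(records) {
  const byPath = {};
  for (const { path, status, userAgent } of records) {
    if (!/googlebot/i.test(userAgent)) continue;
    const entry = byPath[path] || (byPath[path] = { hits: 0, errors: 0 });
    entry.hits += 1;
    if (status >= 500 && status <= 599) entry.errors += 1;
  }
  return Object.entries(byPath)
    .map(([path, { hits, errors }]) => ({ path, hits, errors, rate: errors / hits }))
    .sort((a, b) => b.rate - a.rate);
}
```

Pages near the top of that list are the ones to check against the snapshot system’s cache refresh times.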

Additionally, we found that there were many instances wherein Googlebot was being misidentified as a human user. In turn, Googlebot was served the AngularJS live page rather than the HTML snapshot. However, despite the fact that Googlebot was not seeing the HTML snapshots for these pages, these pages were still making it into the index and ranking just fine. So we ended up working with the client on a test to remove the snapshot system on sections of the site, and organic search traffic actually improved.

This is directly in line with what Google is saying in their deprecation announcement of the AJAX Crawling scheme. They are able to access content that is rendered using JavaScript and will index anything that is shown at load.

That’s not to say that HTML snapshot systems are not worth using. The Googlebot behavior for pre-rendered pages is that they tend to be crawled more quickly and more frequently. My best guess is that this is due to the crawl being less computationally expensive for them to execute. All in all, I’d say using HTML snapshots is still the best practice, but definitely not the only way for Google to see these types of sites.

According to Google, you shouldn’t serve snapshots just for them, but for the speed enhancements that the user gets as well.

In general, websites shouldn’t pre-render pages only for Google — we expect that you might pre-render pages for performance benefits for users and that you would follow progressive enhancement guidelines. If you pre-render pages, make sure that the content served to Googlebot matches the user’s experience, both how it looks and how it interacts. Serving Googlebot different content than a normal user would see is considered cloaking, and would be against our Webmaster Guidelines.

These are highly technical decisions that have a direct influence on organic search visibility. From my experience in interviewing SEOs to join our team at iPullRank over the last year, very few of them understand these concepts or are capable of diagnosing issues with HTML snapshots. These issues are now commonplace and will only continue to grow as these technologies continue to be adopted.

However, if we’re to serve snapshots to the user too, it begs the question: Why would we use the framework in the first place? Naturally, tech stack decisions are ones that are beyond the scope of just SEO, but you might consider a framework that doesn’t require such an appliance, like MeteorJS.

Alternatively, if you definitely want to stick with Angular, consider Angular 2, which supports the new Angular Universal. Angular Universal serves “isomorphic” JavaScript, which is another way to say that it pre-renders its content on the server side.

Angular 2 has a whole host of improvements over Angular 1.x, but I’ll let these Googlers tell you about them.

Before all of the crazy frameworks reared their confusing heads, Google has had one line of thought about emerging technologies — and that is “progressive enhancement.” With many new IoT devices on the horizon, we should be building websites to serve content for the lowest common denominator of functionality and save the bells and whistles for the devices that can render them.

If you’re starting from scratch, a good approach is to build your site’s structure and navigation using only HTML. Then, once you have the site’s pages, links, and content in place, you can spice up the appearance and interface with AJAX. Googlebot will be happy looking at the HTML, while users with modern browsers can enjoy your AJAX bonuses.

In other words, make sure your content is accessible to everyone. Shoutout to Fili Wiese for reminding me of that.

Scraping is the fundamentally flawed core of SEO analysis

Scraping is fundamental to everything that our SEO tools do. cURL is a library for making and handling HTTP requests. Most popular programming languages have bindings for the library and, as such, most SEO tools leverage the library or something similar to download web pages.

Think of cURL as working similarly to downloading a single file from an FTP server; in terms of web pages, it doesn’t mean that the page can be viewed in its entirety, because you’re not downloading all of the required files.

This is a fundamental flaw of most SEO software for the very same reason View Source is not a valuable way to view a page’s code anymore. Because there are a number of JavaScript and/or CSS transformations that happen at load, and Google is crawling with headless browsers, you need to look at the Inspect (element) view of the code to get a sense of what Google can actually see.

This is where headless browsing comes into play.

One of the more popular headless browsing libraries is PhantomJS. Many tools outside of the SEO world are written using this library for browser automation. Netflix even has one for scraping and taking screenshots called Sketchy. PhantomJS is built from a rendering engine called QtWebkit, which is to say it’s forked from the same code that Safari (and Chrome before Google forked it into Blink) is based on. While PhantomJS is missing the features of the latest browsers, it has enough features to support most things we need for SEO analysis.

As you can see from the GitHub repository, HTML snapshot software such as Prerender.io is written using this library as well.

PhantomJS has a series of wrapper libraries that make it quite easy to use in a variety of different languages. For those of you interested in using it with NodeJS, check out HorsemanJS.

For those of you that are more familiar with PHP, check out PHP PhantomJS.

A more recent and better qualified addition to the headless browser party is Headless Chromium. As you might have guessed, this is a headless version of the Chrome browser. If I were a betting man, I’d say what we’re looking at here is some sort of toned-down fork of Googlebot.

To that end, this is probably something that SEO companies should consider when rethinking their own crawling infrastructure in the future, if only for a premium tier of users. If you want to know more about Headless Chrome, check out what Sami Kyostila and Alex Clarke (both Googlers) had to say at BlinkOn 6:

Using in-browser scraping to do what your tools can’t

Although many SEO tools cannot examine the fully rendered DOM, that doesn’t mean that you, as an individual SEO, have to miss out. Even without leveraging a headless browser, Chrome can be turned into a scraping machine with just a little bit of JavaScript. I’ve talked about this at length in my “How to Scrape Every Single Page on the Web” post. Using a little bit of jQuery, you can effectively select and print anything from a page to the JavaScript Console and then export it to a file in whatever structure you prefer.

Scraping this way allows you to skip a lot of the coding that’s required to make sites believe you’re a real user, like authentication and cookie management that has to happen on the server side. Of course, this way of scraping is good for one-offs rather than building software around.
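As a rough illustration of the idea, the sketch below pulls every link out of the rendered DOM and turns it into CSV text. It uses plain DOM calls rather than jQuery so it runs on any page, and `extractLinks` and `toCsv` are hypothetical helper names, not the code from the post mentioned above.

```javascript
// Collects the href and anchor text of every link in the rendered DOM.
// Pass the page's `document` when running this in the browser console.
function extractLinks(doc) {
  return Array.from(doc.querySelectorAll('a[href]')).map((a) => ({
    href: a.href,
    text: a.textContent.trim(),
  }));
}

// Serializes an array of flat objects as CSV, quoting every field and
// doubling any embedded double quotes.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const columns = Object.keys(rows[0]);
  const quote = (value) => '"' + String(value).replace(/"/g, '""') + '"';
  const lines = [columns.map(quote).join(',')];
  for (const row of rows) {
    lines.push(columns.map((column) => quote(row[column])).join(','));
  }
  return lines.join('\n');
}
```

In the Chrome console, `console.log(toCsv(extractLinks(document)))` prints the result so you can copy it into a file.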

ArtooJS is a bookmarklet made to support in-browser scraping and automating scraping across a series of pages and saving the results to a file as JSON.

A more fully featured solution for this is the Chrome Extension, WebScraper.io. It requires no code and makes the whole process point-and-click.

How to approach content and linking from the technical context

Much of what SEO has been doing for the past few years has devolved into the creation of more content for more links. I don’t know that adding anything to the discussion around how to scale content or build more links is of value at this point, but I suspect there are some opportunities for existing links and content that are not top-of-mind for many people.

Google Looks at Entities First

Googlers announced recently that they look at entities first when reviewing a query. An entity is Google’s representation of proper nouns in their system to distinguish persons, places, and things, and inform their understanding of natural language. At this point in the talk, I ask people to put their hands up if they have an entity strategy. I’ve given the talk a dozen times at this point and there have only been two people to raise their hands.

Bill Slawski is the foremost thought leader on this topic, so I’m going to defer to his wisdom and encourage you to read:

I would also encourage you to use a natural language processing tool like AlchemyAPI or MonkeyLearn. Better still, use Google’s own Natural Language Processing API to extract entities. The difference between your standard keyword research and entity strategies is that your entity strategy needs to be built from your existing content. So in identifying entities, you’ll want to do your keyword research first and then run those landing pages through an entity extraction tool to see how they line up. You’ll also want to run your competitor landing pages through those same entity extraction APIs to identify what entities are being targeted for those keywords.
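Once you have entity lists back from whichever extraction API you choose, the comparison itself is simple set logic. The sketch below assumes you have already reduced each API response to a plain array of entity names; `entityGap` is a hypothetical helper, and none of the APIs named above return data in exactly this shape.

```javascript
// Finds entities that competitor pages mention but your page does not.
// ownEntities: array of entity names extracted from your landing page.
// competitorEntityLists: one array of entity names per competitor page.
// Returns [{ entity, pages }] sorted by how many competitor pages use it.
function entityGap(ownEntities, competitorEntityLists) {
  const normalize = (name) => name.trim().toLowerCase();
  const own = new Set(ownEntities.map(normalize));
  const pageCounts = new Map();
  for (const list of competitorEntityLists) {
    // De-duplicate within a page so each page counts at most once.
    for (const entity of new Set(list.map(normalize))) {
      if (own.has(entity)) continue;
      pageCounts.set(entity, (pageCounts.get(entity) || 0) + 1);
    }
  }
  return [...pageCounts.entries()]
    .map(([entity, pages]) => ({ entity, pages }))
    .sort((a, b) => b.pages - a.pages || a.entity.localeCompare(b.entity));
}
```

Entities that show up across several competing pages but not on yours are reasonable candidates to work into the copy, where they make sense.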

TF*IDF

Similarly, Term Frequency/Inverse Document Frequency or TF*IDF is a natural language processing technique that doesn’t get much discussion on this side of the pond. In fact, topic modeling algorithms have been the subject of much heated debate in the SEO community in the past. The concern is that topic modeling tools tend to push us back towards the Dark Ages of keyword density, rather than towards creating content that has utility for users. However, in many European countries they swear by TF*IDF (or WDF*IDF, short for Within Document Frequency/Inverse Document Frequency) as a key technique that drives up organic visibility even without links.

After hanging out in Germany a bit last year, some folks were able to convince me that taking another look at TF*IDF was worth it. So, we did and then we started working it into our content optimization process.
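For anyone who hasn’t seen the calculation, here is a minimal sketch of the textbook version: term frequency is a term’s share of the words in a document, and inverse document frequency is the log of total documents divided by documents containing the term. This is not how OnPage.org or any other tool computes its scores (WDF*IDF, for one, uses a different weighting), and the `tokenize` and `tfIdf` helpers are hypothetical.

```javascript
// Splits text into lowercase word tokens. ASCII-only for simplicity.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Returns one { term: score } object per input document, using
// tf = count / documentLength and idf = ln(totalDocs / docsContainingTerm).
function tfIdf(docs) {
  const tokenized = docs.map(tokenize);
  const docFreq = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  return tokenized.map((tokens) => {
    const counts = new Map();
    for (const term of tokens) counts.set(term, (counts.get(term) || 0) + 1);
    const scores = {};
    for (const [term, count] of counts) {
      const tf = count / tokens.length;
      const idf = Math.log(docs.length / docFreq.get(term));
      scores[term] = tf * idf;
    }
    return scores;
  });
}
```

A term that appears in every document scores zero, which is the point: the technique rewards terms that are distinctive to a page relative to the rest of the set.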

In Searchmetrics’ 2014 study of ranking factors they found that while TF*IDF specifically actually had a negative correlation with visibility, relevant and proof terms have strong positive correlations.

Image via Searchmetrics

Based on their examination of these factors, Searchmetrics made the call to drop TF*IDF from their analysis altogether in 2015 in favor of the proof terms and relevant terms. Year over year the positive correlation holds for those types of terms, albeit not as high.

Images via Searchmetrics

In Moz’s own 2015 ranking factors, we find that LDA and TF*IDF related items remain among the highest on-page content factors.

In effect, no matter what model you look at, the general idea is to use related keywords in your copy in order to rank better for your primary target keyword, because it works.

Now, I can’t say we’ve examined the tactic in isolation, but I can say that the pages that we’ve optimized using TF*IDF have seen bigger jumps in rankings than those without it. While we leverage OnPage.org’s TF*IDF tool, we don’t follow it using hard and fast numerical rules. Instead, we allow the related keywords to influence ideation and then use them as they make sense.

At the very least, this order of technical optimization of content needs to be revisited. While you’re at it, you should consider the other tactics that Cyrus Shepard called out as well in order to get more mileage out of your content marketing efforts.

302s vs 301s — seriously?

As of late, a reexamination of the 301 vs. 302 redirect has come back up in the SEO echo chamber. I get the sense that Webmaster Trends Analysts in the public eye either like attention or are just bored, so they’ll issue vague tweets just to see what happens.

For those of you who prefer to do work rather than wait for Gary Illyes to tweet, all I’ve got is some data to share.

Once upon a time, we worked with a large media organization. As is par for the course with these types of organizations, their tech team was resistant to implementing much of our recommendations. Yet they had millions of links both internally and externally pointing to URLs that returned 302 response codes.

After many meetings, and a more compelling business case, the one substantial thing that we were able to convince them to do was switch those 302s into 301s. Nearly overnight there was an increase in rankings in the 1–3 rank zone.

Despite seasonality, there was a jump in organic Search traffic as well.

To reiterate, the only substantial change at this point was the 302 to 301 switch. It resulted in a few million more organic search visits month over month. Granted, this was a year ago, but until someone can show me the same happening or no traffic loss when you switch from 301s to 302s, there’s no discussion for us to have.

Internal linking, the technical approach

Under the PageRank model, it’s an axiom that the flow of link equity through the site is an incredibly important component to examine. Unfortunately, so much of the discussion with clients is only on the external links and not about how to better maximize the link equity that a site already has.

There are a number of tools out there that bring this concept to the forefront. For instance, Searchmetrics calculates and visualizes the flow of link equity throughout the site. This gives you a sense of where you can build internal links to make other pages stronger.

Additionally, Paul Shapiro put together a compelling post on how you can calculate a version of internal PageRank for free using the statistical computing software R.

Either of these approaches is incredibly valuable for giving content more visibility, and very much falls in the bucket of what technical SEO can offer.
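To show what such a calculation involves, here is a minimal sketch of the classic PageRank power iteration run over an internal link graph. It is written in JavaScript rather than R, it is not the method from the post mentioned above or from Searchmetrics, and the `internalPageRank` helper, the input shape, and the default damping factor of 0.85 are assumptions for illustration.

```javascript
// links: { '/page': ['/linked-page', ...] } built from a crawl of your site.
// Returns { '/page': score } where the scores sum to 1.
function internalPageRank(links, { damping = 0.85, iterations = 50 } = {}) {
  const pages = Object.keys(links);
  const known = new Set(pages);
  const n = pages.length;
  let rank = {};
  for (const page of pages) rank[page] = 1 / n;

  for (let i = 0; i < iterations; i++) {
    const next = {};
    for (const page of pages) next[page] = (1 - damping) / n;
    for (const page of pages) {
      // Ignore links that point outside the crawled set.
      const targets = links[page].filter((target) => known.has(target));
      if (targets.length === 0) {
        // A page with no internal links shares its score with every page.
        for (const other of pages) next[other] += (damping * rank[page]) / n;
      } else {
        for (const target of targets) {
          next[target] += (damping * rank[page]) / targets.length;
        }
      }
    }
    rank = next;
  }
  return rank;
}
```

Pages with a low score relative to their business value are the ones that would benefit most from additional internal links.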

Structured data is the future of organic search

The popular one-liner is that Google is looking to become the presentation layer of the web. I say, help them do it!

There has been much discussion about how Google is taking our content and attempting to cut our own websites out of the picture. With the traffic boon that the industry has seen from sites making it into the featured snippet, it’s pretty obvious that, in many cases, there’s more value for you in Google taking your content than in them not.

With Vocal Search appliances on mobile devices and the forthcoming Google Home, there’s only one answer that the user receives. That is to say that the Star Trek computer Google is building is not going to read every result — just one. These answers are fueled by rich cards and featured snippets, which are in turn fueled by structured data.

Google has actually done us a huge favor regarding structured data in updating the specifications that allow JSON-LD. Before this, Schema.org was a matter of making very tedious and specific changes to code with little ROI. Now structured data powers a number of components of the SERP and can simply be placed at the <HEAD> of a document quite easily. Now is the time to revisit implementing the extra markup. Builtvisible’s guide to Structured Data remains the gold standard.
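To show how little is involved, here is a minimal sketch that turns a plain object into a JSON-LD script tag. The `jsonLdScriptTag` helper is hypothetical, and the Article properties in the usage note are only an example; check Schema.org and Google’s documentation for the properties a given rich result actually requires.

```javascript
// Serializes structured data as a JSON-LD <script> tag for the page's head.
// Escaping "<" keeps text such as "</script>" inside a value from closing
// the tag early.
function jsonLdScriptTag(data) {
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return '<script type="application/ld+json">' + json + '</script>';
}
```

For example, `jsonLdScriptTag({ '@context': 'https://schema.org', '@type': 'Article', headline: 'Technical SEO' })` returns a single tag that can be dropped into a template without touching the visible markup.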

Page speed is still Google’s obsession

Google has very aggressive expectations around page speed, especially for the mobile context. They want the above-the-fold content to load within one second. However, 800 milliseconds of that time is pretty much out of your control.

Image via Google

Based on what you can directly affect, as an SEO, you have 200 milliseconds to make content appear on the screen. A lot of what can be done on-page to influence the speed at which things load is optimizing the page for critical rendering path.

Image via Nianpeng Li

To understand this concept, first we have to take a bit of a step back to get a sense of how browsers construct a web page.

  1. The browser takes the uniform resource locator (URL) that you specify in your address bar and performs a DNS lookup on the domain name.
  2. Once a socket is open and a connection is negotiated, it then asks the server for the HTML of the page you’ve requested.
  3. The browser begins to parse the HTML into the Document Object Model until it encounters CSS, then it starts to parse the CSS into the CSS Object Model.
  4. If at any point it runs into JavaScript, it will pause the DOM and/or CSSOM construction until the JavaScript completes execution, unless it is asynchronous.
  5. Once all of this is complete, the browser constructs the Render Tree, which then builds the layout of the page and finally the elements of the page are painted.
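The steps above map fairly directly onto the timestamps the browser exposes through the Navigation Timing API. The sketch below is a rough breakdown, assuming you pass it a navigation timing entry (or any object with the same field names); `summarizeNavigationTiming` is a hypothetical helper, and the phase labels are my own.

```javascript
// Breaks a navigation timing entry into the phases described above.
// All values are in milliseconds.
function summarizeNavigationTiming(t) {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,         // step 1
    connect: t.connectEnd - t.connectStart,               // socket and TLS setup
    timeToFirstByte: t.responseStart - t.requestStart,    // step 2, waiting on server
    download: t.responseEnd - t.responseStart,            // HTML arriving
    domParsing: t.domInteractive - t.responseEnd,         // steps 3 and 4
    domContentLoaded: t.domContentLoadedEventEnd - t.startTime,
  };
}
```

In a browser console you can try it with `summarizeNavigationTiming(performance.getEntriesByType('navigation')[0])`.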

In the Timeline section of Chrome DevTools, you can see the individual operations as they happen and how they contribute to load time. In the timeline at the top, you’ll always see the visualization as mostly yellow because JavaScript execution takes the most time out of any part of page construction. JavaScript causes page construction to halt until the script execution is complete. This is called “render-blocking” JavaScript.

That term may sound familiar to you because you’ve poked around in PageSpeed Insights looking for answers on how to make improvements and “Eliminate Render-blocking JavaScript” is a common one. The tool is primarily built to support optimization for the Critical Rendering Path. A lot of the recommendations involve issues like sizing resources statically, using asynchronous scripts, and specifying image dimensions.

Additionally, external resources contribute significantly to page load time. For instance, I always see Chartbeat’s library taking 3 or more seconds just to resolve the DNS. These are all things that need to be reviewed when considering how to make a page load faster.

If you know much about the Accelerated Mobile Pages (AMP) specification, a lot of what I just highlighted might sound very familiar to you.

Essentially, AMP exists because Google believes the general public is bad at coding. So they made a subset of HTML and threw a global CDN behind it to make your pages hit the 1 second mark. Personally, I have a strong aversion to AMP, but as many of us predicted at the top of the year, Google has rolled AMP out beyond just the media vertical and into all types of pages in the SERP. The roadmap indicates that there is a lot more coming, so it’s definitely something we should dig into and look to capitalize on.

Using pre-browsing directives to speed things up

To support site speed improvements, most browsers have pre-browsing resource hints. These hints allow you to indicate to the browser that a file will be needed later in the page, so while the components of the browser are idle, it can download or connect to those resources now. Chrome specifically looks to do these things automatically when it can, and may ignore your specification altogether. However, these directives operate much like the rel-canonical tag — you’re more likely to get value out of them than not.

Image via Google

  • Rel-preconnect – This directive allows you to resolve the DNS, initiate the TCP handshake, and negotiate the TLS tunnel between the client and server before you need to. When you don’t do this, these things happen one after another for each resource rather than simultaneously. As the diagram below indicates, in some cases you can shave nearly half a second off just by doing this. Alternatively, if you just want to resolve the DNS in advance, you could use rel-dns-prefetch.

    If you see a lot of idle time in your Timeline in Chrome DevTools, rel-preconnect can help you shave some of that off.

    You can specify rel-preconnect with

    <link rel="preconnect" href="https://domain.com">

    or rel-dns-prefetch with

    <link rel="dns-prefetch" href="//domain.com">

  • Rel-prefetch – This directive allows you to download a resource for a page that will be needed in the future. For instance, if you want to pull the stylesheet of the next page or download the HTML for the next page, you can do so by specifying it as
    <link rel="prefetch" href="nextpage.html">
  • Rel-prerender – Not to be confused with the aforementioned Prerender.io, rel-prerender is a directive that allows you to load an entire page and all of its resources in an invisible tab. Once the user clicks a link to go to that URL, the page appears instantly. If the user instead clicks on a link that you did not specify as the rel-prerender, the prerendered page is deleted from memory. You specify the rel-prerender as follows:
    <link rel="prerender" href="nextpage.html">

    I’ve talked about rel-prerender in the past in my post about how I improved our site’s speed 68.35% with one line of code.

    There are a number of caveats that come with rel-prerender, but the most important one is that you can only specify one page at a time and only one rel-prerender can be specified across all Chrome threads. In my post I talk about how to leverage the Google Analytics API to make the best guess at the URL the user is likely going to visit next.

    If you’re using an analytics package that isn’t Google Analytics, or if you have ads on your pages, it will falsely count prerender hits as actual views to the page. What you’ll want to do is wrap any JavaScript that you don’t want to fire until the page is actually in view in the Page Visibility API. Effectively, you’ll only fire analytics or show ads when the page is actually visible.

    Finally, keep in mind that rel-prerender does not work with Firefox, iOS Safari, Opera Mini, or Android’s browser. Not sure why they didn’t get invited to the pre-party, but I wouldn’t recommend using it on a mobile device anyway.

  • Rel-preload and rel-subresource – Following the same pattern as above, rel-preload and rel-subresource allow you to load things within the same page before they are needed. Rel-subresource is Chrome-specific, while rel-preload works for Chrome, Android, and Opera.
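On the Page Visibility point under rel-prerender, the wrapper can be quite small. This is a minimal sketch, assuming the standard `document.visibilityState` property and `visibilitychange` event; `runWhenVisible` is a hypothetical helper, and the tracking call in the usage note is a placeholder for whatever your analytics or ad code needs.

```javascript
// Runs callback once, as soon as the page is actually visible to the user.
// Pass the page's `document` as doc.
function runWhenVisible(doc, callback) {
  if (doc.visibilityState === 'visible') {
    callback();
    return;
  }
  function onChange() {
    if (doc.visibilityState === 'visible') {
      doc.removeEventListener('visibilitychange', onChange);
      callback();
    }
  }
  doc.addEventListener('visibilitychange', onChange);
}
```

Usage would look like `runWhenVisible(document, function () { /* fire analytics or load ads here */ });`, so a prerendered page that the user never opens does not register a view.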

Finally, keep in mind that Chrome is sophisticated enough to make attempts at all of these things. Your resource hints help them develop the 100% confidence level to act on them. Chrome is making a series of predictions based on everything you type into the address bar and it keeps track of whether or not it’s making the right predictions to determine what to preconnect and prerender for you. Check out chrome://predictors to see what Chrome has been predicting based on your behavior.

Image via Google

Where does SEO go from here?

Being a strong SEO requires a series of skills that’s difficult for a single person to be great at. For instance, an SEO with strong technical skills may find it difficult to perform effective outreach or vice-versa. Naturally, SEO is already stratified between on- and off-page in that way. However, the technical skill requirement has continued to grow dramatically in the past few years.

There are a number of skills that have always given technical SEOs an unfair advantage, such as web and software development skills or even statistical modeling skills. Perhaps it’s time to officially further stratify technical SEO from traditional content-driven on-page optimizations, since much of the skillset required is more that of a web developer and network administrator than that of what is typically thought of as SEO (at least at this stage in the game). As an industry, we should consider a role of an SEO Engineer, as some organizations already have.

At the very least, the SEO Engineer will need to have a grasp of all of the following to truly capitalize on these technical opportunities:

  • Document Object Model – An understanding of the building blocks of web browsers is fundamental to understanding how front-end developers manipulate the web as they build it.
  • Critical Rendering Path – An understanding of how a browser constructs a page and what goes into the rendering of the page will help with the speed enhancements that Google is more aggressively requiring.
  • Structured Data and Markup – An understanding of how metadata can be specified to influence how Google understands the information being presented.
  • Page Speed – An understanding of the rest of the coding and networking components that impact page load times is the natural next step to getting page speed up. Of course, this is a much bigger deal than SEO, as it impacts the general user experience.
  • Log File Analysis – An understanding of how search engines traverse websites and what they deem as important and accessible is a requirement, especially with the advent of new front-end technologies.
  • SEO for JavaScript Frameworks – An understanding of the implications of leveraging one of the popular frameworks for front-end development, as well as a detailed understanding of how, why, and when an HTML snapshot appliance may be required and what it takes to implement them is critical. Just the other day, Justin Briggs collected most of the knowledge on this topic in one place and broke it down to its components. I encourage you to check it out.
  • Chrome DevTools – An understanding of one of the most powerful tools in the SEO toolkit, the Chrome web browser itself. Chrome DevTools’ features coupled with a few third-party plugins close the gaps for many things that SEO tools cannot currently analyze. The SEO Engineer needs to be able to build something quick to get the answers to questions that were previously unasked by our industry.
  • Accelerated Mobile Pages & Facebook Instant Pages – If the AMP Roadmap is any indication, Facebook Instant Pages is a similar specification and I suspect it will be difficult for them to continue to exist exclusively.
  • HTTP/2 – An understanding of how this protocol will dramatically change the speed of the web and the SEO implications of migrating from HTTP/1.1.

Let’s Make SEO Great Again

One of the things that always made SEO interesting and its thought leaders so compelling was that we tested, learned, and shared that knowledge so heavily. It seems that that culture of testing and learning was drowned in the content deluge. Perhaps many of those types of folks disappeared as the tactics they knew and loved were swallowed by Google’s zoo animals. Perhaps our continually eroding data makes it more and more difficult to draw strong conclusions.

Whatever the case, right now, there are far fewer people publicly testing and discovering opportunities. We need to demand more from our industry, our tools, our clients, our agencies, and ourselves.

Let’s stop chasing the content train and get back to making experiences that perform.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in IM News

SearchCap: Apple news, successful PPC managers & technical SEO

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Apple news, successful PPC managers & technical SEO appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM News

Why Effective, Modern SEO Requires Technical, Creative, and Strategic Thinking – Whiteboard Friday

Posted by randfish

There’s no doubt that quite a bit has changed about SEO, and that the field is far more integrated with other aspects of online marketing than it once was. In today’s Whiteboard Friday, Rand pushes back against the idea that effective modern SEO doesn’t require any technical expertise, outlining a fantastic list of technical elements that today’s SEOs need to know about in order to be truly effective.

For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week I’m going to do something unusual. I don’t usually point out these inconsistencies or sort of take issue with other folks’ content on the web, because I generally find that that’s not all that valuable and useful. But I’m going to make an exception here.

There is an article by Jayson DeMers, who I think might actually be here in Seattle — maybe he and I can hang out at some point — called “Why Modern SEO Requires Almost No Technical Expertise.” It was an article that got a shocking amount of traction and attention. On Facebook, it has thousands of shares. On LinkedIn, it did really well. On Twitter, it got a bunch of attention.

Some folks in the SEO world have already pointed out some issues around this. But I want to address it because of the increasing popularity of this article, and because I think there’s, like, this hopefulness from worlds outside of kind of the hardcore SEO world that are looking to this piece and going, “Look, this is great. We don’t have to be technical. We don’t have to worry about technical things in order to do SEO.”

Look, I completely get the appeal of that. I did want to point out some of the reasons why this is not so accurate. At the same time, I don’t want to rain on Jayson, because I think that it’s very possible he’s writing an article for Entrepreneur, maybe he has sort of a commitment to them. Maybe he had no idea that this article was going to spark so much attention and investment. He does make some good points. I think it’s just really the title and then some of the messages inside there that I take strong issue with, and so I wanted to bring those up.

First off, some of the good points he did bring up.

One, he wisely says, “You don’t need to know how to code or to write and read algorithms in order to do SEO.” I totally agree with that. If today you’re looking at SEO and you’re thinking, “Well, am I going to get more into this subject? Am I going to try investing in SEO? But I don’t even know HTML and CSS yet.”

Those are good skills to have, and they will help you in SEO, but you don’t need them. Jayson’s totally right. You don’t have to have them, and you can learn and pick up some of these things, and do searches, watch some Whiteboard Fridays, check out some guides, and pick up a lot of that stuff later on as you need it in your career. SEO doesn’t have that hard requirement.

And secondly, he makes an intelligent point that we’ve made many times here at Moz, which is that, broadly speaking, a better user experience is well correlated with better rankings.

You make a great website that delivers great user experience, that provides the answers to searchers’ questions and gives them extraordinarily good content, way better than what’s out there already in the search results, generally speaking you’re going to see happy searchers, and that’s going to lead to higher rankings.

But not entirely. There are a lot of other elements that go in here. So I’ll bring up some frustrating points around the piece as well.

First off, there’s no acknowledgment — and I find this a little disturbing — that the ability to read and write code, or even HTML and CSS, which I think are the basic place to start, is helpful or can take your SEO efforts to the next level. I think both of those things are true.

So being able to look at a web page, view source on it, or pull up Firebug in Firefox or something and diagnose what’s going on and then go, “Oh, that’s why Google is not able to see this content. That’s why we’re not ranking for this keyword or term, or why even when I enter this exact sentence in quotes into Google, which is on our page, this is why it’s not bringing it up. It’s because it’s loading it after the page from a remote file that Google can’t access.” These are technical things, and being able to see how that code is built, how it’s structured, and what’s going on there, very, very helpful.
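
The "view source" check described above can be roughly approximated in code. The following Python sketch tests whether an exact sentence appears in the HTML the server sends, ignoring anything inside script and style tags. It is only an approximation: it says nothing about what Google actually renders or indexes, and the two sample pages and the name phrase_in_raw_html are invented for illustration.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def phrase_in_raw_html(html, phrase):
    """Return True if phrase occurs in the visible text of the raw HTML."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return " ".join(phrase.split()).lower() in text

# Hypothetical pages: one ships its content in the HTML, one loads it later.
SERVER_RENDERED = (
    "<html><body><h1>Blue widgets</h1>"
    "<p>Our blue widgets ship in two days.</p></body></html>"
)
JS_LOADED = (
    '<html><body><div id="content"></div>'
    '<script>fetch("/content.json").then(function (r) { /* fills #content */ });</script>'
    "</body></html>"
)

if __name__ == "__main__":
    sentence = "Our blue widgets ship in two days."
    print(phrase_in_raw_html(SERVER_RENDERED, sentence))
    print(phrase_in_raw_html(JS_LOADED, sentence))
```

To try this on a live page, the HTML could be fetched with urllib.request.urlopen and passed in as a string. A miss here is a hint that the content arrives after the initial page load, which is the situation described above.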

Some coding knowledge also can take your SEO efforts even further. I mean, so many times, SEOs are stymied by the conversations that we have with our programmers and our developers and the technical staff on our teams. When we can have those conversations intelligently, because at least we understand the principles of how an if-then statement works, or what software engineering best practices are being used, or they can upload something into a GitHub repository, and we can take a look at it there, that kind of stuff is really helpful.

Secondly, I don’t like that the article overly reduces all of this information that we have about what we’ve learned about Google. So he mentions two sources. One is things that Google tells us, and the other is SEO experiments. I think both of those are true. Although I’d add that there’s sort of a sixth sense of knowledge that we gain over time from looking at many, many search results and kind of having this feel for why things rank, and what might be wrong with a site, and getting really good at that using tools and data as well. There are people who can look at Open Site Explorer and then go, “Aha, I bet this is going to happen.” They can look, and 90% of the time they’re right.

So he boils this down to, one, write quality content, and two, reduce your bounce rate. Neither of those things is wrong. You should write quality content, although I’d argue there are lots of other forms of quality content that aren’t necessarily written: video, images and graphics, podcasts, lots of other stuff.

And secondly, just doing those two things is not always enough. Many, many folks look and go, “I have quality content. It has a low bounce rate. How come I don’t rank better?” Well, your competitors, they’re also going to have quality content with a low bounce rate. That’s not a very high bar.

Also, frustratingly, this really sticks in my craw. I don’t think “write quality content” means anything. You tell me. To me, that is a totally non-actionable, non-useful phrase, a piece of advice so generic as to be discardable. So I really wish that there was more substance behind that.

The article also makes, in my opinion, the totally inaccurate claim that modern SEO really is reduced to “the happier your users are when they visit your site, the higher you’re going to rank.”

Wow. Okay. Again, I think broadly these things are correlated. User happiness and rank are broadly correlated, but it’s not a one to one. This is not like a, “Oh, well, that’s a 1.0 correlation.”

I would guess that the correlation is probably closer to like the page authority range. I bet it’s like 0.35 or something correlation. If you were to actually measure this broadly across the web and say like, “Hey, were you happier with result one, two, three, four, or five,” the ordering would not be perfect at all. It probably wouldn’t even be close.
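
To show what a figure like that would mean, here is a small Python sketch of a Spearman rank correlation, which compares the order of results with the order of how happy people were with them. The satisfaction scores are made-up numbers chosen purely to illustrate the arithmetic, not measurements of anything, and the function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of equal values so ties share one rank.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Positions 1 to 5 in a search result, and invented satisfaction scores (0 to 10).
POSITIONS = [1, 2, 3, 4, 5]
SATISFACTION = [7, 9, 5, 8, 6]

if __name__ == "__main__":
    # Negate positions so that "ranks higher" and "happier" point the same way.
    print(spearman([-p for p in POSITIONS], SATISFACTION))
```

A value of 1.0 would mean searchers were happiest with result one, then two, and so on down the page. Anything well below that means the ordering and the satisfaction frequently disagree.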

There’s a ton of reasons why sometimes someone who ranks on Page 2 or Page 3 or doesn’t rank at all for a query is doing a better piece of content than the person who does rank well or ranks on Page 1, Position 1.

Then the article suggests five and sort of a half steps to successful modern SEO, which I think is a really incomplete list. So Jayson gives us:

  • Good on-site experience
  • Writing good content
  • Getting others to acknowledge you as an authority
  • Rising in social popularity
  • Earning local relevance
  • Dealing with modern CMS systems (he notes that most modern CMS systems are SEO-friendly)

The thing is there’s nothing actually wrong with any of these. They’re all, generally speaking, correct, either directly or indirectly related to SEO. The one about local relevance, I have some issue with, because he doesn’t note that there’s a separate algorithm for sort of how local SEO is done and how Google ranks local sites in maps and in their local search results. Also not noted is that rising in social popularity won’t necessarily directly help your SEO, although it can have indirect and positive benefits.

I feel like this list is super incomplete. Okay, in the 10 minutes before we filmed this video, I brainstormed a list just off the top of my head. The list was so long that, as you can see, I filled up the whole whiteboard and then didn’t have any more room. I’m not going to bother to erase and go try and be absolutely complete.

But there’s a huge, huge number of things that are important, critically important for technical SEO. If you don’t know how to do these things, you are sunk in many cases. You can’t be an effective SEO analyst, or consultant, or in-house team member, because you simply can’t diagnose the potential problems, rectify those potential problems, identify strategies that your competitors are using, be able to diagnose a traffic gain or loss. You have to have these skills in order to do that.

I’ll run through these quickly, but really the idea is just that this list is so huge and so long that I think it’s very, very, very wrong to say technical SEO is behind us. I almost feel like the opposite is true.

We have to be able to understand things like:

  • Content rendering and indexability
  • Crawl structure, internal links, JavaScript, Ajax. If something’s post-loading after the page and Google’s not able to index it, or there are links that are accessible via JavaScript or Ajax, maybe Google can’t necessarily see those or isn’t crawling them as effectively, or is crawling them, but isn’t assigning them as much link weight as they might be assigning other stuff, and you’ve made it tough to link to them externally, and so they can’t crawl it.
  • Disabling crawling and/or indexing of thin or incomplete or non-search-targeted content. We have a bunch of search results pages. Should we use rel=prev/next? Should we robots.txt those out? Should we disallow from crawling with meta robots? Should we rel=canonical them to other pages? Should we exclude them via the protocols inside Google Webmaster Tools, which is now Google Search Console?
  • Managing redirects, domain migrations, content updates. A new piece of content comes out, replacing an old piece of content, what do we do with that old piece of content? What’s the best practice? It varies by different things. We have a whole Whiteboard Friday about the different things that you could do with that. What about a big redirect or a domain migration? You buy another company and you’re redirecting their site to your site. You have to understand things about subdomain structures versus subfolders, which, again, we’ve done another Whiteboard Friday about that.
  • Proper error codes, downtime procedures, and not found pages. If your 404 pages turn out to all be 200 pages, well, now you’ve made a big error there, and Google could be crawling tons of 404 pages that they think are real pages, because you’ve made it a status code 200. Or you’ve used a 404 code when you should have used a 410, which means permanently removed, to be able to get it completely out of the indexes, as opposed to having Google revisit it and keep it in the index.

Downtime procedures. So there’s specifically a… I can’t even remember. It’s a 5xx code that you can use. Maybe it was a 503 or something that you can use that’s like, “Revisit later. We’re having some downtime right now.” Google urges you to use that specific code rather than using a 404, which tells them, “This page is now an error.”

Disney had that problem a while ago, if you guys remember, where they 404ed all their pages during an hour of downtime, and then their homepage, when you searched for Disney World, was, like, “Not found.” Oh, jeez, Disney World, not so good.
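
The status code cases described above can be sketched as a small decision function. In the HTTP specification, 503 Service Unavailable is the code for a temporary outage, and the standard Retry-After header tells clients when to come back. The function name, its parameters, and the one-hour default are assumptions made for illustration, not a recommendation from Google or Moz.

```python
from http import HTTPStatus

def choose_response(page_exists, permanently_removed=False,
                    in_maintenance=False, retry_after_seconds=3600):
    """Return (status, headers) for a request, following the cases above."""
    if in_maintenance:
        # Temporary downtime: ask clients, crawlers included, to come back later
        # instead of telling them the page is missing.
        return HTTPStatus.SERVICE_UNAVAILABLE, {"Retry-After": str(retry_after_seconds)}
    if page_exists:
        return HTTPStatus.OK, {}
    if permanently_removed:
        # The page existed once and is not coming back.
        return HTTPStatus.GONE, {}
    # A missing page should never be served with a 200.
    return HTTPStatus.NOT_FOUND, {}

if __name__ == "__main__":
    for label, kwargs in [
        ("live page", {"page_exists": True}),
        ("unknown URL", {"page_exists": False}),
        ("retired page", {"page_exists": False, "permanently_removed": True}),
        ("during downtime", {"page_exists": True, "in_maintenance": True}),
    ]:
        status, headers = choose_response(**kwargs)
        print(label, int(status), status.phrase, headers)
```

The maintenance check comes first on purpose: during an outage, even pages that exist should answer with the temporary code rather than an error that suggests they are gone.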

  • International and multi-language targeting issues. I won’t go into that. But you have to know the protocols there.
  • Duplicate content, syndication, scrapers. How do we handle all that? Somebody else wants to take our content, put it on their site, what should we do? Someone’s scraping our content. What can we do? We have duplicate content on our own site. What should we do?
  • Diagnosing traffic drops via analytics and metrics. Being able to look at a rankings report, being able to look at analytics, connecting those up and trying to see: Why did we go up or down? Did we have fewer pages being indexed or more pages being indexed, more or fewer pages getting traffic, more or fewer keywords?
  • Understanding advanced search parameters. Today, just today, I was checking out the related parameter in Google, which is fascinating for most sites. Well, for Moz, weirdly, related:oursite.com shows nothing. But for virtually every other site, well, most other sites on the web, it does show some really interesting data, and you can see how Google is connecting up, essentially, intentions and topics from different sites and pages, which can be fascinating, could expose opportunities for links, could expose understanding of how they view your site versus your competition or who they think your competition is.

Then there are tons of parameters, like inurl: and inanchor:, and da, da, da, da. Inanchor: doesn’t work anymore, never mind about that one.

I have to go faster, because we’re just going to run out of these. Like, come on.

  • Interpreting and leveraging data in Google Search Console. If you don’t know how to use that, Google could be telling you that you have all sorts of errors, and you don’t know what they are.
  • Leveraging topic modeling and extraction. Using all these cool tools that are coming out for better keyword research and better on-page targeting. I talked about a couple of those at MozCon, like MonkeyLearn. There’s the new Moz Context API, which will be coming out soon, around that. There’s the Alchemy API, which a lot of folks really like and use.
  • Identifying and extracting opportunities based on site crawls. You run a Screaming Frog crawl on your site and you’re going, “Oh, here’s all these problems and issues.” If you don’t have these technical skills, you can’t diagnose that. You can’t figure out what’s wrong. You can’t figure out what needs fixing, what needs addressing.
  • Using rich snippet format to stand out in the SERPs. This is just getting a better click-through rate, which can seriously help your site and obviously your traffic.
  • Applying Google-supported protocols like rel=canonical, meta description, rel=prev/next, hreflang, robots.txt, meta robots, X-Robots-Tag, NOODP, XML sitemaps, rel=nofollow. The list goes on and on and on. If you’re not technical, you don’t know what those are, you think you just need to write good content and lower your bounce rate, it’s not going to work.
  • Using APIs from services like AdWords or Mozscape, or Ahrefs or Majestic, or SEMrush or SearchScape, or Alchemy API. Those APIs can do powerful things for your site. There are some powerful problems they could help you solve if you know how to use them. It’s actually not that hard to write something, even inside a Google Doc or Excel, to pull from an API and get some data in there. There’s a bunch of good tutorials out there. Richard Baxter has one, Annie Cushing has one, I think Distilled has some. So really cool stuff there.
  • Diagnosing page load speed issues, which goes right to what Jayson was talking about. You need that fast-loading page. Well, if you don’t have any technical skills, you can’t figure out why your page might not be loading quickly.
  • Diagnosing mobile friendliness issues
  • Advising app developers on the new protocols around App deep linking, so that you can get the content from your mobile apps into the web search results on mobile devices. Awesome. Super powerful. Potentially crazy powerful, as mobile search is becoming bigger than desktop.
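
Several of the protocols in the list above live in a page’s HTML, so a first diagnostic step is simply reading them back out. The following Python sketch pulls rel=canonical, meta robots, and hreflang annotations from a page. The sample markup, the example.com URLs, and the class name HeadDirectives are hypothetical, and a fuller audit would also need to look at HTTP headers such as X-Robots-Tag and at robots.txt.

```python
from html.parser import HTMLParser

class HeadDirectives(HTMLParser):
    """Collects rel=canonical, meta robots, and hreflang values from HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.meta_robots = []
        self.hreflang = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = attr.get("href")
        elif tag == "link" and "alternate" in rel and "hreflang" in attr:
            self.hreflang[attr["hreflang"]] = attr.get("href")
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.meta_robots = [d.strip().lower() for d in content.split(",") if d.strip()]

def read_directives(html):
    """Parse html and return the populated HeadDirectives object."""
    parser = HeadDirectives()
    parser.feed(html)
    parser.close()
    return parser

# Hypothetical page markup for illustration.
SAMPLE_PAGE = """
<html><head>
<link rel="canonical" href="https://www.example.com/widgets/">
<link rel="alternate" hreflang="de" href="https://www.example.com/de/widgets/">
<meta name="robots" content="noindex, follow">
</head><body><p>Widgets</p></body></html>
"""

if __name__ == "__main__":
    found = read_directives(SAMPLE_PAGE)
    print("canonical:", found.canonical)
    print("meta robots:", found.meta_robots)
    print("hreflang:", found.hreflang)
```

Run across the URLs from a site crawl, output like this makes it easier to spot pages that are accidentally set to noindex or that canonicalize somewhere unexpected.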

Okay, I’m going to take a deep breath and relax. I don’t know Jayson’s intention, and in fact, if he were in this room, he’d be like, “No, I totally agree with all those things. I wrote the article in a rush. I had no idea it was going to be big. I was just trying to make the broader points around you don’t have to be a coder in order to do SEO.” That’s completely fine.

So I’m not going to try and rain criticism down on him. But I think if you’re reading that article, or you’re seeing it in your feed, or your clients are, or your boss is, or other folks are in your world, maybe you can point them to this Whiteboard Friday and let them know, no, that’s not quite right. There’s a ton of technical SEO that is required in 2015 and will be for years to come, I think, that SEOs have to have in order to be effective at their jobs.

All right, everyone. Look forward to some great comments, and we’ll see you again next time for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


Moz Blog

Posted in IM News
