Tag Archive | "JavaScript"

AMP to support custom JavaScript with amp-script

Google is adding JavaScript support to AMP, with more details to be announced next week at the AMP Conference.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing


How to Diagnose and Solve JavaScript SEO Issues in 6 Steps

Posted by tomek_rudzki

It’s rather common for companies to build their websites using modern JavaScript frameworks and libraries like React, Angular, or Vue. It’s obvious by now that the web has moved away from plain HTML and has entered the era of JS.

While there is nothing unusual about a business wanting to take advantage of the latest technologies, we need to address the stark reality of this trend: Most of the migrations to JavaScript frameworks aren’t being planned with users or organic traffic in mind.

Let’s call it the JavaScript Paradox:

  1. The big brands jump on the JavaScript hype train after hearing all the buzz about JavaScript frameworks creating amazing UXs.
  2. Reality reveals that JavaScript frameworks are really complex.
  3. The big brands completely butcher the migrations to JavaScript. They lose organic traffic and often have to cut corners rather than creating this amazing UX journey for their users (I will mention some examples in this article).

Since there’s no turning back, SEOs need to learn how to deal with JavaScript websites.

But that’s easier said than done because making JavaScript websites successful in search engines is a real challenge both for developers and SEOs.

This article is meant to be a follow-up to my comprehensive Ultimate Guide to JavaScript SEO, and it’s intended to be as easy to follow as possible. So, grab yourself a cup of coffee and let’s have some fun — here are six steps to help you diagnose and solve JavaScript SEO issues.

Step 1: Use the URL inspection tool to see if Google can render your content

The URL inspection tool (formerly Google Fetch and Render) is a great free tool that allows you to check if Google can properly render your pages.

The URL inspection tool requires you to have your website connected to Google Search Console. If you don’t have an account yet, check Google’s Help pages.

Open Google Search Console, then click on the URL inspection button.


In the URL form field, type the full URL of a page you want to audit.

Then click on TEST LIVE URL.

Once the test is done, click on VIEW TESTED PAGE.

And finally, click on the Screenshot tab to view the rendered page.

Scroll down the screenshot to make sure your web page is rendered properly. Ask yourself the following questions:

  • Is the main content visible?
  • Can Google see the user-generated comments?
  • Can Google access areas like similar articles and products?
  • Can Google see other crucial elements of your page?

Why does the screenshot look different than what I see in my browser? Here are some possible reasons:

  • Google encountered timeouts while rendering.
  • Some errors occurred while rendering. You probably used some features that are not supported by Google Web Rendering Service (Google uses the four-year-old Chrome 41 for web rendering, which doesn’t support many modern features).
  • You blocked crucial JavaScript files for Googlebot.

Step 2: Make sure you didn’t block JavaScript files by mistake

If Google cannot render your page properly, you should make sure you didn’t block important JavaScript files for Googlebot in robots.txt.

TL;DR: What is robots.txt?

It’s a plain text file that tells Googlebot, or any other search engine bot, whether it is allowed to request a page or resource.
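To make this concrete, here is a minimal, hypothetical robots.txt sketch (the paths are invented for illustration). The Disallow rule would block every script in /assets/js/, which can break rendering if those files build your content; the Allow rule shows one way to re-open a critical bundle for crawlers:

    User-agent: *
    # Blocks every JavaScript file in this folder, risky if these scripts render crucial content
    Disallow: /assets/js/
    # Explicitly re-allows a bundle that builds the main content
    Allow: /assets/js/productListing.js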

Fortunately, the URL Inspection tool points out all the resources of a rendered page that are blocked by robots.txt.

But how can you tell if a blocked resource is important from the rendering point of view?

You have two options: Basic and Advanced.

Basic

In most cases, it may be a good idea to simply ask your developers about it. They created your website, so they should know it well.

Obviously, if a script is called content.js or productListing.js, it’s probably relevant and shouldn’t be blocked.

Unfortunately, as of now, the URL Inspection tool doesn’t tell you how severe a blocked JS file is. The previous Google Fetch and Render tool had such an option.

Advanced

Now, we can use Chrome Developer Tools for that.

For educational purposes, we will be checking the following URL: http://botbenchmarking.com/youshallnotpass.html

Open the page in the most recent version of Chrome and go to Chrome Developer Tools. Then move to the Network tab and refresh the page.

Finally, select the desired resource (in our case it’s YouShallNotPass.js), right-click, and choose Block request URL.

Refresh the page and see if any important content disappeared. If so, then you should think about deleting the corresponding rule from your robots.txt file.

Step 3: Use the URL Inspection tool for fixing JavaScript errors

If you see that the URL Inspection tool isn’t rendering your page properly, it may be due to JavaScript errors that occurred while rendering.

To diagnose it, click on the More info tab in the URL Inspection tool.

Then, show these errors to your developers so they can fix them.

Just ONE error in the JavaScript code can stop rendering for Google, which in turn makes your website not indexable.

Your website may work fine in most recent browsers, but if it crashes in older browsers (Google Web Rendering Service is based on Chrome 41), your Google rankings may drop.

Need some examples?

  • A single error in the official Angular documentation caused Google to be unable to render our test Angular website.
  • Once upon a time, Google deindexed some pages of Angular.io, an official website of Angular 2+.

If you want to know why it happened, read my Ultimate Guide to JavaScript SEO.

Side note: If for some reason you don’t want to use the URL Inspection tool for debugging JavaScript errors, you can use Chrome 41 instead.

Personally, I prefer using Chrome 41 for debugging purposes, because it’s more universal and offers more flexibility. However, the URL Inspection tool is more accurate in simulating the Google Web Rendering Service, which is why I recommend that for people who are new to JavaScript SEO.

Step 4: Check if your content has been indexed in Google

It’s not enough to just see if Google can render your website properly. You have to make sure Google has properly indexed your content. The best option for this is to use the site: command.

It’s a very simple and very powerful tool. Its syntax is pretty straightforward: site:[URL of a website] “[fragment to be searched]”. Just take care not to put a space between site: and the URL.

Let’s assume you want to check if Google indexed the text “Develop across all platforms,” which is featured on the homepage of Angular.io.

Type the following command in Google: site:angular.io “DEVELOP ACROSS ALL PLATFORMS”

As you can see, Google indexed that content, which is what you want, but that’s not always the case.

Takeaway:

  • Use the site: command whenever possible.
  • Check different page templates to make sure your entire website works fine. Don’t stop at one page!

If everything looks fine, go to the next step. If not, there may be a couple of reasons why this is happening:

  • Google hasn’t rendered your content yet. Rendering can happen up to a few days or weeks after Google visits the URL. If your website requires content to be indexed as quickly as possible, implement SSR.
  • Google encountered timeouts while rendering a page. Are your scripts fast? Do they remain responsive when the server load is high?
  • Google is still requesting old JS files. Google caches aggressively to save computing power, so CSS and JS files may be served from its cache long after you change them. If you have fixed all the JavaScript errors and Google still cannot render your website properly, it may be using old, cached JS and CSS files. To work around it, you can embed a version number in the filename, for example, bundle3424323.js (see the snippet after this list). You can read more in Google’s guides on HTTP caching.
  • While indexing, Google may not fetch some resources if it decides that they don’t contribute to the essential page content.
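To illustrate the cache-busting idea from the list above (file names are hypothetical), embedding a version number or hash in the bundle’s file name makes Google fetch the new file instead of reusing a stale cached copy:

    <!-- Before: Google may keep using an old cached copy of this file -->
    <script src="/assets/bundle.js"></script>

    <!-- After: a new file name per release invalidates the cached version -->
    <script src="/assets/bundle.3424323.js"></script>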

Step 5: Make sure Google can discover your internal links

There are a few simple rules you should follow:

  1. Google needs proper <a href> links to discover the URLs on your website.
  2. If your links are added to the DOM only when somebody clicks on a button, Google won’t see them.

As simple as that is, plenty of big companies make these mistakes.

Proper link structure

In order to crawl a website, Googlebot needs traditional “href” links. If they’re not provided, many of your webpages will simply be unreachable for Googlebot!

I think it was explained well by Tom Greenway (a Google representative) during the Google I/O conference:

Please note: if you have proper <a href> links with some additional attributes, like onClick, data-url, or ng-href, that’s still fine for Google.
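To illustrate the distinction (URLs and handler names below are hypothetical), the first two links are discoverable because they carry a real href, even with extra JavaScript attached; the last two rely entirely on JavaScript and won’t be treated as links by Googlebot:

    <!-- Crawlable: a real href, even with additional JavaScript attributes -->
    <a href="/category/phones">Phones</a>
    <a href="/category/phones" onclick="trackClick(event)">Phones</a>

    <!-- Not crawlable: there is no URL for Googlebot to follow -->
    <a onclick="goTo('/category/phones')">Phones</a>
    <span data-url="/category/phones">Phones</span>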

A common mistake made by developers: Googlebot can’t access the second and subsequent pages of pagination

Not letting Googlebot discover pages from the second page of pagination and beyond is a common mistake that developers make.

When you open the mobile versions of Gearbest, AliExpress, and IKEA, you will quickly notice that they don’t let Googlebot see the pagination links, which is really weird. When Google enables mobile-first indexing for these websites, they will suffer.

How do you check it on your own?

If you haven’t already downloaded Chrome 41, get it from Ele.ph/chrome41.

Then navigate to any page. For the sake of the tutorial, I’m using the mobile version of AliExpress.com. For educational purposes, it’s good if you follow the same example.

Open the mobile version of the Mobile Phones category of Aliexpress.

Then, right-click on View More and select Inspect to see how it’s implemented.

As you can see, there are no <a href> or <link rel> links pointing to the second page of pagination.

There are over 2,000 products in the mobile phone category on Aliexpress.com. Since mobile Googlebot is able to access only 20 of them, that’s just 1 percent!

That means 99 percent of the products from that category are invisible for mobile Googlebot! That’s crazy!

These errors are caused by the wrong implementation of lazy loading. There are many other websites that make similar mistakes. You can read more in my article “Popular Websites that May Fail in Mobile First Indexing”.

TL;DR: using link rel=”next” alone is too weak a signal for Google

Note: it’s common to use link rel=”next” to indicate a pagination series. However, the findings from Kyle Blanchette seem to show that link rel=”next” alone is too weak a signal for Google and should be strengthened by traditional <a href> links.

John Mueller discussed this more:

“We can understand which pages belong together with rel=”next”, rel=”previous”, but if there are no links on the page at all, then it’s really hard for us to crawl from page to page. (…) So using the rel=”next”, rel=”previous” in the head of a page is a great idea to tell us how these pages are connected, but you really need to have on-page, normal HTML links.”

Don’t get me wrong — there is nothing wrong with using <link rel=”next”>. On the contrary, these tags are beneficial, but it’s good to combine them with traditional <a href> links.
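Here is a minimal sketch of that combination (URLs are hypothetical): the <link> tags in the <head> describe the pagination series, while the plain <a href> links in the body are what actually lets Googlebot crawl from page to page:

    <head>
      <link rel="prev" href="https://example.com/phones?page=1">
      <link rel="next" href="https://example.com/phones?page=3">
    </head>
    <body>
      <!-- On-page links that Googlebot can follow -->
      <a href="https://example.com/phones?page=1">Previous page</a>
      <a href="https://example.com/phones?page=3">Next page</a>
    </body>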

Checking if Google can see menu links

Another important step in auditing a JavaScript website is to make sure Google can see your menu links. To check this, use Chrome 41.

For the purpose of the tutorial, we will use the case of Target.com:

To start, open any browser and pick some links from the menu:

Next, open Chrome 41. In Chrome Developer Tools (press Ctrl + Shift + J), navigate to the Elements tab.

The results? Fortunately enough, Google can pick up the menu links of Target.com.

Now, check if Google can pick up the menu links on your website and see if you’re on target too.

Step 6: Check if Google can discover content hidden under tabs

I have often observed that in the case of many e-commerce stores, Google cannot discover and index content that is hidden under tabs (product descriptions, opinions, related products, etc.). I know it’s weird, but it’s so common.

It’s a crucial part of every SEO audit to make sure Google can see content hidden under tabs.

Open Chrome 41 and navigate to any product on Boohoo.com; for instance, Muscle Fit Vest.

Click on Details & Care to see the product description:

“DETAILS & CARE

94% Cotton 6% Elastane. Muscle Fit Vest. Model is 6’1″ and Wears UK Size M.”

Now, it’s time to check if it’s in the DOM. To do so, go to Chrome Developer Tools (Ctrl + Shift + J) and click on the Network tab.

Make sure the Disable cache option is checked.

Press F5 to refresh the page. Once refreshed, navigate to the Elements tab and search for the product description.

As you can see, in the case of boohoo.com, Google is able to see the product description.

Perfect! Now take the time to check whether your own website is fine.

Wrapping up

Obviously, JavaScript SEO is a pretty complex subject, but I hope this tutorial was helpful.

If you are still struggling with Google ranking, you might want to think about implementing dynamic rendering or hybrid rendering. And, of course, feel free to reach out to me on Twitter about this or other SEO needs.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog


The Minimum Viable Knowledge You Need to Work with JavaScript & SEO Today

Posted by sergeystefoglo

If your work involves SEO at some level, you’ve most likely been hearing more and more about JavaScript and the implications it has on crawling and indexing. Frankly, Googlebot struggles with it, and many websites today use modern JavaScript to load in crucial content. Because of this, we need to be equipped to discuss this topic when it comes up in order to be effective.

The goal of this post is to equip you with the minimum viable knowledge required to do so. This post won’t go into the nitty gritty details, describe the history, or give you extreme detail on specifics. There are a lot of incredible write-ups that already do this — I suggest giving them a read if you are interested in diving deeper (I’ll link out to my favorites at the bottom).

In order to be effective consultants when it comes to the topic of JavaScript and SEO, we need to be able to answer three questions:

  1. Does the domain/page in question rely on client-side JavaScript to load/change on-page content or links?
  2. If yes, is Googlebot seeing the content that’s loaded in via JavaScript properly?
  3. If not, what is the ideal solution?

With some quick searching, I was able to find three examples of landing pages that utilize JavaScript to load in crucial content.

I’m going to be using Sitecore’s Symposium landing page through each of these talking points to illustrate how to answer the questions above.

We’ll cover the “how do I do this” aspect first, and at the end I’ll expand on a few core concepts and link to further resources.

Question 1: Does the domain in question rely on client-side JavaScript to load/change on-page content or links?

The first step to diagnosing any issues involving JavaScript is to check if the domain uses it to load in crucial content that could impact SEO (on-page content or links). Ideally this will happen anytime you get a new client (during the initial technical audit), or whenever your client redesigns/launches new features of the site.

How do we go about doing this?

Ask the client

Ask, and you shall receive! Seriously though, one of the quickest/easiest things you can do as a consultant is contact your POC (or developers on the account) and ask them. After all, these are the people who work on the website day-in and day-out!

“Hi [client], we’re currently doing a technical sweep on the site. One thing we check is if any crucial content (links, on-page content) gets loaded in via JavaScript. We will do some manual testing, but an easy way to confirm this is to ask! Could you (or the team) answer the following, please?

1. Are we using client-side JavaScript to load in important content?

2. If yes, can we get a bulleted list of where/what content is loaded in via JavaScript?”

Check manually

Even on a large e-commerce website with millions of pages, there are usually only a handful of important page templates. In my experience, it should only take an hour max to check manually. I use the Web Developer plugin for Chrome, disable JavaScript from there, and manually check the important templates of the site (homepage, category page, product page, blog post, etc.).

In the example above, once we turn off JavaScript and reload the page, we can see that we are looking at a blank page.

As you make progress, jot down notes about content that isn’t being loaded in, is being loaded in wrong, or any internal linking that isn’t working properly.

At the end of this step we should know if the domain in question relies on JavaScript to load/change on-page content or links. If the answer is yes, we should also know where this happens (homepage, category pages, specific modules, etc.)

Crawl

You could also crawl the site (with a tool like Screaming Frog or Sitebulb) with JavaScript rendering turned off, and then run the same crawl with JavaScript turned on, and compare the differences with internal links and on-page elements.

For example, it could be that when you crawl the site with JavaScript rendering turned off, the title tags don’t appear. In my mind this would trigger an action to crawl the site with JavaScript rendering turned on to see if the title tags do appear (as well as checking manually).

Example

For our example, I went ahead and did a manual check. As we can see from the screenshot below, when we disable JavaScript, the content does not load.

In other words, the answer to our first question for this page is “yes, JavaScript is being used to load in crucial parts of the site.”

Question 2: If yes, is Googlebot seeing the content that’s loaded in via JavaScript properly?

If your client is relying on JavaScript on certain parts of their website (in our example they are), it is our job to try and replicate how Google is actually seeing the page(s). We want to answer the question, “Is Google seeing the page/site the way we want it to?”

In order to get a more accurate depiction of what Googlebot is seeing, we need to attempt to mimic how it crawls the page.

How do we do that?

Use Google’s new mobile-friendly testing tool

At the moment, the quickest and most accurate way to try and replicate what Googlebot is seeing on a site is by using Google’s new mobile friendliness tool. My colleague Dom recently wrote an in-depth post comparing Search Console Fetch and Render, Googlebot, and the mobile friendliness tool. His findings were that most of the time, Googlebot and the mobile friendliness tool resulted in the same output.

In Google’s mobile friendliness tool, simply input your URL, hit “run test,” and then once the test is complete, click on “source code” on the right side of the window. You can take that code and search for any on-page content (title tags, canonicals, etc.) or links. If they appear here, Google is most likely seeing the content.

Search for visible content in Google

It’s always good to sense-check. Another quick way to check if Googlebot has indexed content on your page is by simply selecting visible text on your page and doing a site: search for it in Google with quotation marks around said text.

In our example there is visible text on the page that reads…

“Whether you are in marketing, business development, or IT, you feel a sense of urgency. Or maybe opportunity?”

When we do a site:search for this exact phrase, for this exact page, we get nothing. This means Google hasn’t indexed the content.

Crawling with a tool

Most crawling tools have the functionality to crawl JavaScript now. For example, in Screaming Frog you can head to configuration > spider > rendering > then select “JavaScript” from the dropdown and hit save. DeepCrawl and SiteBulb both have this feature as well.

From here you can input your domain/URL and see the rendered page/code once your tool of choice has completed the crawl.

Example:

When attempting to answer this question, my preference is to start by inputting the domain into Google’s mobile friendliness tool, copying the source code, and searching for important on-page elements (think title tag, <h1>, body copy, etc.). It’s also helpful to use a tool like diff checker to compare the rendered HTML with the original HTML (Screaming Frog also has a function that lets you do this side by side).

For our example, here is what the output of the mobile friendliness tool shows us.

After a few searches, it becomes clear that important on-page elements are missing here.

We also did the second test and confirmed that Google hasn’t indexed the body content found on this page.

The implication at this point is that Googlebot is not seeing our content the way we want it to, which is a problem.

Let’s jump ahead and see what we can recommend to the client.

Question 3: If we’re confident Googlebot isn’t seeing our content properly, what should we recommend?

Now that we know the domain is using JavaScript to load in crucial content and that Googlebot is most likely not seeing that content, the final step is to recommend an ideal solution to the client. Key word: recommend, not implement. It’s 100% our job to flag the issue to our client, explain why it’s important (as well as the possible implications), and highlight an ideal solution. It is 100% not our job to try to do the developer’s job of figuring out an ideal solution with their unique stack/resources/etc.

How do we do that?

You want server-side rendering

The main reason Google is having trouble seeing Sitecore’s landing page right now is that the page asks the user (us, Googlebot) to do the heavy work of loading the JavaScript. In other words, they’re using client-side JavaScript.

Googlebot is literally landing on the page, trying to execute JavaScript as best as possible, and then needing to leave before it has a chance to see any content.

The fix here is to instead have Sitecore’s landing page load on their server. In other words, we want to take the heavy lifting off of Googlebot, and put it on Sitecore’s servers. This will ensure that when Googlebot comes to the page, it doesn’t have to do any heavy lifting and instead can crawl the rendered HTML.

In this scenario, Googlebot lands on the page and already sees the HTML (and all the content).
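To make the contrast concrete, here is a bare-bones server-side rendering sketch in plain Node.js (the data and markup are invented and say nothing about Sitecore’s actual stack): the server builds the finished HTML before responding, so Googlebot receives the content without executing any JavaScript.

    // Minimal SSR sketch with hypothetical data and markup.
    // The server renders the HTML itself, so crawlers and users both receive
    // content without having to execute client-side JavaScript first.
    const http = require("http");

    function renderPage(sessions) {
      const items = sessions
        .map(s => `<li><a href="/sessions/${s.id}">${s.title}</a></li>`)
        .join("");
      return `<!doctype html><html><head><title>Symposium Sessions</title></head>` +
             `<body><h1>Sessions</h1><ul>${items}</ul></body></html>`;
    }

    http.createServer((req, res) => {
      // In a real application this data would come from a database or an API
      const sessions = [
        { id: 1, title: "Opening keynote" },
        { id: 2, title: "Personalization at scale" }
      ];
      res.writeHead(200, { "Content-Type": "text/html" });
      res.end(renderPage(sessions));
    }).listen(3000);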

There are more specific options (like isomorphic setups)

This is where it gets to be a bit in the weeds, but there are hybrid solutions. The best one at the moment is called an isomorphic setup.

In this model, we’re asking the client to load the first request on their server, and then any future requests are made client-side.

So Googlebot comes to the page, the client’s server has already executed the initial JavaScript needed for the page, sends the rendered HTML down to the browser, and anything after that is done on the client-side.

If you’re looking to recommend this as a solution, please read this post from the Airbnb team, which covers isomorphic setups in detail.

AJAX crawling = no go

I won’t go into details on this, but just know that Google’s previous AJAX crawling solution for JavaScript has since been deprecated and will eventually stop working. We shouldn’t be recommending this method.

(However, I am interested to hear any case studies from anyone who has implemented this solution recently. How has Google responded? Also, here’s a great write-up on this from my colleague Rob.)

Summary

At the risk of severely oversimplifying, here’s what you need to do in order to start working with JavaScript and SEO in 2018:

  1. Know when/where your client’s domain uses client-side JavaScript to load in on-page content or links.
    1. Ask the developers.
    2. Turn off JavaScript and do some manual testing by page template.
    3. Crawl using a JavaScript crawler.
  2. Check to see if Googlebot is seeing content the way we intend it to.
    1. Google’s mobile friendliness checker.
    2. Doing a site:search for visible content on the page.
    3. Crawl using a JavaScript crawler.
  3. Give an ideal recommendation to the client.
    1. Server-side rendering.
    2. Hybrid solutions (isomorphic).
    3. Not AJAX crawling.

Further resources

I’m really interested to hear about any of your experiences with JavaScript and SEO. What are some examples of things that have worked well for you? What about things that haven’t worked so well? If you’ve implemented an isomorphic setup, I’m curious to hear how that’s impacted how Googlebot sees your site.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog


SearchCap: JavaScript powered websites, Google axes political ads & Amazon consumer survey

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing


Google: There Is No Fixed Timeout For JavaScript Page Rendering

With all this JavaScript code and high-end JS-based web development going on, pages need to render, and the heavier the code and the slower the server and computer accessing the page, the longer the wait for the page to render…


Search Engine Roundtable


Going Beyond Google: Are Search Engines Ready for JavaScript Crawling & Indexing?

Posted by goralewicz

I recently published the results of my JavaScript SEO experiment where I checked which JavaScript frameworks are properly crawled and indexed by Google. The results were shocking; it turns out Google has a number of problems when crawling and indexing JavaScript-rich websites.

Google managed to index only a few out of multiple JavaScript frameworks tested. And as I proved, indexing content doesn’t always mean crawling JavaScript-generated links.

This got me thinking. If Google is having problems with JavaScript crawling and indexing, how are Google’s smaller competitors dealing with this problem? Is JavaScript going to lead you to full de-indexing in most search engines?

If you decide to deploy a client-rendered website (meaning a browser or Googlebot needs to process the JavaScript before seeing the HTML), you’re not only risking problems with your Google rankings — you may completely kill your chances at ranking in all the other search engines out there.

Google + JavaScript SEO experiment

To see how search engines other than Google deal with JavaScript crawling and indexing, we used our experiment website, http://jsseo.expert, originally built to check how Googlebot crawls and indexes content generated by JavaScript (and JavaScript frameworks).

The experiment was quite simple: http://jsseo.expert has subpages with content parsed by different JavaScript frameworks. If you disable JavaScript, the content isn’t visible — i.e. if you go to http://jsseo.expert/angular2/, all the content within the red box is generated by Angular 2. If the content isn’t indexed in Yahoo, for example, we know that Yahoo’s indexer didn’t process the JavaScript.

Here are the results:

As you can see, Google and Ask are the only search engines to properly index JavaScript-generated content. Bing, Yahoo, AOL, DuckDuckGo, and Yandex are completely JavaScript-blind and won’t see your content if it isn’t HTML.

The next step: Can other search engines index JavaScript?

Most SEOs only cover JavaScript crawling and indexing issues when talking about Google. As you can see, the problem is much more complex. When you launch a client-rendered JavaScript-rich website (JavaScript is processed by the browser/crawler to “build” HTML), you can be 100% sure that it’s only going to be indexed and ranked in Google and Ask. Unfortunately, Google and Ask cover only ~64% of the whole search engine market, according to statista.com.

This means that your new, shiny, JavaScript-rich website can cost you ~36% of your website’s visibility on all search engines.

Let’s start with Yahoo, Bing, and AOL, which are responsible for 35% of search queries in the US.

Yahoo, Bing, and AOL

Even though Yahoo and AOL were here long before Google, they’ve obviously fallen behind Google’s powerful algorithm and don’t invest in crawling and indexing as much as Google does. One reason is likely the relatively high cost of crawling and indexing the web compared to the popularity of the website.

Google can freely invest millions of dollars in growing their computing power without worrying as much about return on investment, whereas Bing, AOL, and Ask only have a small percentage of the search market.

However, Microsoft-owned Bing isn’t out of the running. Their growth has been quite aggressive over the last 8 years:

Unfortunately, we can’t say the same about one of the market pioneers: AOL. Do you remember the days before Google? This video will surely bring back some memories from a simpler time.

If you want to learn more about search engine history, I highly recommend watching Marcus Tandler’s spectacular TEDx talk.

Ask.com

What about Ask.com? How is it possible that Ask, with less than 1% of the market, can invest in crawling and indexing JavaScript? It makes me question whether the Ask network is powered by Google’s algorithm and crawlers. It’s even more interesting given Ask’s aversion towards Google. There was already some speculation about Ask’s relationship with Google after Google Penguin in 2012, but we can now confirm that Ask’s crawling is using Google’s technology.

DuckDuckGo and Yandex

Both DuckDuckGo and Yandex had no problem indexing all the URLs within http://jsseo.expert, but unfortunately, the only content that was indexed properly was the 100% HTML page (http://jsseo.expert/html/).

Baidu

Despite my best efforts, I didn’t manage to get http://jsseo.expert indexed in Baidu.com. It turns out you need a mainland China phone number to do that. I don’t have any previous experience with Baidu, so any and all help with indexing our experimental website would be appreciated. As soon as I succeed, I will update this article with the Baidu.com results.

Going beyond the search engines

What if you don’t really care about search engines other than Google? Even if your target market is heavily dominated by Google, JavaScript crawling and indexing is still in an early stage, as my JavaScript SEO experiment documented.

Additionally, even if crawled and indexed properly, there is proof that JavaScript reliance can affect your rankings. Will Critchlow saw a significant traffic improvement after shifting from JavaScript-driven pages to non-JavaScript-reliant ones.

Is there a JavaScript SEO silver bullet?

There is no search engine that can understand and process JavaScript at the level our modern browsers can. Even so, JavaScript isn’t inherently bad for SEO. JavaScript is awesome, but just like SEO, it requires experience and close attention to best practices.

If you want to enjoy all the perks of JavaScript without worrying about problems like Hulu.com’s JavaScript SEO issues, look into isomorphic JavaScript. It allows you to enjoy dynamic and beautiful websites without worrying about SEO.

If you’ve already developed a client-rendered website and can’t go back to the drawing board, you can always use pre-rendering services or enable server-side rendering. They often aren’t ideal solutions, but can definitely help you solve the JavaScript crawling and indexing problem until you come up with a better solution.

Regardless of the search engine, yet again we come back to testing and experimenting as a core component of technical SEO.

The future of JavaScript SEO

I highly recommend you follow along with how http://jsseo.expert/ is indexed in Google and other search engines. Even if some of the other search engines are a little behind Google, they’ll need to improve how they deal with JavaScript-rich websites to meet the exponentially growing demand for what JavaScript frameworks offer, both to developers and end users.

For now, stick to HTML & CSS on your front-end. :)

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog


JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by alexis-sanders

Understanding JavaScript and its potential impact on search performance is a core skillset of the modern SEO professional. If search engines can’t crawl a site or can’t parse and understand the content, nothing is going to get indexed and the site is not going to rank.

The most important questions for an SEO relating to JavaScript: Can search engines see the content and grasp the website experience? If not, what solutions can be leveraged to fix this?


Fundamentals

What is JavaScript?

When creating a modern web page, there are three major components:

  1. HTML – Hypertext Markup Language serves as the backbone, or organizer of content, on a site. It is the structure of the website (e.g., headings, paragraphs, list elements, etc.) and defines the static content.
  2. CSS – Cascading Style Sheets are the design, glitz, glam, and style added to a website. They make up the presentation layer of the page.
  3. JavaScript – JavaScript is the interactivity and a core component of the dynamic web.

Learn more about webpage development and how to code basic JavaScript.


JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.
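For reference, a tiny sketch of the two placements (the element ID and file path are hypothetical):

    <!-- Embedded: the JavaScript lives inside the HTML document -->
    <div id="greeting"></div>
    <script>
      document.getElementById("greeting").textContent = "Hello!";
    </script>

    <!-- Linked/referenced: the JavaScript lives in an external file -->
    <script src="/js/app.js"></script>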


What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).
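A minimal sketch of the pattern (the endpoint and element ID are hypothetical): the browser asks the server for just the data it needs and swaps it into the page, with no full reload.

    // Fetch data in the background and update one part of the page,
    // without reloading the whole document (hypothetical endpoint and markup).
    async function loadLatestReviews() {
      const response = await fetch("/api/reviews?limit=5"); // the "mini server call"
      const reviews = await response.json();
      document.getElementById("reviews").innerHTML = reviews
        .map(r => `<li>${r.author}: ${r.text}</li>`)
        .join("");
    }

    loadLatestReviews();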


What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

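For instance (the title value here is purely illustrative), the HTML source arrives with an empty <title>, and the DOM you see in “Inspect Element” shows the value the script filled in:

    <!-- HTML source: the title is empty when the document is delivered -->
    <html>
      <head>
        <title></title>
        <script>
          document.title = "Coffee Brewing Guide";
        </script>
      </head>
      <body>...</body>
    </html>

    <!-- DOM after the script runs (what "Inspect Element" shows) -->
    <title>Coffee Brewing Guide</title>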

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help identify the resources Googlebot is blocked from accessing.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor links (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server, and the page will automatically scroll to the first element with a matching ID (or the first <a> element with a matching name attribute). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragment URLs) – Hashbang URLs were a hack to support crawlers (one Google now wants to avoid, and only Bing still supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL for the user experience co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. With escaped fragments, there are two experiences:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replaces the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.


  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar, and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google for browser navigation in client-side or hybrid rendering setups.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page, the URL updates). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down while the URL is updated in the address bar (see the sketch after this list).
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages replaceState(), which doesn’t include the same back-button functionality as pushState.
      • Read more: Mozilla PushState History API Documents
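Below is a minimal sketch of the pushState pattern described above (the endpoint and element ID are hypothetical): as more content loads, the URL in the address bar is updated so each state has its own clean, crawlable address.

    // Bare-bones infinite scroll illustration using pushState
    // (hypothetical endpoint and element ID).
    let page = 1;

    async function loadNextPage() {
      page += 1;
      const response = await fetch(`/products?page=${page}`);
      const html = await response.text();
      document.getElementById("product-list").insertAdjacentHTML("beforeend", html);
      // Update the address bar without a full reload, so this state has its own URL
      history.pushState({ page }, "", `/products?page=${page}`);
    }

    // Restore the right state when the user navigates back or forward
    window.addEventListener("popstate", event => {
      if (event.state && event.state.page) {
        console.log("Returning to page", event.state.page);
      }
    });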

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript runs after the load event fires plus ~5 seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines may be unable to execute all of the code, potentially missing sections of the page.

How to make sure Google and other search engines can get your content

1. TEST

The most popular approach to resolving JavaScript issues is probably to not resolve anything at all (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript well before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s 2015 research confirmed that Google can crawl JavaScript and leverages the DOM. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - "I don't always JavaScript, but when I do, I know Google can crawl the DOM and dynamically generated HTML"

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small sections. Think: Jim Collins’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch as Google and see if content appears.
  • Fetch as Google supposedly occurs around the load event or before timeout. It’s a great test to check whether Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch as Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only option.

2. HTML SNAPSHOTS

What are HTML snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots in 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.
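A bare-bones sketch of that kind of server-side user-agent detection in Node.js (the bot list and file paths are hypothetical, and the snapshot must contain the same content the user sees to avoid cloaking):

    // Serve a pre-rendered HTML snapshot to known bots, and the normal
    // JavaScript-driven app shell to everyone else (hypothetical paths).
    const http = require("http");
    const fs = require("fs");

    const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot/i;

    http.createServer((req, res) => {
      const userAgent = req.headers["user-agent"] || "";
      const file = BOT_PATTERN.test(userAgent)
        ? "./snapshots/index.html" // static, fully rendered HTML
        : "./app/index.html";      // client-side app shell
      fs.readFile(file, (err, html) => {
        if (err) {
          res.writeHead(500);
          return res.end("Server error");
        }
        res.writeHead(200, { "Content-Type": "text/html" });
        res.end(html);
      });
    }).listen(8080);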

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now wants to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience that is more true to the user experience.

A second consideration relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like the PageSpeed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make the JavaScript asynchronous (i.e., add the “async” attribute to the script tag).
  3. Defer: Place the JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.
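A short sketch of the three approaches (file names are hypothetical). Note that async scripts can execute in any order, so they should not be used for files that depend on one another:

    <!DOCTYPE html>
    <html>
      <head>
        <!-- 1. Inline: small, critical JavaScript placed directly in the HTML document -->
        <script>document.documentElement.className += " js";</script>
        <!-- 2. Async: the async attribute lets HTML parsing continue while the file downloads -->
        <script src="/js/analytics.js" async></script>
      </head>
      <body>
        <p>Above-the-fold content is not held up by the script below.</p>
        <!-- 3. Defer: non-critical scripts placed lower in the HTML (or given the defer attribute) -->
        <script src="/js/widgets.js" defer></script>
      </body>
    </html>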

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog


How Does Google Handle CSS + Javascript "Hidden" Text? – Whiteboard Friday

Posted by randfish

Does Google treat text kept behind “read more” links with the same importance as non-hidden text? The short answer is “no,” but there’s more nuance to it than that. In today’s Whiteboard Friday, Rand explains just how the search engine giant weighs text hidden from view using CSS and JavaScript.

How Google handles CSS and JavaScript

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat a little bit about hidden text, hidden text of several kinds. I really don’t mean the spammy, black on a black background, white on a white background-like, hidden text type of keyword stuffing from the ’90s and early 2000s. I’m talking about what we do with CSS and JavaScript with overlays and with folders inside a page, that kind of hidden text.

It’s become very popular in modern web design to basically use CSS or to use JavaScript to load text after a user has taken some action on a page. So perhaps they’ve clicked on a separate section of your e-commerce page about your product to see other information, or maybe they’ve clicked a “read more” link in an article to read the rest of the article. This actually creates problems with Google and with SEO, and they’re not obvious problems, because when you use something like Google’s fetch and render tool or when you look at Google’s cache, Google appears to be able to crawl and parse all of that text. But they’re not treating all of it equally.

So here’s an example. I’ve got this text about coconut marble furnishings, which is just a ridiculous test phrase that I’m going to use for this purpose. But let’s say I’ve got page A, which essentially shows the first paragraph of this text, and then I have page B, which only shows part of the first sentence and then a “read more” link, which is very common in lots of articles.

Many folks do this, by the way, because they want to get engagement data about how many people actually read the rest of the piece. Others are using it for serving advertising, or they’re using it to track something, and some people are using it just because of the user experience it provides. Maybe the page is crowded with other types of content. They want to make sure that if someone wants to display this particular piece or that particular piece, that it’s available to them in as convenient a format as possible for design purposes or what have you.

What’s true in these instances is that Google is not going to treat what happens after this “read more” link is clicked, which is that the rest of this text would become visible here, they’re not going to treat that with the same weight that they otherwise would.

All other things being equal

So they’re on similar domains, they have similar link profiles, all that other kind of stuff.

  • A is going to outrank B for “coconut marble furnishings” even though this is in the title here. Because this text is relevant to that keyword and is serving to create greater relevance, Google is going to weight this one higher.
  • It’s also true that the content that’s hidden behind this “read more” here, it doesn’t matter. If it’s CSS-based, JavaScript-based, post load or loaded when the HTML is, it doesn’t matter, it’s going to be weighted less by Google. It will be treated as though that text were not as important.
  • Interestingly, fascinatingly, perhaps, Bing and Yahoo do not appear to discern between these. So they’ll treat these more equally. Google is the only one who seems to be, at least right now, from some test data — I’ll talk about that in a sec — who is treating these differently, who is basically weighting this hidden content less.

Best practices for SEO and “hidden” text

So what can we discern from this? What should SEOs do when we’re working with our web design teams and with our content teams around these types of issues?

I. We have to expect that any time we hide text with CSS, with JavaScript, what have you, it will have less ranking influence. It’s not that it won’t be counted at all. If I were to search for “hardwood-like material creates beautiful shine,” that exact phrase in Google with quotes, both of these pages would come up, with this one almost certainly first.

So Google knows the text is there. It just isn’t counting it as highly. It’s like content that isn’t carrying the same weight as it would if it were visible by default. So, given that we know that, we have to decide in the tradeoff situation whether it’s worth it to lose the ranking value and the potential visitors in exchange for whatever we’re gaining by having this element.

II. We’ve got to consider some creative alternatives. It is possible to make text visible by default and instead have something like an overlay element, a brief overlay that’s easily close-able with a message. That could give us the same types of engagement statistics, because 95% of people are going to close it before they scroll down, or they’ll see a popover message, those kinds of things. Granted, as we’ve discussed previously on Whiteboard Friday, overlays have their own issues that we need to be aware of, but they are possible. We can also measure scroll depth with some JavaScript tracking. There’s lots of software that does that by default and plenty of open-source GitHub repositories we could use to track it. So there might be other ways to reach the same goals.
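
As a rough sketch of that scroll-depth idea, assuming a generic trackEvent() analytics helper (hypothetical), the text stays fully visible while engagement is still measured:

```javascript
// Rough sketch of scroll-depth tracking as an alternative to hiding text.
// trackEvent() is a placeholder for whatever analytics call you actually use.
var milestones = [25, 50, 75, 100];
var reached = {};

window.addEventListener('scroll', function () {
  var scrollable = document.documentElement.scrollHeight - window.innerHeight;
  if (scrollable <= 0) { return; }
  var percent = Math.round((window.scrollY / scrollable) * 100);

  milestones.forEach(function (milestone) {
    if (percent >= milestone && !reached[milestone]) {
      reached[milestone] = true;
      trackEvent('scroll-depth', milestone + '%'); // hypothetical analytics helper
    }
  });
});
```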

III. If it is the case that you have to use the “read more” or any other text hiding elements, I would urge you to go ahead and place the crucial information, including the keyword phrases and the most related terms and phrases that you know are going to be very important to rankings, up in that most visible top portion of the page so that you maximize the ranking weight of the most important pieces rather than losing those below or behind whatever sorts of post-loading situation you’ve got. Make those the default visible portions of text.

I do want to give special thanks, because one of the reasons we know this is a set of independent tests. Google has certainly mentioned it on occasion, but over the course of the last few years there’s been a lot of skepticism, especially from folks in the web design community, who have said, “Look, it seems like Google can see this. It doesn’t seem to be a problem. When I search in quotes for this text, Google brings it back.” That has been correct.

But thanks go to Shai Aharony (apologies if I mispronounce your name) from Reboot Online, RebootOnline.com. I’ll link to the specific test they performed: wonderful, large-scale, long-term tests of CSS, text-area, visible-text, and JavaScript hiding across many domains over a long period, which basically proved that what Google says is in fact true, that text hidden this way is treated with less weight. So we really appreciate the efforts of folks like that, who go to intense lengths to give us the truth about how Google works.

That said, we will hopefully see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in IM NewsComments Off

Evidence of the Surprising State of JavaScript Indexing

Posted by willcritchlow

Back when I started in this industry, it was standard advice to tell our clients that the search engines couldn’t execute JavaScript (JS), and anything that relied on JS would be effectively invisible and never appear in the index. Over the years, that has changed gradually, from early work-arounds (such as the horrible escaped fragment approach my colleague Rob wrote about back in 2010) to the actual execution of JS in the indexing pipeline that we see today, at least at Google.

In this article, I want to explore some things we’ve seen about JS indexing behavior in the wild and in controlled tests and share some tentative conclusions I’ve drawn about how it must be working.

A brief introduction to JS indexing

At its most basic, the idea behind JavaScript-enabled indexing is to get closer to the search engine seeing the page as the user sees it. Most users browse with JavaScript enabled, and many sites either fail without it or are severely limited. While traditional indexing considers just the raw HTML source received from the server, users typically see a page rendered based on the DOM (Document Object Model), which can be modified by JavaScript running in their web browser. JS-enabled indexing considers all content in the rendered DOM, not just the content that appears in the raw HTML.
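
As a toy illustration of that distinction, the raw HTML might ship an empty container that only gets its text once JavaScript runs (the element ID and copy here are hypothetical):

```javascript
// Raw HTML from the server:  <div id="product-description"></div>
// Traditional indexing sees only that empty div. Once this script runs,
// the rendered DOM contains the text below, and JS-enabled indexing can see it.
document.getElementById('product-description').textContent =
  'Hand-finished coconut marble furnishings with a hardwood-like shine.';
```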

There are some complexities even in this basic definition (answers in brackets as I understand them; a small illustrative sketch follows the list):

  • What about JavaScript that requests additional content from the server? (This will generally be included, subject to timeout limits)
  • What about JavaScript that executes some time after the page loads? (This will generally only be indexed up to some time limit, possibly in the region of 5 seconds)
  • What about JavaScript that executes on some user interaction such as scrolling or clicking? (This will generally not be included)
  • What about JavaScript in external files rather than in-line? (This will generally be included, as long as those external files are not blocked from the robot — though see the caveat in experiments below)
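
As a small illustrative page script (endpoints and element IDs are hypothetical), the cases above might look like this; the same script could equally live in an external file, which is fine as long as that file isn’t blocked from the robot:

```javascript
// Hypothetical page script illustrating the cases in the list above.

// 1. Content fetched from the server after load: generally included, subject to timeouts.
fetch('/api/related-products')                        // hypothetical endpoint
  .then(function (response) { return response.json(); })
  .then(function (items) {
    document.getElementById('related').textContent = items.join(', ');
  });

// 2. Content added after a delay: only indexed up to some time limit (~5 seconds).
setTimeout(function () {
  document.getElementById('late').textContent = 'Added five seconds after load';
}, 5000);

// 3. Content added on user interaction: generally not included.
document.getElementById('load-reviews').addEventListener('click', function () {
  document.getElementById('reviews').textContent = 'Reviews loaded on click';
});
```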

For more on the technical details, I recommend my ex-colleague Justin’s writing on the subject.

A high-level overview of my view of JavaScript best practices

Despite the incredible work-arounds of the past (which always seemed like more effort than graceful degradation to me), the “right” answer has existed since at least 2012, with the introduction of pushState. Rob wrote about this one, too. Back then, however, it was pretty clunky and manual: it required a concerted effort to ensure that the URL was updated in the user’s browser for each view that should be considered a “page,” that the server could return full HTML for those pages in response to new requests for each URL, and that the back button was handled correctly by your JavaScript.
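
For context, a minimal sketch of that manual wiring might look like this (loadView() is a hypothetical function that fetches and renders a view; the server still has to return full HTML for each of these URLs):

```javascript
// Minimal sketch of the manual pushState approach described above.
function navigateTo(url) {
  loadView(url);                            // update the page content via JS (hypothetical helper)
  history.pushState({ url: url }, '', url); // update the address bar so each view has its own URL
}

// Handle the back/forward buttons so the previous view is restored.
window.addEventListener('popstate', function (event) {
  if (event.state && event.state.url) {
    loadView(event.state.url);
  }
});
```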

Along the way, in my opinion, too many sites got distracted by a separate prerendering step. This is an approach that does the equivalent of running a headless browser to generate static HTML pages that include any changes made by JavaScript on page load, then serving those snapshots instead of the JS-reliant page in response to requests from bots. It typically treats bots differently, in a way that Google tolerates, as long as the snapshots do represent the user experience. In my opinion, this approach is a poor compromise that’s too susceptible to silent failures and falling out of date. We’ve seen a bunch of sites suffer traffic drops due to serving Googlebot broken experiences that were not immediately detected because no regular users saw the prerendered pages.

These days, if you need or want JS-enhanced functionality, more of the top frameworks have the ability to work the way Rob described in 2012, which is now called isomorphic (roughly meaning “the same”).

Isomorphic JavaScript serves HTML that corresponds to the rendered DOM for each URL, and updates the URL for each “view” that should exist as a separate page as the content is updated via JS. With this implementation, there is actually no need to render the page to index basic content, as it’s served in response to any fresh request.
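
As a sketch of that idea, assuming a React and Express stack (the component and bundle names are hypothetical), the server renders the same component to HTML that the browser will later pick up:

```javascript
// Minimal sketch of the isomorphic idea, assuming React + Express.
const express = require('express');
const React = require('react');
const ReactDOMServer = require('react-dom/server');
const ProductPage = require('./components/ProductPage'); // hypothetical component

const app = express();

app.get('/products/:id', function (req, res) {
  // The same component that runs in the browser is rendered to HTML on the server,
  // so the raw HTML response already matches the rendered DOM.
  const html = ReactDOMServer.renderToString(
    React.createElement(ProductPage, { productId: req.params.id })
  );
  res.send('<!doctype html><div id="root">' + html + '</div>' +
           '<script src="/bundle.js"></script>'); // client-side JS takes over the same markup
});

app.listen(3000);
```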

I was fascinated by this piece of research published recently; you should go and read the whole study. In particular, you should watch the video recommended in the post, in which the speaker, who is an Angular developer and evangelist, emphasizes the need for an isomorphic approach.

Resources for auditing JavaScript

If you work in SEO, you will increasingly find yourself called upon to figure out whether a particular implementation is correct (hopefully on a staging/development server before it’s deployed live, but who are we kidding? You’ll be doing this live, too).

To do that, here are some resources I’ve found useful:

Some surprising/interesting results

There are likely to be timeouts on JavaScript execution

I already linked above to the ScreamingFrog post that mentions experiments they have done to measure the timeout Google uses to determine when to stop executing JavaScript (they found a limit of around 5 seconds).

It may be more complicated than that, however. This segment of a thread is interesting. It’s from a Hacker News user who goes by the username KMag and who claims to have worked at Google on the JS execution part of the indexing pipeline from 2006–2010. It’s in relation to another user speculating that Google would not care about content loaded “async” (i.e. asynchronously — in other words, loaded as part of new HTTP requests that are triggered in the background while assets continue to download):

“Actually, we did care about this content. I’m not at liberty to explain the details, but we did execute setTimeouts up to some time limit.

If they’re smart, they actually make the exact timeout a function of a HMAC of the loaded source, to make it very difficult to experiment around, find the exact limits, and fool the indexing system. Back in 2010, it was still a fixed time limit.”

What that means is that although it was initially a fixed timeout, he’s speculating (or possibly sharing without directly doing so) that timeouts are programmatically determined (presumably based on page importance and JavaScript reliance) and that they may be tied to the exact source code (the reference to “HMAC” is to do with a technical mechanism for spotting if the page has changed).
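
If you wanted to probe that behavior yourself, one rough approach is a test page that injects unique made-up tokens at increasing delays and then, later, checks which of them ever get indexed via exact-match searches:

```javascript
// Hypothetical timeout probe: inject unique nonsense tokens at increasing delays.
var delaysInSeconds = [0, 2, 5, 10, 20];

delaysInSeconds.forEach(function (delay) {
  setTimeout(function () {
    var p = document.createElement('p');
    p.textContent = 'zqxtoken-' + delay + 's-delay'; // made-up token, one per delay
    document.body.appendChild(p);
  }, delay * 1000);
});
```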

It matters how your JS is executed

I referenced this recent study earlier. In it, the author found:

Inline vs. External vs. Bundled JavaScript makes a huge difference for Googlebot

The charts at the end show the extent to which popular JavaScript frameworks perform differently depending on how they’re called, with a range of performance from passing every test to failing almost every test. For example, here’s the chart for Angular:

[Chart: Angular test results from the referenced study]

It’s definitely worth reading the whole thing and reviewing the performance of the different frameworks. There’s more evidence of Google saving computing resources in some areas, as well as surprising results between different frameworks.

CRO tests are getting indexed

When we first started seeing JavaScript-based split-testing platforms designed for testing changes aimed at improving conversion rate (CRO = conversion rate optimization), their inline changes to individual pages were invisible to the search engines. As Google in particular has moved up the JavaScript competency ladder, from executing simple inline JS to more complex JS in external files, we are now seeing some CRO-platform-created changes being indexed. A simplified version of what’s happening is:

  • For users:
    • CRO platforms typically take a visitor to a page, check for the existence of a cookie, and if there isn’t one, randomly assign the visitor to group A or group B
    • Based on either the cookie value or the new assignment, the user is either served the page unchanged, or sees a version that is modified in their browser by JavaScript loaded from the CRO platform’s CDN (content delivery network)
    • A cookie is then set to make sure that the user sees the same version if they revisit that page later
  • For Googlebot:
    • The reliance on external JavaScript used to prevent both the bucketing and the inline changes from being indexed
    • With external JavaScript now being loaded, and with many of these inline changes being made using standard libraries (such as jQuery), Google is able to index the variant, and hence we sometimes see CRO experiments being indexed

I might have expected the platforms to block their JS with robots.txt, but at least the main platforms I’ve looked at don’t do that. With Google being sympathetic towards testing, however, this shouldn’t be a major problem — just something to be aware of as you build out your user-facing CRO tests. All the more reason for your UX and SEO teams to work closely together and communicate well.
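
A heavily simplified sketch of that client-side flow (the cookie name, split, and variant change are all made up; real platforms do far more) might look like this:

```javascript
// Simplified sketch of client-side A/B bucketing as described above.
function getCookie(name) {
  var match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
  return match ? match[1] : null;
}

var bucket = getCookie('cro_bucket');
if (!bucket) {
  bucket = Math.random() < 0.5 ? 'A' : 'B';  // random assignment on first visit
  document.cookie = 'cro_bucket=' + bucket + '; path=/; max-age=' + 60 * 60 * 24 * 30;
}

if (bucket === 'B') {
  // The variant is applied in the browser by JS served from the platform's CDN.
  // If Googlebot executes this, the modified headline can end up in the index.
  document.querySelector('h1').textContent = 'Variant headline for group B';
}
```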

Split tests show SEO improvements from removing a reliance on JS

Although we would like to do a lot more to test the actual real-world impact of relying on JavaScript, we do have some early results. At the end of last week I published a post outlining the uplift we saw from removing a site’s reliance on JS to display content and links on category pages.

[Chart: additional organic sessions measured in the ODN split test]

A simple test that removed the need for JavaScript on 50% of pages showed a >6% uplift in organic traffic — worth thousands of extra sessions a month. While we haven’t proven that JavaScript is always bad, nor understood the exact mechanism at work here, we have opened up a new avenue for exploration, and at least shown that it’s not a settled matter. To my mind, it highlights the importance of testing. It’s obviously our belief in the importance of SEO split-testing that led to us investing so much in the development of the ODN platform over the last 18 months or so.

Conclusion: How JavaScript indexing might work from a systems perspective

Based on all of the information we can piece together from the external behavior of the search results, public comments from Googlers, tests and experiments, and first principles, here’s how I think JavaScript indexing is working at Google at the moment. I think there is a separate queue for JS-enabled rendering, because the computational cost of trying to run JavaScript over the entire web is unnecessary given the lack of a need for it on many, many pages. In detail (a toy sketch follows the list below), I think:

  • Googlebot crawls and caches HTML and core resources regularly
  • Heuristics (and probably machine learning) are used to prioritize JavaScript rendering for each page:
    • Some pages are indexed with no JS execution. There are many pages that can probably be easily identified as not needing rendering, and others which are such a low priority that it isn’t worth the computing resources.
    • Some pages get immediate rendering – or possibly immediate basic/regular indexing, along with high-priority rendering. This would enable the immediate indexation of pages in news results or other QDF results, but also allow pages that rely heavily on JS to get updated indexation when the rendering completes.
    • Many pages are rendered async in a separate process/queue from both crawling and regular indexing, thereby adding the page to the index for new words and phrases found only in the JS-rendered version when rendering completes, in addition to the words and phrases found in the unrendered version indexed initially.
  • The JS rendering also, in addition to adding pages to the index:
    • May make modifications to the link graph
    • May add new URLs to the discovery/crawling queue for Googlebot
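
Purely as an illustration of that model, and emphatically not Google’s actual code, a toy version of the flow might look like this (every function and queue here is a made-up stand-in):

```javascript
// Toy illustration of the two-stage model described above; every function
// and data structure here is a hypothetical stand-in.
var renderQueue = []; // stand-in for a prioritized rendering queue

function processCrawledPage(page) {
  // Regular indexing from the raw, unrendered HTML happens first.
  indexContent(page.url, extractTextFromHtml(page.rawHtml));

  // Heuristics (and probably ML) decide whether rendering is worth the cost.
  if (needsJsRendering(page)) {
    renderQueue.push({ url: page.url, priority: estimatePriority(page) });
  }
}

function processNextRenderJob() {
  renderQueue.sort(function (a, b) { return b.priority - a.priority; });
  var job = renderQueue.shift();
  var renderedDom = renderWithHeadlessBrowser(job.url);

  indexContent(job.url, extractTextFromDom(renderedDom));     // words found only after rendering
  updateLinkGraph(job.url, extractLinksFromDom(renderedDom)); // links only visible post-render
  scheduleForCrawling(extractLinksFromDom(renderedDom));      // newly discovered URLs
}
```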

The idea of JavaScript rendering as a distinct and separate part of the indexing pipeline is backed up by this quote from KMag, who I mentioned previously for his contributions to this HN thread (direct link) [emphasis mine]:

“I was working on the lightweight high-performance JavaScript interpretation system that sandboxed pretty much just a JS engine and a DOM implementation that we could run on every web page on the index. Most of my work was trying to improve the fidelity of the system. My code analyzed every web page in the index.

Towards the end of my time there, there was someone in Mountain View working on a heavier, higher-fidelity system that sandboxed much more of a browser, and they were trying to improve performance so they could use it on a higher percentage of the index.”

This was the situation in 2010. It seems likely that they have moved a long way towards the headless browser in all cases, but I’m skeptical about whether it would be worth their while to render every page they crawl with JavaScript given the expense of doing so and the fact that a large percentage of pages do not change substantially when you do.

My best guess is that they’re using a combination of trying to figure out the need for JavaScript execution on a given page, coupled with trust/authority metrics to decide whether (and with what priority) to render a page with JS.

Run a test, get publicity

I have a hypothesis that I would love to see someone test: That it’s possible to get a page indexed and ranking for a nonsense word contained in the served HTML, but not initially ranking for a different nonsense word added via JavaScript; then, to see the JS get indexed some period of time later and rank for both nonsense words. If you want to run that test, let me know the results — I’d be happy to publicize them.
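
If someone does want to run it, the page itself is trivial; something like the following, with the nonsense tokens made up here:

```javascript
// The served HTML would contain one nonsense word directly, e.g.:
//   <p>flurbwangle</p>
// and this script adds a second, different nonsense word via JavaScript:
document.addEventListener('DOMContentLoaded', function () {
  var p = document.createElement('p');
  p.textContent = 'grompthistle'; // made-up token, only present after JS runs
  document.body.appendChild(p);
});
// Then check periodically whether the page ranks for each token (exact-match
// searches) and note when, if ever, the JS-added one starts appearing.
```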

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in IM NewsComments Off

Case study: JavaScript blocking Google’s view of hreflang

Contributor Sam Gipson troubleshoots issues with a client’s hreflang implementation, testing to see if JavaScript elements might interfere with Google recognizing these tags.

The post Case study: JavaScript blocking Google’s view of hreflang appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM NewsComments Off

