Tag Archive | "Indexing"

The Google Request Indexing Quota Seems To Be Enforced

Last week we reported how Google removed the quota notification within the request indexing feature in the Google Search Console fetch as Google feature. We noted the documentation still had the quota but the interface removed the quota notification…


Search Engine Roundtable

Posted in IM NewsComments Off

Does Googlebot Support HTTP/2? Challenging Google’s Indexing Claims – An Experiment

Posted by goralewicz

I was recently challenged with a question from a client, Robert, who runs a small PR firm and needed to optimize a client’s website. His question inspired me to run a small experiment in HTTP protocols. So what was Robert’s question? He asked…

Can Googlebot crawl using HTTP/2 protocols?

You may be asking yourself, why should I care about Robert and his HTTP protocols?

As a refresher, HTTP protocols are the basic set of standards allowing the World Wide Web to exchange information. They are the reason a web browser can display data stored on another server. The first was initiated back in 1989, which means, just like everything else, HTTP protocols are getting outdated. HTTP/2 is one of the latest versions of HTTP protocol to be created to replace these aging versions.

So, back to our question: why do you, as an SEO, care to know more about HTTP protocols? The short answer is that none of your SEO efforts matter or can even be done without a basic understanding of HTTP protocol. Robert knew that if his site wasn’t indexing correctly, his client would miss out on valuable web traffic from searches.

The hype around HTTP/2

HTTP/1.1 is a 17-year-old protocol (HTTP 1.0 is 21 years old). Both HTTP 1.0 and 1.1 have limitations, mostly related to performance. When HTTP/1.1 was getting too slow and out of date, Google introduced SPDY in 2009, which was the basis for HTTP/2. Side note: Starting from Chrome 53, Google decided to stop supporting SPDY in favor of HTTP/2.

HTTP/2 was a long-awaited protocol. Its main goal is to improve a website’s performance. It’s currently used by 17% of websites (as of September 2017). Adoption rate is growing rapidly, as only 10% of websites were using HTTP/2 in January 2017. You can see the adoption rate charts here. HTTP/2 is getting more and more popular, and is widely supported by modern browsers (like Chrome or Firefox) and web servers (including Apache, Nginx, and IIS).

Its key advantages are:

  • Multiplexing: The ability to send multiple requests through a single TCP connection.
  • Server push: When a client requires some resource (let’s say, an HTML document), a server can push CSS and JS files to a client cache. It reduces network latency and round-trips.
  • One connection per origin: With HTTP/2, only one connection is needed to load the website.
  • Stream prioritization: Requests (streams) are assigned a priority from 1 to 256 to deliver higher-priority resources faster.
  • Binary framing layer: HTTP/2 is easier to parse (for both the server and user).
  • Header compression: This feature reduces overhead from plain text in HTTP/1.1 and improves performance.

For more information, I highly recommend reading “Introduction to HTTP/2” by Surma and Ilya Grigorik.

All these benefits suggest pushing for HTTP/2 support as soon as possible. However, my experience with technical SEO has taught me to double-check and experiment with solutions that might affect our SEO efforts.

So the question is: Does Googlebot support HTTP/2?

Google’s promises

HTTP/2 represents a promised land, the technical SEO oasis everyone was searching for. By now, many websites have already added HTTP/2 support, and developers don’t want to optimize for HTTP/1.1 anymore. Before I could answer Robert’s question, I needed to know whether or not Googlebot supported HTTP/2-only crawling.

I was not alone in my query. This is a topic which comes up often on Twitter, Google Hangouts, and other such forums. And like Robert, I had clients pressing me for answers. The experiment needed to happen. Below I’ll lay out exactly how we arrived at our answer, but here’s the spoiler: it doesn’t. Google doesn’t crawl using the HTTP/2 protocol. If your website uses HTTP/2, you need to make sure you continue to optimize the HTTP/1.1 version for crawling purposes.

The question

It all started with a Google Hangouts in November 2015.

When asked about HTTP/2 support, John Mueller mentioned that HTTP/2-only crawling should be ready by early 2016, and he also mentioned that HTTP/2 would make it easier for Googlebot to crawl pages by bundling requests (images, JS, and CSS could be downloaded with a single bundled request).

“At the moment, Google doesn’t support HTTP/2-only crawling (…) We are working on that, I suspect it will be ready by the end of this year (2015) or early next year (2016) (…) One of the big advantages of HTTP/2 is that you can bundle requests, so if you are looking at a page and it has a bunch of embedded images, CSS, JavaScript files, theoretically you can make one request for all of those files and get everything together. So that would make it a little bit easier to crawl pages while we are rendering them for example.”

Soon after, Twitter user Kai Spriestersbach also asked about HTTP/2 support:

His clients started dropping HTTP/1.1 connections optimization, just like most developers deploying HTTP/2, which was at the time supported by all major browsers.

After a few quiet months, Google Webmasters reignited the conversation, tweeting that Google won’t hold you back if you’re setting up for HTTP/2. At this time, however, we still had no definitive word on HTTP/2-only crawling. Just because it won’t hold you back doesn’t mean it can handle it — which is why I decided to test the hypothesis.

The experiment

For months as I was following this online debate, I still received questions from our clients who no longer wanted want to spend money on HTTP/1.1 optimization. Thus, I decided to create a very simple (and bold) experiment.

I decided to disable HTTP/1.1 on my own website (https://goralewicz.com) and make it HTTP/2 only. I disabled HTTP/1.1 from March 7th until March 13th.

If you’re going to get bad news, at the very least it should come quickly. I didn’t have to wait long to see if my experiment “took.” Very shortly after disabling HTTP/1.1, I couldn’t fetch and render my website in Google Search Console; I was getting an error every time.

My website is fairly small, but I could clearly see that the crawling stats decreased after disabling HTTP/1.1. Google was no longer visiting my site.

While I could have kept going, I stopped the experiment after my website was partially de-indexed due to “Access Denied” errors.

The results

I didn’t need any more information; the proof was right there. Googlebot wasn’t supporting HTTP/2-only crawling. Should you choose to duplicate this at home with our own site, you’ll be happy to know that my site recovered very quickly.

I finally had Robert’s answer, but felt others may benefit from it as well. A few weeks after finishing my experiment, I decided to ask John about HTTP/2 crawling on Twitter and see what he had to say.

(I love that he responds.)

Knowing the results of my experiment, I have to agree with John: disabling HTTP/1 was a bad idea. However, I was seeing other developers discontinuing optimization for HTTP/1, which is why I wanted to test HTTP/2 on its own.

For those looking to run their own experiment, there are two ways of negotiating a HTTP/2 connection:

1. Over HTTP (unsecure) – Make an HTTP/1.1 request that includes an Upgrade header. This seems to be the method to which John Mueller was referring. However, it doesn’t apply to my website (because it’s served via HTTPS). What is more, this is an old-fashioned way of negotiating, not supported by modern browsers. Below is a screenshot from Caniuse.com:

2. Over HTTPS (secure) – Connection is negotiated via the ALPN protocol (HTTP/1.1 is not involved in this process). This method is preferred and widely supported by modern browsers and servers.

A recent announcement: The saga continues

Googlebot doesn’t make HTTP/2 requests

Fortunately, Ilya Grigorik, a web performance engineer at Google, let everyone peek behind the curtains at how Googlebot is crawling websites and the technology behind it:

If that wasn’t enough, Googlebot doesn’t support the WebSocket protocol. That means your server can’t send resources to Googlebot before they are requested. Supporting it wouldn’t reduce network latency and round-trips; it would simply slow everything down. Modern browsers offer many ways of loading content, including WebRTC, WebSockets, loading local content from drive, etc. However, Googlebot supports only HTTP/FTP, with or without Transport Layer Security (TLS).

Googlebot supports SPDY

During my research and after John Mueller’s feedback, I decided to consult an HTTP/2 expert. I contacted Peter Nikolow of Mobilio, and asked him to see if there were anything we could do to find the final answer regarding Googlebot’s HTTP/2 support. Not only did he provide us with help, Peter even created an experiment for us to use. Its results are pretty straightforward: Googlebot does support the SPDY protocol and Next Protocol Navigation (NPN). And thus, it can’t support HTTP/2.

Below is Peter’s response:


I performed an experiment that shows Googlebot uses SPDY protocol. Because it supports SPDY + NPN, it cannot support HTTP/2. There are many cons to continued support of SPDY:

    1. This protocol is vulnerable
    2. Google Chrome no longer supports SPDY in favor of HTTP/2
    3. Servers have been neglecting to support SPDY. Let’s examine the NGINX example: from version 1.95, they no longer support SPDY.
    4. Apache doesn’t support SPDY out of the box. You need to install mod_spdy, which is provided by Google.

To examine Googlebot and the protocols it uses, I took advantage of s_server, a tool that can debug TLS connections. I used Google Search Console Fetch and Render to send Googlebot to my website.

Here’s a screenshot from this tool showing that Googlebot is using Next Protocol Navigation (and therefore SPDY):

I’ll briefly explain how you can perform your own test. The first thing you should know is that you can’t use scripting languages (like PHP or Python) for debugging TLS handshakes. The reason for that is simple: these languages see HTTP-level data only. Instead, you should use special tools for debugging TLS handshakes, such as s_server.

Type in the console:

sudo openssl s_server -key key.pem -cert cert.pem -accept 443 -WWW -tlsextdebug -state -msg
sudo openssl s_server -key key.pem -cert cert.pem -accept 443 -www -tlsextdebug -state -msg

Please note the slight (but significant) difference between the “-WWW” and “-www” options in these commands. You can find more about their purpose in the s_server documentation.

Next, invite Googlebot to visit your site by entering the URL in Google Search Console Fetch and Render or in the Google mobile tester.

As I wrote above, there is no logical reason why Googlebot supports SPDY. This protocol is vulnerable; no modern browser supports it. Additionally, servers (including NGINX) neglect to support it. It’s just a matter of time until Googlebot will be able to crawl using HTTP/2. Just implement HTTP 1.1 + HTTP/2 support on your own server (your users will notice due to faster loading) and wait until Google is able to send requests using HTTP/2.


Summary

In November 2015, John Mueller said he expected Googlebot to crawl websites by sending HTTP/2 requests starting in early 2016. We don’t know why, as of October 2017, that hasn’t happened yet.

What we do know is that Googlebot doesn’t support HTTP/2. It still crawls by sending HTTP/ 1.1 requests. Both this experiment and the “Rendering on Google Search” page confirm it. (If you’d like to know more about the technology behind Googlebot, then you should check out what they recently shared.)

For now, it seems we have to accept the status quo. We recommended that Robert (and you readers as well) enable HTTP/2 on your websites for better performance, but continue optimizing for HTTP/ 1.1. Your visitors will notice and thank you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in IM NewsComments Off

Going Beyond Google: Are Search Engines Ready for JavaScript Crawling & Indexing?

Posted by goralewicz

I recently published the results of my JavaScript SEO experiment where I checked which JavaScript frameworks are properly crawled and indexed by Google. The results were shocking; it turns out Google has a number of problems when crawling and indexing JavaScript-rich websites.

Google managed to index only a few out of multiple JavaScript frameworks tested. And as I proved, indexing content doesn’t always mean crawling JavaScript-generated links.

This got me thinking. If Google is having problems with JavaScript crawling and indexing, how are Google’s smaller competitors dealing with this problem? Is JavaScript going to lead you to full de-indexing in most search engines?

If you decide to deploy a client-rendered website (meaning a browser or Googlebot needs to process the JavaScript before seeing the HTML), you’re not only risking problems with your Google rankings — you may completely kill your chances at ranking in all the other search engines out there.

Google + JavaScript SEO experiment

To see how search engines other than Google deal with JavaScript crawling and indexing, we used our experiment website, http:/jsseo.expert, to check how Googlebot crawls and indexes JavaScript (and JavaScript frameworks’) generated content.

The experiment was quite simple: http://jsseo.expert has subpages with content parsed by different JavaScript frameworks. If you disable JavaScript, the content isn’t visible — i.e. if you go to http://jsseo.expert/angular2/, all the content within the red box is generated by Angular 2. If the content isn’t indexed in Yahoo, for example, we know that Yahoo’s indexer didn’t process the JavaScript.

Here are the results:

As you can see, Google and Ask are the only search engines to properly index JavaScript-generated content. Bing, Yahoo, AOL, DuckDuckGo, and Yandex are completely JavaScript-blind and won’t see your content if it isn’t HTML.

The next step: Can other search engines index JavaScript?

Most SEOs only cover JavaScript crawling and indexing issues when talking about Google. As you can see, the problem is much more complex. When you launch a client-rendered JavaScript-rich website (JavaScript is processed by the browser/crawler to “build” HTML), you can be 100% sure that it’s only going to be indexed and ranked in Google and Ask. Unfortunately, Google and Ask cover only ~64% of the whole search engine market, according to statista.com.

This means that your new, shiny, JavaScript-rich website can cost you ~36% of your website’s visibility on all search engines.

Let’s start with Yahoo, Bing, and AOL, which are responsible for 35% of search queries in the US.

Yahoo, Bing, and AOL

Even though Yahoo and AOL were here long before Google, they’ve obviously fallen behind its powerful algorithm and don’t invest in crawling and indexing as much as Google. One reason is likely the relatively high cost of crawling and indexing the web compared to the popularity of the website.

Google can freely invest millions of dollars in growing their computing power without worrying as much about return on investment, whereas Bing, AOL, and Ask only have a small percentage of the search market.

However, Microsoft-owned Bing isn’t out of the running. Their growth has been quite aggressive over last 8 years:

Unfortunately, we can’t say the same about one of the market pioneers: AOL. Do you remember the days before Google? This video will surely bring back some memories from a simpler time.

If you want to learn more about search engine history, I highly recommend watching Marcus Tandler’s spectacular TEDx talk.

Ask.com

What about Ask.com? How is it possible that Ask, with less than 1% of the market, can invest in crawling and indexing JavaScript? It makes me question if the Ask network is powered by Google’s algorithm and crawlers. It’s even more interesting looking at Ask’s aversion towards Google. There were already some speculations about Ask’s relationship with Google after Google Penguin in 2012, but we can now confirm that Ask’s crawling is using Google’s technology.

DuckDuckGo and Yandex

Both DuckDuckGo and Yandex had no problem indexing all the URLs within http://jsseo.expert, but unfortunately, the only content that was indexed properly was the 100% HTML page (http://jsseo.expert/html/).

Baidu

Despite my best efforts, I didn’t manage to index http://jsseo.expert in Baidu.com. It turns out you need a mainland China phone number to do that. I don’t have any previous experience with Baidu, so any and all help with indexing our experimental website would be appreciated. As soon as I succeed, I will update this article with Baidu.com results.

Going beyond the search engines

What if you don’t really care about search engines other than Google? Even if your target market is heavily dominated by Google, JavaScript crawling and indexing is still in an early stage, as my JavaScript SEO experiment documented.

Additionally, even if crawled and indexed properly, there is proof that JavaScript reliance can affect your rankings. Will Critchlow saw a significant traffic improvement after shifting from JavaScript-driven pages to non-JavaScript reliant.

Is there a JavaScript SEO silver bullet?

There is no search engine that can understand and process JavaScript at the level our modern browsers can. Even so, JavaScript isn’t inherently bad for SEO. JavaScript is awesome, but just like SEO, it requires experience and close attention to best practices.

If you want to enjoy all the perks of JavaScript without worrying about problems like Hulu.com’s JavaScript SEO issues, look into isomorphic JavaScript. It allows you to enjoy dynamic and beautiful websites without worrying about SEO.

If you’ve already developed a client-rendered website and can’t go back to the drawing board, you can always use pre-rendering services or enable server-side rendering. They often aren’t ideal solutions, but can definitely help you solve the JavaScript crawling and indexing problem until you come up with a better solution.

Regardless of the search engine, yet again we come back to testing and experimenting as a core component of technical SEO.

The future of JavaScript SEO

I highly recommend you follow along with how http://jsseo.expert/ is indexed in Google and other search engines. Even if some of the other search engines are a little behind Google, they’ll need to improve how they deal with JavaScript-rich websites to meet the exponentially growing demand for what JavaScript frameworks offer, both to developers and end users.

For now, stick to HTML & CSS on your front-end. :)

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in IM NewsComments Off

Evidence of the Surprising State of JavaScript Indexing

Posted by willcritchlow

Back when I started in this industry, it was standard advice to tell our clients that the search engines couldn’t execute JavaScript (JS), and anything that relied on JS would be effectively invisible and never appear in the index. Over the years, that has changed gradually, from early work-arounds (such as the horrible escaped fragment approach my colleague Rob wrote about back in 2010) to the actual execution of JS in the indexing pipeline that we see today, at least at Google.

In this article, I want to explore some things we’ve seen about JS indexing behavior in the wild and in controlled tests and share some tentative conclusions I’ve drawn about how it must be working.

A brief introduction to JS indexing

At its most basic, the idea behind JavaScript-enabled indexing is to get closer to the search engine seeing the page as the user sees it. Most users browse with JavaScript enabled, and many sites either fail without it or are severely limited. While traditional indexing considers just the raw HTML source received from the server, users typically see a page rendered based on the DOM (Document Object Model) which can be modified by JavaScript running in their web browser. JS-enabled indexing considers all content in the rendered DOM, not just that which appears in the raw HTML.

There are some complexities even in this basic definition (answers in brackets as I understand them):

  • What about JavaScript that requests additional content from the server? (This will generally be included, subject to timeout limits)
  • What about JavaScript that executes some time after the page loads? (This will generally only be indexed up to some time limit, possibly in the region of 5 seconds)
  • What about JavaScript that executes on some user interaction such as scrolling or clicking? (This will generally not be included)
  • What about JavaScript in external files rather than in-line? (This will generally be included, as long as those external files are not blocked from the robot — though see the caveat in experiments below)

For more on the technical details, I recommend my ex-colleague Justin’s writing on the subject.

A high-level overview of my view of JavaScript best practices

Despite the incredible work-arounds of the past (which always seemed like more effort than graceful degradation to me) the “right” answer has existed since at least 2012, with the introduction of PushState. Rob wrote about this one, too. Back then, however, it was pretty clunky and manual and it required a concerted effort to ensure both that the URL was updated in the user’s browser for each view that should be considered a “page,” that the server could return full HTML for those pages in response to new requests for each URL, and that the back button was handled correctly by your JavaScript.

Along the way, in my opinion, too many sites got distracted by a separate prerendering step. This is an approach that does the equivalent of running a headless browser to generate static HTML pages that include any changes made by JavaScript on page load, then serving those snapshots instead of the JS-reliant page in response to requests from bots. It typically treats bots differently, in a way that Google tolerates, as long as the snapshots do represent the user experience. In my opinion, this approach is a poor compromise that’s too susceptible to silent failures and falling out of date. We’ve seen a bunch of sites suffer traffic drops due to serving Googlebot broken experiences that were not immediately detected because no regular users saw the prerendered pages.

These days, if you need or want JS-enhanced functionality, more of the top frameworks have the ability to work the way Rob described in 2012, which is now called isomorphic (roughly meaning “the same”).

Isomorphic JavaScript serves HTML that corresponds to the rendered DOM for each URL, and updates the URL for each “view” that should exist as a separate page as the content is updated via JS. With this implementation, there is actually no need to render the page to index basic content, as it’s served in response to any fresh request.

I was fascinated by this piece of research published recently — you should go and read the whole study. In particular, you should watch this video (recommended in the post) in which the speaker — who is an Angular developer and evangelist — emphasizes the need for an isomorphic approach:

Resources for auditing JavaScript

If you work in SEO, you will increasingly find yourself called upon to figure out whether a particular implementation is correct (hopefully on a staging/development server before it’s deployed live, but who are we kidding? You’ll be doing this live, too).

To do that, here are some resources I’ve found useful:

Some surprising/interesting results

There are likely to be timeouts on JavaScript execution

I already linked above to the ScreamingFrog post that mentions experiments they have done to measure the timeout Google uses to determine when to stop executing JavaScript (they found a limit of around 5 seconds).

It may be more complicated than that, however. This segment of a thread is interesting. It’s from a Hacker News user who goes by the username KMag and who claims to have worked at Google on the JS execution part of the indexing pipeline from 2006–2010. It’s in relation to another user speculating that Google would not care about content loaded “async” (i.e. asynchronously — in other words, loaded as part of new HTTP requests that are triggered in the background while assets continue to download):

“Actually, we did care about this content. I’m not at liberty to explain the details, but we did execute setTimeouts up to some time limit.

If they’re smart, they actually make the exact timeout a function of a HMAC of the loaded source, to make it very difficult to experiment around, find the exact limits, and fool the indexing system. Back in 2010, it was still a fixed time limit.”

What that means is that although it was initially a fixed timeout, he’s speculating (or possibly sharing without directly doing so) that timeouts are programmatically determined (presumably based on page importance and JavaScript reliance) and that they may be tied to the exact source code (the reference to “HMAC” is to do with a technical mechanism for spotting if the page has changed).

It matters how your JS is executed

I referenced this recent study earlier. In it, the author found:

Inline vs. External vs. Bundled JavaScript makes a huge difference for Googlebot

The charts at the end show the extent to which popular JavaScript frameworks perform differently depending on how they’re called, with a range of performance from passing every test to failing almost every test. For example here’s the chart for Angular:

Slide5.PNG

It’s definitely worth reading the whole thing and reviewing the performance of the different frameworks. There’s more evidence of Google saving computing resources in some areas, as well as surprising results between different frameworks.

CRO tests are getting indexed

When we first started seeing JavaScript-based split-testing platforms designed for testing changes aimed at improving conversion rate (CRO = conversion rate optimization), their inline changes to individual pages were invisible to the search engines. As Google in particular has moved up the JavaScript competency ladder through executing simple inline JS to more complex JS in external files, we are now seeing some CRO-platform-created changes being indexed. A simplified version of what’s happening is:

  • For users:
    • CRO platforms typically take a visitor to a page, check for the existence of a cookie, and if there isn’t one, randomly assign the visitor to group A or group B
    • Based on either the cookie value or the new assignment, the user is either served the page unchanged, or sees a version that is modified in their browser by JavaScript loaded from the CRO platform’s CDN (content delivery network)
    • A cookie is then set to make sure that the user sees the same version if they revisit that page later
  • For Googlebot:
    • The reliance on external JavaScript used to prevent both the bucketing and the inline changes from being indexed
    • With external JavaScript now being loaded, and with many of these inline changes being made using standard libraries (such as JQuery), Google is able to index the variant and hence we see CRO experiments sometimes being indexed

I might have expected the platforms to block their JS with robots.txt, but at least the main platforms I’ve looked at don’t do that. With Google being sympathetic towards testing, however, this shouldn’t be a major problem — just something to be aware of as you build out your user-facing CRO tests. All the more reason for your UX and SEO teams to work closely together and communicate well.

Split tests show SEO improvements from removing a reliance on JS

Although we would like to do a lot more to test the actual real-world impact of relying on JavaScript, we do have some early results. At the end of last week I published a post outlining the uplift we saw from removing a site’s reliance on JS to display content and links on category pages.

odn_additional_sessions.png

A simple test that removed the need for JavaScript on 50% of pages showed a >6% uplift in organic traffic — worth thousands of extra sessions a month. While we haven’t proven that JavaScript is always bad, nor understood the exact mechanism at work here, we have opened up a new avenue for exploration, and at least shown that it’s not a settled matter. To my mind, it highlights the importance of testing. It’s obviously our belief in the importance of SEO split-testing that led to us investing so much in the development of the ODN platform over the last 18 months or so.

Conclusion: How JavaScript indexing might work from a systems perspective

Based on all of the information we can piece together from the external behavior of the search results, public comments from Googlers, tests and experiments, and first principles, here’s how I think JavaScript indexing is working at Google at the moment: I think there is a separate queue for JS-enabled rendering, because the computational cost of trying to run JavaScript over the entire web is unnecessary given the lack of a need for it on many, many pages. In detail, I think:

  • Googlebot crawls and caches HTML and core resources regularly
  • Heuristics (and probably machine learning) are used to prioritize JavaScript rendering for each page:
    • Some pages are indexed with no JS execution. There are many pages that can probably be easily identified as not needing rendering, and others which are such a low priority that it isn’t worth the computing resources.
    • Some pages get immediate rendering – or possibly immediate basic/regular indexing, along with high-priority rendering. This would enable the immediate indexation of pages in news results or other QDF results, but also allow pages that rely heavily on JS to get updated indexation when the rendering completes.
    • Many pages are rendered async in a separate process/queue from both crawling and regular indexing, thereby adding the page to the index for new words and phrases found only in the JS-rendered version when rendering completes, in addition to the words and phrases found in the unrendered version indexed initially.
  • The JS rendering also, in addition to adding pages to the index:
    • May make modifications to the link graph
    • May add new URLs to the discovery/crawling queue for Googlebot

The idea of JavaScript rendering as a distinct and separate part of the indexing pipeline is backed up by this quote from KMag, who I mentioned previously for his contributions to this HN thread (direct link) [emphasis mine]:

“I was working on the lightweight high-performance JavaScript interpretation system that sandboxed pretty much just a JS engine and a DOM implementation that we could run on every web page on the index. Most of my work was trying to improve the fidelity of the system. My code analyzed every web page in the index.

Towards the end of my time there, there was someone in Mountain View working on a heavier, higher-fidelity system that sandboxed much more of a browser, and they were trying to improve performance so they could use it on a higher percentage of the index.”

This was the situation in 2010. It seems likely that they have moved a long way towards the headless browser in all cases, but I’m skeptical about whether it would be worth their while to render every page they crawl with JavaScript given the expense of doing so and the fact that a large percentage of pages do not change substantially when you do.

My best guess is that they’re using a combination of trying to figure out the need for JavaScript execution on a given page, coupled with trust/authority metrics to decide whether (and with what priority) to render a page with JS.

Run a test, get publicity

I have a hypothesis that I would love to see someone test: That it’s possible to get a page indexed and ranking for a nonsense word contained in the served HTML, but not initially ranking for a different nonsense word added via JavaScript; then, to see the JS get indexed some period of time later and rank for both nonsense words. If you want to run that test, let me know the results — I’d be happy to publicize them.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in IM NewsComments Off

Google begins mobile-first indexing, using mobile content for all search rankings

While called an ‘experiment,’ it’s actually the first move in Google’s planned shift to looking primarily at mobile content, rather than desktop, when deciding how to rank results.

The post Google begins mobile-first indexing, using mobile content for all search rankings appeared first on Search…



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM NewsComments Off

SearchCap: Local Marketing Automation Tools, App Indexing May Change SEO & Mars Google Doodle

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Local Marketing Automation Tools, App Indexing May Change SEO & Mars Google Doodle appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM NewsComments Off

Expedited Indexing For New Content: Google Recommends PubSubHubbub Still

Google has been recommending for years and years to use RSS feeds and PubSubHubbub to help Google index new content quickly, in addition to using XML Sitemaps.

Truth is, I am a bit surprised they still recommend it but they do…


Search Engine Roundtable

Posted in IM NewsComments Off

SearchCap: Google Maps Apology, Matt Cutts Replaced & App Indexing Data

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Google Maps Apology, Matt Cutts Replaced & App Indexing Data appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM NewsComments Off

Google App Indexing Now Supports Eight New Languages

Google announced that their App Indexing protocol, the ability for Google to link search results to your mobile apps direct deep content now supports eight new languages. App Indexing lets developers markup their content so that if a searcher has an Android app installed on their phone, and the…



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Acesse o AntiProteção clicando aqui: http://www.antiprotecao.com.br/ O AntiProteção é uma ferramenta desenvolvida para auxiliar os usuários a baixarem seus arquivos, de modo que não…
Video Rating: 0 / 5

Posted in IM NewsComments Off

SearchCap: Google Right To Be Forgotten Form, Yahoo Search Bar & App Indexing Languages

Below is what happened in search today, as reported on Search Engine Land and from other places across the web. From Search Engine Land: How Google’s New “Right To Be Forgotten” Form Works: An Explainer Google has taken a big step forward in complying with the European…



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM NewsComments Off

Advert