Tag Archive | "Keyword"

Keyword Not Provided, But it Just Clicks

When SEO Was Easy

When I got started on the web over 15 years ago I created an overly broad & shallow website that had little chance of making money because it was utterly undifferentiated and crappy. In spite of my best (worst?) efforts while being a complete newbie, sometimes I would go to the mailbox and see a check for a couple hundred or a couple thousand dollars come in. My old roommate & I went to Coachella & when the trip was over I returned to a bunch of mail to catch up on & realized I had made way more while not working than what I spent on that trip.

What was the secret to a total newbie making decent income by accident?

Horrible spelling.

Back then search engines were not as sophisticated with their spelling correction features & I was one of 3 or 4 people in the search index that misspelled the name of an online casino the same way many searchers did.

The high minded excuse for why I did not scale that would be claiming I knew it was a temporary trick that was somehow beneath me. The more accurate reason would be thinking in part it was a lucky fluke rather than thinking in systems. If I were clever at the time I would have created the misspeller’s guide to online gambling, though I think I was just so excited to make anything from the web that I perhaps lacked the ambition & foresight to scale things back then.

In the decade that followed I had a number of other lucky breaks like that. One time one of the original internet bubble companies that managed to stay around put up a sitewide footer link targeting the concept that one of my sites made decent money from. This was just before the great recession, before Panda existed. The concept they targeted had 3 or 4 ways to describe it. 2 of them were very profitable & if they targeted either of the most profitable versions with that page the targeting would have sort of carried over to both. They would have outranked me if they targeted the correct version, but they didn’t so their mistargeting was a huge win for me.

Search Gets Complex

Search today is much more complex. In the years since those easy-n-cheesy wins, Google has rolled out many updates which aim to feature sought after destination sites while diminishing the sites which rely one “one simple trick” to rank.

Arguably the quality of the search results has improved significantly as search has become more powerful, more feature rich & has layered in more relevancy signals.

Many quality small web publishers have went away due to some combination of increased competition, algorithmic shifts & uncertainty, and reduced monetization as more ad spend was redirected toward Google & Facebook. But the impact as felt by any given publisher is not the impact as felt by the ecosystem as a whole. Many terrible websites have also went away, while some formerly obscure though higher-quality sites rose to prominence.

There was the Vince update in 2009, which boosted the rankings of many branded websites.

Then in 2011 there was Panda as an extension of Vince, which tanked the rankings of many sites that published hundreds of thousands or millions of thin content pages while boosting the rankings of trusted branded destinations.

Then there was Penguin, which was a penalty that hit many websites which had heavily manipulated or otherwise aggressive appearing link profiles. Google felt there was a lot of noise in the link graph, which was their justification for the Penguin.

There were updates which lowered the rankings of many exact match domains. And then increased ad load in the search results along with the other above ranking shifts further lowered the ability to rank keyword-driven domain names. If your domain is generically descriptive then there is a limit to how differentiated & memorable you can make it if you are targeting the core market the keywords are aligned with.

There is a reason eBay is more popular than auction.com, Google is more popular than search.com, Yahoo is more popular than portal.com & Amazon is more popular than a store.com or a shop.com. When that winner take most impact of many online markets is coupled with the move away from using classic relevancy signals the economics shift to where is makes a lot more sense to carry the heavy overhead of establishing a strong brand.

Branded and navigational search queries could be used in the relevancy algorithm stack to confirm the quality of a site & verify (or dispute) the veracity of other signals.

Historically relevant algo shortcuts become less appealing as they become less relevant to the current ecosystem & even less aligned with the future trends of the market. Add in negative incentives for pushing on a string (penalties on top of wasting the capital outlay) and a more holistic approach certainly makes sense.

Modeling Web Users & Modeling Language

PageRank was an attempt to model the random surfer.

When Google is pervasively monitoring most users across the web they can shift to directly measuring their behaviors instead of using indirect signals.

Years ago Bill Slawski wrote about the long click in which he opened by quoting Steven Levy’s In the Plex: How Google Thinks, Works, and Shapes our Lives

“On the most basic level, Google could see how satisfied users were. To paraphrase Tolstoy, happy users were all the same. The best sign of their happiness was the “Long Click” — This occurred when someone went to a search result, ideally the top one, and did not return. That meant Google has successfully fulfilled the query.”

Of course, there’s a patent for that. In Modifying search result ranking based on implicit user feedback they state:

user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance, so that if they select a particular search result, it is likely to be relevant, or at least more relevant than the presented alternatives.

If you are a known brand you are more likely to get clicked on than a random unknown entity in the same market.

And if you are something people are specifically seeking out, they are likely to stay on your website for an extended period of time.

One aspect of the subject matter described in this specification can be embodied in a computer-implemented method that includes determining a measure of relevance for a document result within a context of a search query for which the document result is returned, the determining being based on a first number in relation to a second number, the first number corresponding to longer views of the document result, and the second number corresponding to at least shorter views of the document result; and outputting the measure of relevance to a ranking engine for ranking of search results, including the document result, for a new search corresponding to the search query. The first number can include a number of the longer views of the document result, the second number can include a total number of views of the document result, and the determining can include dividing the number of longer views by the total number of views.

Attempts to manipulate such data may not work.

safeguards against spammers (users who generate fraudulent clicks in an attempt to boost certain search results) can be taken to help ensure that the user selection data is meaningful, even when very little data is available for a given (rare) query. These safeguards can include employing a user model that describes how a user should behave over time, and if a user doesn’t conform to this model, their click data can be disregarded. The safeguards can be designed to accomplish two main objectives: (1) ensure democracy in the votes (e.g., one single vote per cookie and/or IP for a given query-URL pair), and (2) entirely remove the information coming from cookies or IP addresses that do not look natural in their browsing behavior (e.g., abnormal distribution of click positions, click durations, clicks_per_minute/hour/day, etc.). Suspicious clicks can be removed, and the click signals for queries that appear to be spmed need not be used (e.g., queries for which the clicks feature a distribution of user agents, cookie ages, etc. that do not look normal).

And just like Google can make a matrix of documents & queries, they could also choose to put more weight on search accounts associated with topical expert users based on their historical click patterns.

Moreover, the weighting can be adjusted based on the determined type of the user both in terms of how click duration is translated into good clicks versus not-so-good clicks, and in terms of how much weight to give to the good clicks from a particular user group versus another user group. Some user’s implicit feedback may be more valuable than other users due to the details of a user’s review process. For example, a user that almost always clicks on the highest ranked result can have his good clicks assigned lower weights than a user who more often clicks results lower in the ranking first (since the second user is likely more discriminating in his assessment of what constitutes a good result). In addition, a user can be classified based on his or her query stream. Users that issue many queries on (or related to) a given topic T (e.g., queries related to law) can be presumed to have a high degree of expertise with respect to the given topic T, and their click data can be weighted accordingly for other queries by them on (or related to) the given topic T.

Google was using click data to drive their search rankings as far back as 2009. David Naylor was perhaps the first person who publicly spotted this. Google was ranking Australian websites for [tennis court hire] in the UK & Ireland, in part because that is where most of the click signal came from. That phrase was most widely searched for in Australia. In the years since Google has done a better job of geographically isolating clicks to prevent things like the problem David Naylor noticed, where almost all search results in one geographic region came from a different country.

Whenever SEOs mention using click data to search engineers, the search engineers quickly respond about how they might consider any signal but clicks would be a noisy signal. But if a signal has noise an engineer would work around the noise by finding ways to filter the noise out or combine multiple signals. To this day Google states they are still working to filter noise from the link graph: “We continued to protect the value of authoritative and relevant links as an important ranking signal for Search.”

The site with millions of inbound links, few intentional visits & those who do visit quickly click the back button (due to a heavy ad load, poor user experience, low quality content, shallow content, outdated content, or some other bait-n-switch approach)…that’s an outlier. Preventing those sorts of sites from ranking well would be another way of protecting the value of authoritative & relevant links.

Best Practices Vary Across Time & By Market + Category

Along the way, concurrent with the above sorts of updates, Google also improved their spelling auto-correct features, auto-completed search queries for many years through a featured called Google Instant (though they later undid forced query auto-completion while retaining automated search suggestions), and then they rolled out a few other algorithms that further allowed them to model language & user behavior.

Today it would be much harder to get paid above median wages explicitly for sucking at basic spelling or scaling some other individual shortcut to the moon, like pouring millions of low quality articles into a (formerly!) trusted domain.

Nearly a decade after Panda, eHow’s rankings still haven’t recovered.

Back when I got started with SEO the phrase Indian SEO company was associated with cut-rate work where people were buying exclusively based on price. Sort of like a “I got a $ 500 budget for link building, but can not under any circumstance invest more than $ 5 in any individual link.” Part of how my wife met me was she hired a hack SEO from San Diego who outsourced all the work to India and marked the price up about 100-fold while claiming it was all done in the United States. He created reciprocal links pages that got her site penalized & it didn’t rank until after she took her reciprocal links page down.

With that sort of behavior widespread (hack US firm teaching people working in an emerging market poor practices), it likely meant many SEO “best practices” which were learned in an emerging market (particularly where the web was also underdeveloped) would be more inclined to being spammy. Considering how far ahead many Western markets were on the early Internet & how India has so many languages & how most web usage in India is based on mobile devices where it is hard for users to create links, it only makes sense that Google would want to place more weight on end user data in such a market.

If you set your computer location to India Bing’s search box lists 9 different languages to choose from.

The above is not to state anything derogatory about any emerging market, but rather that various signals are stronger in some markets than others. And competition is stronger in some markets than others.

Search engines can only rank what exists.

“In a lot of Eastern European – but not just Eastern European markets – I think it is an issue for the majority of the [bream? muffled] countries, for the Arabic-speaking world, there just isn’t enough content as compared to the percentage of the Internet population that those regions represent. I don’t have up to date data, I know that a couple years ago we looked at Arabic for example and then the disparity was enormous. so if I’m not mistaken the Arabic speaking population of the world is maybe 5 to 6%, maybe more, correct me if I am wrong. But very definitely the amount of Arabic content in our index is several orders below that. So that means we do not have enough Arabic content to give to our Arabic users even if we wanted to. And you can exploit that amazingly easily and if you create a bit of content in Arabic, whatever it looks like we’re gonna go you know we don’t have anything else to serve this and it ends up being horrible. and people will say you know this works. I keyword stuffed the hell out of this page, bought some links, and there it is number one. There is nothing else to show, so yeah you’re number one. the moment somebody actually goes out and creates high quality content that’s there for the long haul, you’ll be out and that there will be one.” – Andrey Lipattsev – Search Quality Senior Strategist at Google Ireland, on Mar 23, 2016

Impacting the Economics of Publishing

Now search engines can certainly influence the economics of various types of media. At one point some otherwise credible media outlets were pitching the Demand Media IPO narrative that Demand Media was the publisher of the future & what other media outlets will look like. Years later, after heavily squeezing on the partner network & promoting programmatic advertising that reduces CPMs by the day Google is funding partnerships with multiple news publishers like McClatchy & Gatehouse to try to revive the news dead zones even Facebook is struggling with.

“Facebook Inc. has been looking to boost its local-news offerings since a 2017 survey showed most of its users were clamoring for more. It has run into a problem: There simply isn’t enough local news in vast swaths of the country. … more than one in five newspapers have closed in the past decade and a half, leaving half the counties in the nation with just one newspaper, and 200 counties with no newspaper at all.”

As mainstream newspapers continue laying off journalists, Facebook’s news efforts are likely to continue failing unless they include direct economic incentives, as Google’s programmatic ad push broke the banner ad:

“Thanks to the convoluted machinery of Internet advertising, the advertising world went from being about content publishers and advertising context—The Times unilaterally declaring, via its ‘rate card’, that ads in the Times Style section cost $ 30 per thousand impressions—to the users themselves and the data that targets them—Zappo’s saying it wants to show this specific shoe ad to this specific user (or type of user), regardless of publisher context. Flipping the script from a historically publisher-controlled mediascape to an advertiser (and advertiser intermediary) controlled one was really Google’s doing. Facebook merely rode the now-cresting wave, borrowing outside media’s content via its own users’ sharing, while undermining media’s ability to monetize via Facebook’s own user-data-centric advertising machinery. Conventional media lost both distribution and monetization at once, a mortal blow.”

Google is offering news publishers audience development & business development tools.

Heavy Investment in Emerging Markets Quickly Evolves the Markets

As the web grows rapidly in India, they’ll have a thousand flowers bloom. In 5 years the competition in India & other emerging markets will be much tougher as those markets continue to grow rapidly. Media is much cheaper to produce in India than it is in the United States. Labor costs are lower & they never had the economic albatross that is the ACA adversely impact their economy. At some point the level of investment & increased competition will mean early techniques stop having as much efficacy. Chinese companies are aggressively investing in India.

“If you break India into a pyramid, the top 100 million (urban) consumers who think and behave more like Americans are well-served,” says Amit Jangir, who leads India investments at 01VC, a Chinese venture capital firm based in Shanghai. The early stage venture firm has invested in micro-lending firms FlashCash and SmartCoin based in India. The new target is the next 200 million to 600 million consumers, who do not have a go-to entertainment, payment or ecommerce platform yet— and there is gonna be a unicorn in each of these verticals, says Jangir, adding that it will be not be as easy for a player to win this market considering the diversity and low ticket sizes.


RankBrain appears to be based on using user clickpaths on head keywords to help bleed rankings across into related searches which are searched less frequently. A Googler didn’t state this specifically, but it is how they would be able to use models of searcher behavior to refine search results for keywords which are rarely searched for.

In a recent interview in Scientific American a Google engineer stated: “By design, search engines have learned to associate short queries with the targets of those searches by tracking pages that are visited as a result of the query, making the results returned both faster and more accurate than they otherwise would have been.”

Now a person might go out and try to search for something a bunch of times or pay other people to search for a topic and click a specific listing, but some of the related Google patents on using click data (which keep getting updated) mentioned how they can discount or turn off the signal if there is an unnatural spike of traffic on a specific keyword, or if there is an unnatural spike of traffic heading to a particular website or web page.

And, since Google is tracking the behavior of end users on their own website, anomalous behavior is easier to track than it is tracking something across the broader web where signals are more indirect. Google can take advantage of their wide distribution of Chrome & Android where users are regularly logged into Google & pervasively tracked to place more weight on users where they had credit card data, a long account history with regular normal search behavior, heavy Gmail users, etc.

Plus there is a huge gap between the cost of traffic & the ability to monetize it. You might have to pay someone a dime or a quarter to search for something & there is no guarantee it will work on a sustainable basis even if you paid hundreds or thousands of people to do it. Any of those experimental searchers will have no lasting value unless they influence rank, but even if they do influence rankings it might only last temporarily. If you bought a bunch of traffic into something genuine Google searchers didn’t like then even if it started to rank better temporarily the rankings would quickly fall back if the real end user searchers disliked the site relative to other sites which already rank.

This is part of the reason why so many SEO blogs mention brand, brand, brand. If people are specifically looking for you in volume & Google can see that thousands or millions of people specifically want to access your site then that can impact how you rank elsewhere.

Even looking at something inside the search results for a while (dwell time) or quickly skipping over it to have a deeper scroll depth can be a ranking signal. Some Google patents mention how they can use mouse pointer location on desktop or scroll data from the viewport on mobile devices as a quality signal.

Neural Matching

Last year Danny Sullivan mentioned how Google rolled out neural matching to better understand the intent behind a search query.

The above Tweets capture what the neural matching technology intends to do. Google also stated:

we’ve now reached the point where neural networks can help us take a major leap forward from understanding words to understanding concepts. Neural embeddings, an approach developed in the field of neural networks, allow us to transform words to fuzzier representations of the underlying concepts, and then match the concepts in the query with the concepts in the document. We call this technique neural matching.

To help people understand the difference between neural matching & RankBrain, Google told SEL: “RankBrain helps Google better relate pages to concepts. Neural matching helps Google better relate words to searches.”

There are a couple research papers on neural matching.

The first one was titled A Deep Relevance Matching Model for Ad-hoc Retrieval. It mentioned using Word2vec & here are a few quotes from the research paper

  • “Successful relevance matching requires proper handling of the exact matching signals, query term importance, and diverse matching requirements.”
  • “the interaction-focused model, which first builds local level interactions (i.e., local matching signals) between two pieces of text, and then uses deep neural networks to learn hierarchical interaction patterns for matching.”
  • “according to the diverse matching requirement, relevance matching is not position related since it could happen in any position in a long document.”
  • “Most NLP tasks concern semantic matching, i.e., identifying the semantic meaning and infer”ring the semantic relations between two pieces of text, while the ad-hoc retrieval task is mainly about relevance matching, i.e., identifying whether a document is relevant to a given query.”
  • “Since the ad-hoc retrieval task is fundamentally a ranking problem, we employ a pairwise ranking loss such as hinge loss to train our deep relevance matching model.”

The paper mentions how semantic matching falls down when compared against relevancy matching because:

  • semantic matching relies on similarity matching signals (some words or phrases with the same meaning might be semantically distant), compositional meanings (matching sentences more than meaning) & a global matching requirement (comparing things in their entirety instead of looking at the best matching part of a longer document); whereas,
  • relevance matching can put significant weight on exact matching signals (weighting an exact match higher than a near match), adjust weighting on query term importance (one word might or phrase in a search query might have a far higher discrimination value & might deserve far more weight than the next) & leverage diverse matching requirements (allowing relevancy matching to happen in any part of a longer document)

Here are a couple images from the above research paper

And then the second research paper is

Deep Relevancy Ranking Using Enhanced Dcoument-Query Interactions
“interaction-based models are less efficient, since one cannot index a document representation independently of the query. This is less important, though, when relevancy ranking methods rerank the top documents returned by a conventional IR engine, which is the scenario we consider here.”

That same sort of re-ranking concept is being better understood across the industry. There are ranking signals that earn some base level ranking, and then results get re-ranked based on other factors like how well a result matches the user intent.

Here are a couple images from the above research paper.

For those who hate the idea of reading research papers or patent applications, Martinibuster also wrote about the technology here. About the only part of his post I would debate is this one:

“Does this mean publishers should use more synonyms? Adding synonyms has always seemed to me to be a variation of keyword spamming. I have always considered it a naive suggestion. The purpose of Google understanding synonyms is simply to understand the context and meaning of a page. Communicating clearly and consistently is, in my opinion, more important than spamming a page with keywords and synonyms.”

I think one should always consider user experience over other factors, however a person could still use variations throughout the copy & pick up a bit more traffic without coming across as spammy. Danny Sullivan mentioned the super synonym concept was impacting 30% of search queries, so there are still a lot which may only be available to those who use a specific phrase on their page.

Martinibuster also wrote another blog post tying more research papers & patents to the above. You could probably spend a month reading all the related patents & research papers.

The above sort of language modeling & end user click feedback compliment links-based ranking signals in a way that makes it much harder to luck one’s way into any form of success by being a terrible speller or just bombing away at link manipulation without much concern toward any other aspect of the user experience or market you operate in.

Pre-penalized Shortcuts

Google was even issued a patent for predicting site quality based upon the N-grams used on the site & comparing those against the N-grams used on other established site where quality has already been scored via other methods: “The phrase model can be used to predict a site quality score for a new site; in particular, this can be done in the absence of other information. The goal is to predict a score that is comparable to the baseline site quality scores of the previously-scored sites.”

Have you considered using a PLR package to generate the shell of your site’s content? Good luck with that as some sites trying that shortcut might be pre-penalized from birth.

Navigating the Maze

When I started in SEO one of my friends had a dad who is vastly smarter than I am. He advised me that Google engineers were smarter, had more capital, had more exposure, had more data, etc etc etc … and thus SEO was ultimately going to be a malinvestment.

Back then he was at least partially wrong because influencing search was so easy.

But in the current market, 16 years later, we are near the infection point where he would finally be right.

At some point the shortcuts stop working & it makes sense to try a different approach.

The flip side of all the above changes is as the algorithms have become more complex they have went from being a headwind to people ignorant about SEO to being a tailwind to those who do not focus excessively on SEO in isolation.

If one is a dominant voice in a particular market, if they break industry news, if they have key exclusives, if they spot & name the industry trends, if their site becomes a must read & is what amounts to a habit … then they perhaps become viewed as an entity. Entity-related signals help them & those signals that are working against the people who might have lucked into a bit of success become a tailwind rather than a headwind.

If your work defines your industry, then any efforts to model entities, user behavior or the language of your industry are going to boost your work on a relative basis.

This requires sites to publish frequently enough to be a habit, or publish highly differentiated content which is strong enough that it is worth the wait.

Those which publish frequently without being particularly differentiated are almost guaranteed to eventually walk into a penalty of some sort. And each additional person who reads marginal, undifferentiated content (particularly if it has an ad-heavy layout) is one additional visitor that site is closer to eventually getting whacked. Success becomes self regulating. Any short-term success becomes self defeating if one has a highly opportunistic short-term focus.

Those who write content that only they could write are more likely to have sustained success.

SEO Book

Posted in IM NewsComments Off

The One-Hour Guide to SEO, Part 2: Keyword Research – Whiteboard Friday

Posted by randfish

Before doing any SEO work, it’s important to get a handle on your keyword research. Aside from helping to inform your strategy and structure your content, you’ll get to know the needs of your searchers, the search demand landscape of the SERPs, and what kind of competition you’re up against.

In the second part of the One-Hour Guide to SEO, the inimitable Rand Fishkin covers what you need to know about the keyword research process, from understanding its goals to building your own keyword universe map. Enjoy!

Click on the whiteboard image above to open a high resolution version in a new tab!

Video Transcription

Howdy, Moz fans. Welcome to another portion of our special edition of Whiteboard Friday, the One-Hour Guide to SEO. This is Part II – Keyword Research. Hopefully you’ve already seen our SEO strategy session from last week. What we want to do in keyword research is talk about why keyword research is required. Why do I have to do this task prior to doing any SEO work?

The answer is fairly simple. If you don’t know which words and phrases people type into Google or YouTube or Amazon or Bing, whatever search engine you’re optimizing for, you’re not going to be able to know how to structure your content. You won’t be able to get into the searcher’s brain, into their head to imagine and empathize with them what they actually want from your content. You probably won’t do correct targeting, which will mean your competitors, who are doing keyword research, are choosing wise search phrases, wise words and terms and phrases that searchers are actually looking for, and you might be unfortunately optimizing for words and phrases that no one is actually looking for or not as many people are looking for or that are much more difficult than what you can actually rank for.

The goals of keyword research

So let’s talk about some of the big-picture goals of keyword research. 

Understand the search demand landscape so you can craft more optimal SEO strategies

First off, we are trying to understand the search demand landscape so we can craft better SEO strategies. Let me just paint a picture for you.

I was helping a startup here in Seattle, Washington, a number of years ago — this was probably a couple of years ago — called Crowd Cow. Crowd Cow is an awesome company. They basically will deliver beef from small ranchers and small farms straight to your doorstep. I personally am a big fan of steak, and I don’t really love the quality of the stuff that I can get from the store. I don’t love the mass-produced sort of industry around beef. I think there are a lot of Americans who feel that way. So working with small ranchers directly, where they’re sending it straight from their farms, is kind of an awesome thing.

But when we looked at the SEO picture for Crowd Cow, for this company, what we saw was that there was more search demand for competitors of theirs, people like Omaha Steaks, which you might have heard of. There was more search demand for them than there was for “buy steak online,” “buy beef online,” and “buy rib eye online.” Even things like just “shop for steak” or “steak online,” these broad keyword phrases, the branded terms of their competition had more search demand than all of the specific keywords, the unbranded generic keywords put together.

That is a very different picture from a world like “soccer jerseys,” where I spent a little bit of keyword research time today looking, and basically the brand names in that field do not have nearly as much search volume as the generic terms for soccer jerseys and custom soccer jerseys and football clubs’ particular jerseys. Those generic terms have much more volume, which is a totally different kind of SEO that you’re doing. One is very, “Oh, we need to build our brand. We need to go out into this marketplace and create demand.” The other one is, “Hey, we need to serve existing demand already.”

So you’ve got to understand your search demand landscape so that you can present to your executive team and your marketing team or your client or whoever it is, hey, this is what the search demand landscape looks like, and here’s what we can actually do for you. Here’s how much demand there is. Here’s what we can serve today versus we need to grow our brand.

Create a list of terms and phrases that match your marketing goals and are achievable in rankings

The next goal of keyword research, we want to create a list of terms and phrases that we can then use to match our marketing goals and achieve rankings. We want to make sure that the rankings that we promise, the keywords that we say we’re going to try and rank for actually have real demand and we can actually optimize for them and potentially rank for them. Or in the case where that’s not true, they’re too difficult or they’re too hard to rank for. Or organic results don’t really show up in those types of searches, and we should go after paid or maps or images or videos or some other type of search result.

Prioritize keyword investments so you do the most important, high-ROI work first

We also want to prioritize those keyword investments so we’re doing the most important work, the highest ROI work in our SEO universe first. There’s no point spending hours and months going after a bunch of keywords that if we had just chosen these other ones, we could have achieved much better results in a shorter period of time.

Match keywords to pages on your site to find the gaps

Finally, we want to take all the keywords that matter to us and match them to the pages on our site. If we don’t have matches, we need to create that content. If we do have matches but they are suboptimal, not doing a great job of answering that searcher’s query, well, we need to do that work as well. If we have a page that matches but we haven’t done our keyword optimization, which we’ll talk a little bit more about in a future video, we’ve got to do that too.

Understand the different varieties of search results

So an important part of understanding how search engines work — we’re going to start down here and then we’ll come back up — is to have this understanding that when you perform a query on a mobile device or a desktop device, Google shows you a vast variety of results. Ten or fifteen years ago this was not the case. We searched 15 years ago for “soccer jerseys,” what did we get? Ten blue links. I think, unfortunately, in the minds of many search marketers and many people who are unfamiliar with SEO, they still think of it that way. How do I rank number one? The answer is, well, there are a lot of things “number one” can mean today, and we need to be careful about what we’re optimizing for.

So if I search for “soccer jersey,” I get these shopping results from Macy’s and soccer.com and all these other places. Google sort has this sliding box of sponsored shopping results. Then they’ve got advertisements below that, notated with this tiny green ad box. Then below that, there are couple of organic results, what we would call classic SEO, 10 blue links-style organic results. There are two of those. Then there’s a box of maps results that show me local soccer stores in my region, which is a totally different kind of optimization, local SEO. So you need to make sure that you understand and that you can convey that understanding to everyone on your team that these different kinds of results mean different types of SEO.

Now I’ve done some work recently over the last few years with a company called Jumpshot. They collect clickstream data from millions of browsers around the world and millions of browsers here in the United States. So they are able to provide some broad overview numbers collectively across the billions of searches that are performed on Google every day in the United States.

Click-through rates differ between mobile and desktop

The click-through rates look something like this. For mobile devices, on average, paid results get 8.7% of all clicks, organic results get about 40%, a little under 40% of all clicks, and zero-click searches, where a searcher performs a query but doesn’t click anything, Google essentially either answers the results in there or the searcher is so unhappy with the potential results that they don’t bother taking anything, that is 62%. So the vast majority of searches on mobile are no-click searches.

On desktop, it’s a very different story. It’s sort of inverted. So paid is 5.6%. I think people are a little savvier about which result they should be clicking on desktop. Organic is 65%, so much, much higher than mobile. Zero-click searches is 34%, so considerably lower.

There are a lot more clicks happening on a desktop device. That being said, right now we think it’s around 60–40, meaning 60% of queries on Google, at least, happen on mobile and 40% happen on desktop, somewhere in those ranges. It might be a little higher or a little lower.

The search demand curve

Another important and critical thing to understand about the keyword research universe and how we do keyword research is that there’s a sort of search demand curve. So for any given universe of keywords, there is essentially a small number, maybe a few to a few dozen keywords that have millions or hundreds of thousands of searches every month. Something like “soccer” or “Seattle Sounders,” those have tens or hundreds of thousands, even millions of searches every month in the United States.

But people searching for “Sounders FC away jersey customizable,” there are very, very few searches per month, but there are millions, even billions of keywords like this. 

The long-tail: millions of keyword terms and phrases, low number of monthly searches

When Sundar Pichai, Google’s current CEO, was testifying before Congress just a few months ago, he told Congress that around 20% of all searches that Google receives each day they have never seen before. No one has ever performed them in the history of the search engines. I think maybe that number is closer to 18%. But that is just a remarkable sum, and it tells you about what we call the long tail of search demand, essentially tons and tons of keywords, millions or billions of keywords that are only searched for 1 time per month, 5 times per month, 10 times per month.

The chunky middle: thousands or tens of thousands of keywords with ~50–100 searches per month

If you want to get into this next layer, what we call the chunky middle in the SEO world, this is where there are thousands or tens of thousands of keywords potentially in your universe, but they only have between say 50 and a few hundred searches per month.

The fat head: a very few keywords with hundreds of thousands or millions of searches

Then this fat head has only a few keywords. There’s only one keyword like “soccer” or “soccer jersey,” which is actually probably more like the chunky middle, but it has hundreds of thousands or millions of searches. The fat head is higher competition and broader intent.

Searcher intent and keyword competition

What do I mean by broader intent? That means when someone performs a search for “soccer,” you don’t know what they’re looking for. The likelihood that they want a customizable soccer jersey right that moment is very, very small. They’re probably looking for something much broader, and it’s hard to know exactly their intent.

However, as you drift down into the chunky middle and into the long tail, where there are more keywords but fewer searches for each keyword, your competition gets much lower. There are fewer people trying to compete and rank for those, because they don’t know to optimize for them, and there’s more specific intent. “Customizable Sounders FC away jersey” is very clear. I know exactly what I want. I want to order a customizable jersey from the Seattle Sounders away, the particular colors that the away jersey has, and I want to be able to put my logo on there or my name on the back of it, what have you. So super specific intent.

Build a map of your own keyword universe

As a result, you need to figure out what the map of your universe looks like so that you can present that, and you need to be able to build a list that looks something like this. You should at the end of the keyword research process — we featured a screenshot from Moz’s Keyword Explorer, which is a tool that I really like to use and I find super helpful whenever I’m helping companies, even now that I have left Moz and been gone for a year, I still sort of use Keyword Explorer because the volume data is so good and it puts all the stuff together. However, there are two or three other tools that a lot of people like, one from Ahrefs, which I think also has the name Keyword Explorer, and one from SEMrush, which I like although some of the volume numbers, at least in the United States, are not as good as what I might hope for. There are a number of other tools that you could check out as well. A lot of people like Google Trends, which is totally free and interesting for some of that broad volume data.

So I might have terms like “soccer jersey,” “Sounders FC jersey”, and “custom soccer jersey Seattle Sounders.” Then I’ll have these columns: 

  • Volume, because I want to know how many people search for it; 
  • Difficulty, how hard will it be to rank. If it’s super difficult to rank and I have a brand-new website and I don’t have a lot of authority, well, maybe I should target some of these other ones first that are lower difficulty. 
  • Organic Click-through Rate, just like we talked about back here, there are different levels of click-through rate, and the tools, at least Moz’s Keyword Explorer tool uses Jumpshot data on a per keyword basis to estimate what percent of people are going to click the organic results. Should you optimize for it? Well, if the click-through rate is only 60%, pretend that instead of 100 searches, this only has 60 or 60 available searches for your organic clicks. Ninety-five percent, though, great, awesome. All four of those monthly searches are available to you.
  • Business Value, how useful is this to your business? 
  • Then set some type of priority to determine. So I might look at this list and say, “Hey, for my new soccer jersey website, this is the most important keyword. I want to go after “custom soccer jersey” for each team in the U.S., and then I’ll go after team jersey, and then I’ll go after “customizable away jerseys.” Then maybe I’ll go after “soccer jerseys,” because it’s just so competitive and so difficult to rank for. There’s a lot of volume, but the search intent is not as great. The business value to me is not as good, all those kinds of things.
  • Last, but not least, I want to know the types of searches that appear — organic, paid. Do images show up? Does shopping show up? Does video show up? Do maps results show up? If those other types of search results, like we talked about here, show up in there, I can do SEO to appear in those places too. That could yield, in certain keyword universes, a strategy that is very image centric or very video centric, which means I’ve got to do a lot of work on YouTube, or very map centric, which means I’ve got to do a lot of local SEO, or other kinds like this.

Once you build a keyword research list like this, you can begin the prioritization process and the true work of creating pages, mapping the pages you already have to the keywords that you’ve got, and optimizing in order to rank. We’ll talk about that in Part III next week. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Moz Blog

Posted in IM NewsComments Off

The Basics of Building an Intent-Based Keyword List

Posted by TheMozTeam

This post was originally published on the STAT blog.

In this article, we’re taking a deep dive into search intent.

It’s a topic we’ve covered before with some depth. This STAT whitepaper looked at how SERP features respond to intent, and a few bonus blog posts broke things down even further and examined how individual intent modifiers impact SERP features, the kind of content that Google serves at each stage of intent, and how you can set up your very own search intent projects. (And look out for Seer’s very own Scott Taft’s upcoming post this week on how to use STAT and Power BI to create your very own search intent dashboard.)

Search intent is the new demographics, so it only made sense to get up close and personal with it. Of course, in order to bag all those juicy search intent tidbits, we needed a great intent-based keyword list. Here’s how you can get your hands on one of those.

Gather your core keywords

First, before you can even think about intent, you need to have a solid foundation of core keywords in place. These are the products, features, and/or services that you’ll build your search intent funnel around.

But goodness knows that keyword list-building is more of an art than a science, and even the greatest writers (hi, Homer) needed to invoke the muses (hey, Calliope) for inspiration, so if staring at your website isn’t getting the creative juices flowing, you can look to a few different places for help.

Snag some good suggestions from keyword research tools

Lots of folks like to use the Google Keyword Planner to help them get started. Ubersuggest and Yoast’s Google Suggest Expander will also help add keywords to your arsenal. And Answer The Public gives you all of that, and beautifully visualized to boot.

Simply plunk in a keyword and watch the suggestions pour in. Just remember to be critical of these auto-generated lists, as odd choices sometimes slip into the mix. For example, apparently we should add [free phones] to our list of [rank tracking] keywords. Huh.

Spot inspiration on the SERPs

Two straight-from-the-SERP resources that we love for keyword research are the “People also ask” box and related searches. These queries are Google-vetted and plentiful, and also give you some insight into how the search engine giant links topics.

If you’re a STAT client, you can generate reports that will give you every question in a PAA box (before it gets infinite), as well as each of the eight related searches at the bottom of a SERP. Run the reports for a couple of days and you’ll get a quick sense of which questions and queries Google favours for your existing keyword set.

A quick note about language & location

When you’re in the UK, you push a pram, not a stroller; you don’t wear a sweater, you wear a jumper. This is all to say that if you’re in the business of global tracking, it’s important to keep different countries’ word choices in mind. Even if you’re not creating content with them, it’s good to see if you’re appearing for the terms your global searchers are using.

Add your intent modifiers

Now it’s time to tackle the intent bit of your keyword list. And this bit is going to require drawing some lines in the sand because the modifiers that occupy each intent category can be highly subjective — does “best” apply transactional intent instead of commercial?

We’ve put together a loose guideline below, but the bottom line is that intent should be structured and classified in a way that makes sense to your business. And if you’re stuck for modifiers to marry to your core keywords, here’s a list of 50+ to help with the coupling.

Informational intent

The searcher has identified a need and is looking for the best solution. These keywords are the core keywords from your earlier hard work, plus every question you think your searchers might have if they’re unfamiliar with your product or services.

Your informational queries might look something like:

  • [product name]
  • what is [product name]
  • how does [product name] work
  • how do I use [product name]

Commercial intent

At this stage, the searcher has zeroed in on a solution and is looking into all the different options available to them. They’re doing comparative research and are interested in specific requirements and features.

For our research, we used best, compare, deals, new, online, refurbished, reviews, shop, top, and used.

Your commercial queries might look something like:

  • best [product name]
  • [product name] reviews
  • compare [product name]
  • what is the top [product name]
  • [colour/style/size] [product name]

Transactional intent (including local and navigational intent)

Transactional queries are the most likely to convert and generally include terms that revolve around price, brand, and location, which is why navigational and local intent are nestled within this stage of the intent funnel.

For our research, we used affordable, buy, cheap, cost, coupon, free shipping, and price.

Your transactional queries might look something like:

  • how much does [product name] cost
  • [product name] in [location]
  • order [product name] online
  • [product name] near me
  • affordable [brand name] [product name]

A tip if you want to speed things up

A super quick way to add modifiers to your keywords and save your typing fingers is by using a keyword mixer like this one. Just don’t forget that using computer programs for human-speak means you’ll have to give them the ol’ once-over to make sure they still make sense.

Audit your list

Now that you’ve reached for the stars and got yourself a huge list of keywords, it’s time to bring things back down to reality and see which ones you’ll actually want to keep around.

No two audits are going to look the same, but here are a few considerations you’ll want to keep in mind when whittling your keywords down to the best of the bunch.

  1. Relevance. Are your keywords represented on your site? Do they point to optimized pages
  2. Search volume. Are you after highly searched terms or looking to build an audience? You can get the SV goods from the Google Keyword Planner.
  3. Opportunity. How many clicks and impressions are your keywords raking in? While not comprehensive (thanks, Not Provided), you can gather some of this info by digging into Google Search Console.
  4. Competition. What other websites are ranking for your keywords? Are you up against SERP monsters like Amazon? What about paid advertising like shopping boxes? How much SERP space are they taking up? Your friendly SERP analytics platform withshare of voice capabilities (hi!) can help you understand your search landscape.
  5. Difficulty. How easy is your keyword going to be to win? Search volume can give you a rough idea — the higher the search volume, the stiffer the competition is likely to be — but for a different approach, Moz’s Keyword Explorer has a Difficulty score that takes Page Authority, Domain Authority, and projected click-through-rate into account.

By now, you should have a pretty solid plan of attack to create an intent-based keyword list of your very own to love, nurture, and cherish.

If, before you jump headlong into it, you’re curious what a good chunk of this is going to looks like in practice, give this excellent article by Russ Jones a read, or drop us a line. We’re always keen to show folks why tracking keywords at scale is the best way to uncover intent-based insights.

Read on, readers!

More in our search intent series:

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Moz Blog

Posted in IM NewsComments Off

SearchCap: Quora keyword targets, Angular SEO & old stock SEO

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

Please visit Search Engine Land for the full article.

Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM NewsComments Off

Evolving Keyword Research to Match Your Buyer’s Journey

Posted by matthew_jkay

Keyword research has been around as long as the SEO industry has. Search engines built a system that revolves around users entering a term or query into a text entry field, hitting return, and receiving a list of relevant results. As the online search market expanded, one clear leader emerged — Google — and with it they brought AdWords (now Google Ads), an advertising platform that allowed organizations to appear on search results pages for keywords that organically they might not.

Within Google Ads came a tool that enabled businesses to look at how many searches there were per month for almost any query. Google Keyword Planner became the de facto tool for keyword research in the industry, and with good reason: it was Google’s data. Not only that, Google gave us the ability to gather further insights due to other metrics Keyword Planner provided: competition and suggested bid. Whilst these keywords were Google Ads-oriented metrics, they gave the SEO industry an indication of how competitive a keyword was.

The reason is obvious. If a keyword or phrase has higher competition (i.e. more advertisers bidding to appear for that term) it’s likely to be more competitive from an organic perspective. Similarly, a term that has a higher suggested bid means it’s more likely to be a competitive term. SEOs dined on this data for years, but when the industry started digging a bit more into the data, we soon realized that while useful, it was not always wholly accurate. Moz, SEMrush, and other tools all started to develop alternative volume and competitive metrics using Clickstream data to give marketers more insights.

Now industry professionals have several software tools and data outlets to conduct their keyword research. These software companies will only improve in the accuracy of their data outputs. Google’s data is unlikely to significantly change; their goal is to sell ad space, not make life easy for SEOs. In fact, they’ve made life harder by using volume ranges for Google Ads accounts with low activity. SEO tools have investors and customers to appease and must continually improve their products to reduce churn and grow their customer base. This makes things rosy for content-led SEO, right?

Well, not really.

The problem with historical keyword research is twofold:

1. SEOs spend too much time thinking about the decision stage of the buyer’s journey (more on that later).

2. SEOs spend too much time thinking about keywords, rather than categories or topics.

The industry, to its credit, is doing a lot to tackle issue number two. “Topics over keywords” is something that is not new as I’ll briefly come to later. Frameworks for topic-based SEO have started to appear over the last few years. This is a step in the right direction. Organizing site content into categories, adding appropriate internal linking, and understanding that one piece of content can rank for several variations of a phrase is becoming far more commonplace.

What is less well known (but starting to gain traction) is point one. But in order to understand this further, we should dive into what the buyer’s journey actually is.

What is the buyer’s journey?

The buyer’s or customer’s journey is not new. If you open marketing text books from years gone by, get a college degree in marketing, or even just go on general marketing blogs you’ll see it crop up. There are lots of variations of this journey, but they all say a similar thing. No matter what product or service is bought, everyone goes through this journey. This could be online or offline — the main difference is that depending on the product, person, or situation, the amount of time this journey takes will vary — but every buyer goes through it. But what is it, exactly? For the purpose of this article, we’ll focus on three stages: awareness, consideration, & decision.


The awareness stage of the buyer’s journey is similar to problem discovery, where a potential customer realizes that they have a problem (or an opportunity) but they may not have figured out exactly what that is yet.

Search terms at this stage are often question-based — users are researching around a particular area.


The consideration stage is where a potential consumer has defined what their problem or opportunity is and has begun to look for potential solutions to help solve the issue they face.


The decision stage is where most organizations focus their attention. Normally consumers are ready to buy at this stage and are often doing product or vendor comparisons, looking at reviews, and searching for pricing information.

To illustrate this process, let’s take two examples: buying an ice cream and buying a holiday.

Being low-value, the former is not a particularly considered purchase, but this journey still takes place. The latter is more considered. It can often take several weeks or months for a consumer to decide on what destination they want to visit, let alone a hotel or excursions. But how does this affect keyword research, and the content which we as marketers should provide?

At each stage, a buyer will have a different thought process. It’s key to note that not every buyer of the same product will have the same thought process but you can see how we can start to formulate a process.

The Buyer’s Journey – Holiday Purchase

The above table illustrates the sort of queries or terms that consumers might use at different stages of their journey. The problem is that most organizations focus all of their efforts on the decision end of the spectrum. This is entirely the right approach to take at the start because you’re targeting consumers who are interested in your product or service then and there. However, in an increasingly competitive online space you should try and find ways to diversify and bring people into your marketing funnel (which in most cases is your website) at different stages.

I agree with the argument that creating content for people earlier in the journey will likely mean lower conversion rates from visitor to customer, but my counter to this would be that you’re also potentially missing out on people who will become customers. Further possibilities to at least get these people into your funnel include offering content downloads (gated content) to capture user’s information, or remarketing activity via Facebook, Google Ads, or other retargeting platforms.

Moving from keywords to topics

I’m not going to bang this drum too loudly. I think many in of the SEO community have signed up to the approach that topics are more important than keywords. There are quite a few resources on this listed online, but what forced it home for me was Cyrus Shepard’s Moz article in 2014. Much, if not all, of that post still holds true today.

What I will cover is an adoption of HubSpot’s Topic Cluster model. For those unaccustomed to their model, HubSpot’s approach formalizes and labels what many search marketers have been doing for a while now. The basic premise is instead of having your site fragmented with lots of content across multiple sections, all hyperlinking to each other, you create one really in-depth content piece that covers a topic area broadly (and covers shorter-tail keywords with high search volume), and then supplement this page with content targeting the long-tail, such as blog posts, FAQs, or opinion pieces. HubSpot calls this “pillar” and “cluster” content respectively.

Source: Matt Barby / HubSpot

The process then involves taking these cluster pages and linking back to the pillar page using keyword-rich anchor text. There’s nothing particularly new about this approach aside from formalizing it a bit more. Instead of having your site’s content structured in such a way that it’s fragmented and interlinking between lots of different pages and topics, you keep the internal linking within its topic, or content cluster. This video explains this methodology further. While we accept this model may not fit every situation, and nor is it completely perfect, it’s a great way of understanding how search engines are now interpreting content.

At Aira, we’ve taken this approach and tried to evolve it a bit further, tying these topics into the stages of the buyer’s journey while utilizing several data points to make sure our outputs are based off as much data as we can get our hands on. Furthermore, because pillar pages tend to target shorter-tail keywords with high search volume, they’re often either awareness- or consideration-stage content, and thus not applicable for decision stage. We term our key decision pages “target pages,” as this should be a primary focus of any activity we conduct.

We’ll also look at the semantic relativity of the keywords reviewed, so that we have a “parent” keyword that we’re targeting a page to rank for, and then children of that keyword or phrase that the page may also rank for, due to its similarity to the parent. Every keyword is categorized according to its stage in the buyer’s journey and whether it’s appropriate for a pillar, target, or cluster page. We also add two further classifications to our keywords: track & monitor and ignore. Definitions for these five keyword types are listed below:

Pillar page

A pillar page covers all aspects of a topic on a single page, with room for more in-depth reporting in more detailed cluster blog posts that hyperlink back to the pillar page. A keyword tagged with pillar page will be the primary topic and the focus of a page on the website. Pillar pages should be awareness- or consideration-stage content.

A great pillar page example I often refer to is HubSpot’s Facebook marketing guide or Mosi-guard’s insect bites guide (disclaimer: probably don’t click through if you don’t like close-up shots of insects!).

Cluster page

A cluster topic page for the pillar focuses on providing more detail for a specific long-tail keyword related to the main topic. This type of page is normally associated with a blog article but could be another type of content, like an FAQ page.

Good examples within the Facebook marketing topic listed above are HubSpot’s posts:

For Mosi-guard, they’re not utilizing internal links within the copy of the other blogs, but the “older posts” section at the bottom of the blog is referencing this guide:

Target page

Normally a keyword or phrase linked to a product or service page, e.g. nike trainers or seo services. Target pages are decision-stage content pieces.

HubSpot’s target content is their social media software page, with one of Mosi-guard’s target pages being their natural spray product.

Track & monitor

A keyword or phrase that is not the main focus of a page, but could still rank due to its similarity to the target page keyword. A good example of this might be seo services as the target page keyword, but this page could also rank for seo agency, seo company, etc.


A keyword or phrase that has been reviewed but is not recommended to be optimized for, possibly due to a lack of search volume, it’s too competitive, it won’t be profitable, etc.

Once the keyword research is complete, we then map our keywords to existing website pages. This gives us a list of mapped keywords and a list of unmapped keywords, which in turn creates a content gap analysis that often leads to a content plan that could last for three, six, or twelve-plus months.

Putting it into practice

I’m a firm believer in giving an example of how this would work in practice, so I’m going to walk through one with screenshots. I’ll also provide a template of our keyword research document for you to take away.

1. Harvesting keywords

The first step in the process is similar, if not identical, to every other keyword research project. You start off with a batch of keywords from the client or other stakeholders that the site wants to rank for. Most of the industry call this a seed keyword list. That keyword list is normally a minimum of 15–20 keywords, but can often be more if you’re dealing with an e-commerce website with multiple product lines.

This list is often based off nothing more than opinion: “What do we think our potential customers will search for?” It’s a good starting point, but you need the rest of the process to follow on to make sure you’re optimizing based off data, not opinion.

2. Expanding the list

Once you’ve got that keyword list, it’s time to start utilizing some of the tools you have at your disposal. There are lots, of course! We tend to use a combination of Moz Keyword Explorer, Answer the Public, Keywords Everywhere, Google Search Console, Google Analytics, Google Ads, ranking tools, and SEMrush.

The idea of this list is to start thinking about keywords that the organization may not have considered before. Your expanded list will include obvious synonyms from your list. Take the example below:

Seed Keywords

Expanded Keywords

ski chalet

ski chalet

ski chalet rental

ski chalet hire

ski chalet [location name]


There are other examples that should be considered. A client I worked with in the past once gave a seed keyword of “biomass boilers.” But after keyword research was conducted, a more colloquial term for “biomass boilers” in the UK is “wood burners.” This is an important distinction and should be picked up as early in the process as possible. Keyword research tools are not infallible, so if budget and resource allows, you may wish to consult current and potential customers about which terms they might use to find the products or services being offered.

3. Filtering out irrelevant keywords

Once you’ve expanded the seed keyword list, it’s time to start filtering out irrelevant keywords. This is pretty labor-intensive and involves sorting through rows of data. We tend to use Moz’s Keyword Explorer, filter by relevancy, and work our way down. As we go, we’ll add keywords to lists within the platform and start to try and sort things by topic. Topics are fairly subjective, and often you’ll get overlap between them. We’ll group similar keywords and phrases together in a topic based off the semantic relativity of those phrases. For example:



ski chalet

ski chalet

ski chalet rental

ski chalet hire

ski chalet [location name]

catered chalet

catered chalet

luxury catered chalet

catered chalet rental

catered chalet hire

catered chalet [location name]

ski accommodation

ski accommodation

cheap ski accommodation

budget ski accommodation

ski accomodation [location name]

Many of the above keywords are decision-based keywords — particularly those with rental or hire in them. They’re showing buying intent. We’ll then try to put ourselves in the mind of the buyer and come up with keywords towards the start of the buyer’s journey.



Buyer’s stage

ski resorts

ski resorts

best ski resorts

ski resorts europe

ski resorts usa

ski resorts canada

top ski resorts

cheap ski resorts

luxury ski resorts




skiing guide

skiing beginner’s guide


family holidays

family holidays

family winter holidays

family trips


This helps us cater to customers that might not be in the frame of mind to purchase just yet — they’re just doing research. It means we cast the net wider. Conversion rates for these keywords are unlikely to be high (at least, for purchases or enquiries) but if utilized as part of a wider marketing strategy, we should look to capture some form of information, primarily an email address, so we can send people relevant information via email or remarketing ads later down the line.

4. Pulling in data

Once you’ve expanded the seed keywords out, Keyword Explorer’s handy list function enables your to break things down into separate topics. You can then export that data into a CSV and start combining it with other data sources. If you have SEMrush API access, Dave Sottimano’s API Library is a great time saver; otherwise, you may want to consider uploading the keywords into the Keywords Everywhere Chrome extension and manually exporting the data and combining everything together. You should then have a spreadsheet that looks something like this:

You could then add in additional data sources. There’s no reason you couldn’t combine the above with volumes and competition metrics from other SEO tools. Consider including existing keyword ranking information or Google Ads data in this process. Keywords that convert well on PPC should do the same organically and should therefore be considered. Wil Reynolds talks about this particular tactic a lot.

5. Aligning phrases to the buyer’s journey

The next stage of the process is to start categorizing the keywords into the stage of the buyer’s journey. Something we’ve found at Aira is that keywords don’t always fit into a predefined stage. Someone looking for “marketing services” could be doing research about what marketing services are, but they could also be looking for a provider. You may get keywords that could be either awareness/consideration or consideration/decision. Use your judgement, and remember this is subjective. Once complete, you should end up with some data that looks similar to this:

This categorization is important, as it starts to frame what type of content is most appropriate for that keyword or phrase.

The next stage of this process is to start noticing patterns in keyphrases and where they get mapped to in the buyer’s journey. Often you’ll see keywords like “price” or ”cost” at the decision stage and phrases like “how to” at the awareness stage. Once you start identifying these patterns, possibly using a variation of Tom Casano’s keyword clustering approach, you can then try to find a way to automate so that when these terms appear in your keyword column, the intent automatically gets updated.

Once completed, we can then start to define each of our keywords and give them a type:

  • Pillar page
  • Cluster page
  • Target page
  • Track & monitor
  • Ignore

We use this document to start thinking about what type of content is most effective for that piece given the search volume available, how competitive that term is, how profitable the keyword could be, and what stage the buyer might be at. We’re trying to find that sweet spot between having enough search volume, ensuring we can actually rank for that keyphrase (there’s no point in a small e-commerce startup trying to rank for “buy nike trainers”), and how important/profitable that phrase could be for the business. The below Venn diagram illustrates this nicely:

We also reorder the keywords so keywords that are semantically similar are bucketed together into parent and child keywords. This helps to inform our on-page recommendations:

From the example above, you can see “digital marketing agency” as the main keyword, but “digital marketing services” & “digital marketing agency uk” sit underneath.

We also use conditional formatting to help identify keyword page types:

And then sheets to separate topics out:

Once this is complete, we have a data-rich spreadsheet of keywords that we then work with clients on to make sure we’ve not missed anything. The document can get pretty big, particularly when you’re dealing with e-commerce websites that have thousands of products.

5. Keyword mapping and content gap analysis

We then map these keywords to existing content to ensure that the site hasn’t already written about the subject in the past. We often use Google Search Console data to do this so we understand how any existing content is being interpreted by the search engines. By doing this we’re creating our own content gap analysis. An example output can be seen below:

The above process takes our keyword research and then applies the usual on-page concepts (such as optimizing meta titles, URLs, descriptions, headings, etc) to existing pages. We’re also ensuring that we’re mapping our user intent and type of page (pillar, cluster, target, etc), which helps us decide what sort of content the piece should be (such as a blog post, webinar, e-book, etc). This process helps us understand what keywords and phrases the site is not already being found for, or is not targeted to.

Free template

I promised a template Google Sheet earlier in this blog post and you can find that here.

Do you have any questions on this process? Ways to improve it? Feel free to post in the comments below or ping me over on Twitter!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Moz Blog

Posted in IM NewsComments Off

The Advanced Guide to Keyword Clustering

Posted by tomcasano

If your goal is to grow your organic traffic, you have to think about SEO in terms of “product/market fit.”

Keyword research is the “market” (what users are actually searching for) and content is the “product” (what users are consuming). The “fit” is optimization.

To grow your organic traffic, you need your content to mirror the reality of what users are actually searching for. Your content planning and creation, keyword mapping, and optimization should all align with the market. This is one of the best ways to grow your organic traffic.

Why bother with keyword grouping?

One web page can rank for multiple keywords. So why aren’t we hyper-focused on planning and optimizing content that targets dozens of similar and related keywords?

Why target only one keyword with one piece of content when you can target 20?

The impact of keyword clustering to acquire more organic traffic is not only underrated, it is largely ignored. In this guide, I’ll share with you our proprietary process we’ve pioneered for keyword grouping so you can not only do it yourself, but you can maximize the number of keywords your amazing content can rank for.

Here’s a real-world example of a handful of the top keywords that this piece of content is ranking for. The full list is over 1,000 keywords.

17 different keywords one page is ranking for

Why should you care?

It’d be foolish to focus on only one keyword, as you’d lose out on 90%+ of the opportunity.

Here’s one of my favorite examples of all of the keywords that one piece of content could potentially target:

List of ~100 keywords one page ranks for

Let’s dive in!

Part 1: Keyword collection

Before we start grouping keywords into clusters, we first need our dataset of keywords from which to group from.

In essence, our job in this initial phase is to find every possible keyword. In the process of doing so, we’ll also be inadvertently getting many irrelevant keywords (thank you, Keyword Planner). However, it’s better to have many relevant and long-tail keywords (and the ability to filter out the irrelevant ones) than to only have a limited pool of keywords to target.

For any client project, I typically say that we’ll collect anywhere from 1,000 to 6,000 keywords. But truth be told, we’ve sometimes found 10,000+ keywords, and sometimes (in the instance of a local, niche client), we’ve found less than 1,000.

I recommend collecting keywords from about 8–12 different sources. These sources are:

  1. Your competitors
  2. Third-party data tools (Moz, Ahrefs, SEMrush, AnswerThePublic, etc.)
  3. Your existing data in Google Search Console/Google Analytics
  4. Brainstorming your own ideas and checking against them
  5. Mashing up keyword combinations
  6. Autocomplete suggestions and “Searches related to” from Google

There’s no shortage of sources for keyword collection, and more keyword research tools exist now than ever did before. Our goal here is to be so extensive that we never have to go back and “find more keywords” in the future — unless, of course, there’s a new topic we are targeting.

The prequel to this guide will expand upon keyword collection in depth. For now, let’s assume that you’ve spent a few hours collecting a long list of keywords, you have removed the duplicates, and you have semi-reliable search volume data.

Part 2: Term analysis

Now that you have an unmanageable list of 1,000+ keywords, let’s turn it into something useful.

We begin with term analysis. What the heck does that mean?

We break each keyword apart into its component terms that comprise the keyword, so we can see which terms are the most frequently occurring.

For example, the keyword: “best natural protein powder” is comprised of 4 terms: “best,” “natural,” “protein,” and “powder.” Once we break apart all of the keywords into their component parts, we can more readily analyze and understand which terms (as subcomponents of the keywords) are recurring the most in our keyword dataset.

Here’s a sampling of 3 keywords:

  • best natural protein powder
  • most powerful natural anti inflammatory
  • how to make natural deodorant

Take a closer look, and you’ll notice that the term “natural” occurs in all three of these keywords. If this term is occurring very frequently throughout our long list of keywords, it’ll be highly important when we start grouping our keywords.

You will need a word frequency counter to give you this insight. The ultimate free tool for this is Write Words’ Word Frequency Counter. It’s magical.

Paste in your list of keywords, click submit, and you’ll get something like this:

List of keywords and how frequently they occur

Copy and paste your list of recurring terms into a spreadsheet. You can obviously remove prepositions and terms like “is,” “for,” and “to.”

You don’t always get the most value by just looking at individual terms. Sometimes a two-word or three-word phrase gives you insights you wouldn’t have otherwise. In this example, you see the terms “milk” and “almond” appearing, but it turns out that this is actually part of the phrase “almond milk.”

To gather these insights, use the Phrase Frequency Counter from WriteWords and repeat the process for phrases that have two, three, four, five, and six terms in them. Paste all of this data into your spreadsheet too.

A two-word phrase that occurs more frequently than a one-word phrase is an indicator of its significance. To account for this, I use the COUNTA function in Google Sheets to show me the number of terms in a phrase:


Now we can look at our keyword data with a second dimension: not only the number of times a term or phrase occurs, but also how many words are in that phrase.

Finally, to give more weighting to phrases that recur less frequently but have more terms in them, I put an exponent on the number of terms with a basic formula:


In other words, take the number of terms and raise it to a power, and then multiply that by the frequency of its occurrence. All this does is give more weighting to the fact that a two-word phrase that occurs less frequently is still more important than a one-word phrase that might occur more frequently.

As I never know just the right power to raise it to, I test several and keep re-sorting the sheet to try to find the most important terms and phrases in the sheet.

Spreadsheet of keywords and their weighted importance

When you look at this now, you can already see patterns start to emerge and you’re already beginning to understand your searchers better.

In this example dataset, we are going from a list of 10k+ keywords to an analysis of terms and phrases to understand what people are really asking. For example, “what is the best” and “where can i buy” are phrases we can absolutely understand searchers using.

I mark off the important terms or phrases. I try to keep this number to under 50 and to a maximum of around 75; otherwise, grouping will get hairy in Part 5.

Part 3: Hot words

What are hot words?

Hot words are the terms or phrases from that last section that we have deemed to be the most important. We’ve explained hot words in greater depth here.

Why are hot words important?

We explain:

This exercise provides us with a handful of the most relevant and important terms and phrases for traffic and relevancy, which can then be used to create the best content strategies — content that will rank highly and, in turn, help us reap traffic rewards for your site.

When developing your hot words list, we identify the highest frequency and most relevant terms from a large range of keywords used by several of your highest-performing competitors to generate their traffic, and these become “hot words.”

When working with a client (or doing this for yourself), there are generally 3 questions we want answered for each hot word:

  1. Which of these terms are the most important for your business? (0–10)
  2. Which of these terms are negative keywords (we want to ignore or avoid)?
  3. Any other feedback about qualified or high-intent keywords?

We narrow down the list, removing any negative keywords or keywords that are not really important for the website.

Once we have our final list of hot words, we organize them into broad topic groups like this:

Organized spreadsheet of hot words by topic

The different colors have no meaning, but just help to keep it visually organized for when we group them.

One important thing to note is that word stems play an important part here.

For example, consider that all of these words below have the same underlying relevance and meaning:

  • blog
  • blogs
  • blogger
  • bloggers
  • blogging

Therefore, when we’re grouping keywords, to consider “blog” and “blogging” and “bloggers” as part of the same cluster, we’ll need to use the word stem of “blog” for all of them. Word stems are our best friend when grouping. Synonyms can be organized in a similar way, which are basically two different ways of saying the same thing (and the same user intent) such as “build” and “create” or “search” and “look for.”

Part 4: Preparation for keyword grouping

Now we’re going to get ourselves set up for our Herculean task of clustering.

To start, copy your list of hot words and transpose them horizontally across a row.

Screenshot of menu in spreadsheet

List your keywords in the first column.

Screenshot of keyword spreadsheet

Now, the real magic begins.

After much research and noodling around, I discovered the function in Google Sheets that tells us whether a stem or term is in a keyword or not. It uses RegEx:


This simply tells us whether this word stem or word is in that keyword or not. You have to individually set the term for each column to get your “YES” or “NO” answer. I then drag this formula down to all of the rows to get all of the YES/NO answers. Google Sheets often takes a minute or so to process all of this data.

Next, we have to “hard code” these formulas so we can remove the NOs and be left with only a YES if that terms exists in that keyword.

Copy all of the data and “Paste values only.”

Screenshot of spreadsheet menu

Now, use “Find and replace” to remove all of the NOs.

Screenshot of Find and Replace popup

What you’re left with is nothing short of a work of art. You now have the most powerful way to group your keywords. Let the grouping begin!

Screenshot of keyword spreadsheet

Part 5: Keyword grouping

At this point, you’re now set up for keyword clustering success.

This part is half art, half science. No wait, I take that back. To do this part right, you need:

  • A deep understanding of who you’re targeting, why they’re important to the business, user intent, and relevance
  • Good judgment to make tradeoffs when breaking keywords apart into groups
  • Good intuition

This is one of the hardest parts for me to train anyone to do. It comes with experience.

At the top of the sheet, I use the COUNTA function to show me how many times this word step has been found in our keyword set:


This is important because as a general rule, it’s best to start with the most niche topics that have the least overlap with other topics. If you start too broadly, your keywords will overlap with other keyword groups and you’ll have a hard time segmenting them into meaningful groups. Start with the most narrow and specific groups first.

To begin, you want to sort the sheet by word stem.

The word stems that occur only a handful of times won’t have a large amount of overlap. So I start by sorting the sheet by that column, and copying and pasting those keywords into their own new tab.

Now you have your first keyword group!

Here’s a first group example: the “matcha” group. This can be its own project in its own right: for instance, if a website was all about matcha tea and there were other tangentially related keywords.

Screenshot of list of matcha-related keywords

As we continue breaking apart one keyword group and then another, by the end we’re left with many different keyword groups. If the groups you’ve arrived at are too broad, you can subdivide them even more into narrower keyword subgroups for more focused content pieces. You can follow the same process for this broad keyword group, and make it a microcosm of the same process of dividing the keywords into smaller groups based on word stems.

We can create an overview of the groups to see the volume and topical opportunities from a high level.

Screenshot of spreadsheet with keyword group overview

We want to not only consider search volume, but ideally also intent, competitiveness, and so forth.


You’ve successfully taken a list of thousands of keywords and grouped them into relevant keyword groups.

Wait, why did we do all of this hard work again?

Now you can finally attain that “product/market fit” we talked about. It’s magical.

You can take each keyword group and create a piece of optimized content around it, targeting dozens of keywords, exponentially raising your potential to acquire more organic traffic. Boo yah!

All done. Now what?

Now the real fun begins. You can start planning out new content that you never knew you needed to create. Alternatively, you can map your keyword groups (and subgroups) to existing pages on your website and add in keywords and optimizations to the header tags, body text, and so forth for all those long-tail keywords you had ignored.

Keyword grouping is underrated, overlooked, and ignored at large. It creates a massive new opportunity to optimize for terms where none existed. Sometimes it’s just adding one phrase or a few sentences targeting a long-tail keyword here and there that will bring in that incremental search traffic for your site. Do this dozens of times and you will keep getting incremental increases in your organic traffic.

What do you think?

Leave a comment below and let me know your take on keyword clustering.

Need a hand? Just give me a shout, I’m happy to help.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Moz Blog

Posted in IM NewsComments Off

How to Get More Keyword Metrics for Your Target Keywords

Posted by Bill.Sebald

If you’re old in SEO years, you remember the day [not provided] was introduced. It was a dark, dark day. SEOs lost a vast amount of trusty information. Click data. Conversion data. This was incredibly valuable, allowing SEOs to prioritize their targets.

Google said the info was removed for security purposes, while suspicious SEOs thought this was a push towards spending more on AdWords (now Google Ads). I get it — since AdWords would give you the keyword data SEOs cherished, the “controversy” was warranted, in my opinion. The truth is out there.

But we’ve moved on, and learned to live with the situation. Then a few years later, Google Webmaster Tools (now Search Console) started providing some of the keyword data in the Search Analytics report. Through the years, the report got better and better.

But there’s still a finite set of keywords in the interface. You can’t get more than 999 in your report.

Search Analytics Report

Guess what? Google has more data for you!

The Google Search Console API is your friend. This summer it became even friendlier, providing 16 months worth of data. What you may not know is this API can give you more than 999 keywords. By way of example, the API provides more than 45,000 for our Greenlane site. And we’re not even a very large site. That’s right — the API can give you keywords, clicks, average position, impressions, and CTR %.

Salivating yet?

How to easily leverage the API

If you’re not very technical and the thought of an API frightens you, I promise there’s nothing to fear. I’m going to show you a way to leverage the data using Google Sheets.

Here is what you will need:

  1. Google Sheets (free)
  2. Supermetrics Add-On (free trial, but a paid tool)

If you haven’t heard of Google Sheets, it’s one of several tools Google provides for free. This directly competes with Microsoft Excel. It’s a cloud-based spreadsheet that works exceptionally well.

If you aren’t familiar with Supermetrics, it’s an add-on for Google Sheets that allows data to be pulled in from other sources. In this case, one of the sources will be Google Search Console. Now, while Supermetrics has a free trial, paid is the way to go. It’s worth it!

Installation of Supermetrics:

  1. Open Google Sheets and click the Add-On option
  2. Click Get Add-Ons
  3. A window will open where you can search for Supermetrics. It will look like this:

How To Install Supermetrics

From there, just follow the steps. It will immediately ask to connect to your Google account. I’m sure you’ve seen this kind of dialog box before:

Supermetrics wants to access your Google Account

You’ll be greeted with a message for launching the newly installed add-on. Just follow the prompts to launch. Next you’ll see a new window to the right of your Google Sheet.

Launch message

At this point, you should see the following note:

Great, you’re logged into Google Search Console! Now let’s run your first query. Pick an account from the list below.

Next, all you have to do is work down the list in Supermetrics. Data Source, Select Sites, and Select Dates are pretty self-explanatory. When you reach the “Select metrics” toggle, choose Impressions, Clicks, CTR (%), and Average Position.


When you reach “Split by,” choose Search Query as the Split to rows option. And pick a large number for number of rows to fetch. If you also want the page URLs (perhaps you’d like your data divided by the page level), you just need to add Full URL as well.

Split By

You can play with the other Filter and Options if you’d like, but you’re ready to click Apply Changes and receive the data. It should compile like this:

Final result

Got the data. Now what?

Sometimes optimization is about taking something that’s working, and making it work better. This data can show you which keywords and topics are important to your audience. It’s also a clue towards what Google thinks you’re important for (thus, rewarding you with clicks).

SEMrush and Ahrefs can provide ranking keyword data with their estimated clicks, but impressions is an interesting metric here. High impression and low clicks? Maybe your title and description tags aren’t compelling enough. It’s also fun to VLOOKUP their data against this, to see just how accurate they are (or are not). Or you can use a tool like PowerBI to append other customer or paid search metrics to paint a bigger picture of your visitors’ mindset.


Sometimes the littlest hacks are the most fun. Google commonly holds some data back through their free products (the Greenlane Indexation Tester is a good example with the old interface). We know Search Planner and Google Analytics have more than they share. But in those cases, where directional information can sometimes be enough, digging out even more of your impactful keyword data is pure gold.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Moz Blog

Posted in IM NewsComments Off

Ranking the 6 Most Accurate Keyword Research Tools

Posted by Jeff_Baker

In January of 2018 Brafton began a massive organic keyword targeting campaign, amounting to over 90,000 words of blog content being published.

Did it work?

Well, yeah. We doubled the number of total keywords we rank for in less than six months. By using our advanced keyword research and topic writing process published earlier this year we also increased our organic traffic by 45% and the number of keywords ranking in the top ten results by 130%.

But we got a whole lot more than just traffic.

From planning to execution and performance tracking, we meticulously logged every aspect of the project. I’m talking blog word count, MarketMuse performance scores, on-page SEO scores, days indexed on Google. You name it, we recorded it.

As a byproduct of this nerdery, we were able to draw juicy correlations between our target keyword rankings and variables that can affect and predict those rankings. But specifically for this piece…

How well keyword research tools can predict where you will rank.

A little background

We created a list of keywords we wanted to target in blogs based on optimal combinations of search volume, organic keyword difficulty scores, SERP crowding, and searcher intent.

We then wrote a blog post targeting each individual keyword. We intended for each new piece of blog content to rank for the target keyword on its own.

With our keyword list in hand, my colleague and I manually created content briefs explaining how we would like each blog post written to maximize the likelihood of ranking for the target keyword. Here’s an example of a typical brief we would give to a writer:

This image links to an example of a content brief Brafton delivers to writers.

Between mid-January and late May, we ended up writing 55 blog posts each targeting 55 unique keywords. 50 of those blog posts ended up ranking in the top 100 of Google results.

We then paused and took a snapshot of each URL’s Google ranking position for its target keyword and its corresponding organic difficulty scores from Moz, SEMrush, Ahrefs, SpyFu, and KW Finder. We also took the PPC competition scores from the Keyword Planner Tool.

Our intention was to draw statistical correlations between between our keyword rankings and each tool’s organic difficulty score. With this data, we were able to report on how accurately each tool predicted where we would rank.

This study is uniquely scientific, in that each blog had one specific keyword target. We optimized the blog content specifically for that keyword. Therefore every post was created in a similar fashion.

Do keyword research tools actually work?

We use them every day, on faith. But has anyone ever actually asked, or better yet, measured how well keyword research tools report on the organic difficulty of a given keyword?

Today, we are doing just that. So let’s cut through the chit-chat and get to the results…

This image ranks each of the 6 keyword research tools, in order, Moz leads with 4.95 stars out of 5, followed by KW Finder, SEMrush, AHREFs, SpyFu, and lastly Keyword Planner Tool.

While Moz wins top-performing keyword research tool, note that any keyword research tool with organic difficulty functionality will give you an advantage over flipping a coin (or using Google Keyword Planner Tool).

As you will see in the following paragraphs, we have run each tool through a battery of statistical tests to ensure that we painted a fair and accurate representation of its performance. I’ll even provide the raw data for you to inspect for yourself.

Let’s dig in!

The Pearson Correlation Coefficient

Yes, statistics! For those of you currently feeling panicked and lobbing obscenities at your screen, don’t worry — we’re going to walk through this together.

In order to understand the relationship between two variables, our first step is to create a scatter plot chart.

Below is the scatter plot for our 50 keyword rankings compared to their corresponding Moz organic difficulty scores.

This image shows a scatter plot for Moz's keyword difficulty scores versus our keyword rankings. In general, the data clusters fairly tight around the regression line.

We start with a visual inspection of the data to determine if there is a linear relationship between the two variables. Ideally for each tool, you would expect to see the X variable (keyword ranking) increase proportionately with the Y variable (organic difficulty). Put simply, if the tool is working, the higher the keyword difficulty, the less likely you will rank in a top position, and vice-versa.

This chart is all fine and dandy, however, it’s not very scientific. This is where the Pearson Correlation Coefficient (PCC) comes into play.

The PCC measures the strength of a linear relationship between two variables. The output of the PCC is a score ranging from +1 to -1. A score greater than zero indicates a positive relationship; as one variable increases, the other increases as well. A score less than zero indicates a negative relationship; as one variable increases, the other decreases. Both scenarios would indicate a level of causal relationship between the two variables. The stronger the relationship between the two veriables, the closer to +1 or -1 the PCC will be. Scores near zero indicate a weak or no relatioship.

Phew. Still with me?

So each of these scatter plots will have a corresponding PCC score that will tell us how well each tool predicted where we would rank, based on its keyword difficulty score.

We will use the following table from statisticshowto.com to interpret the PCC score for each tool:

Coefficient Correlation R Score


.70 or higher

Very strong positive relationship

.40 to +.69

Strong positive relationship

.30 to +.39

Moderate positive relationship

.20 to +.29

Weak positive relationship

.01 to +.19

No or negligible relationship


No relationship [zero correlation]

-.01 to -.19

No or negligible relationship

-.20 to -.29

Weak negative relationship

-.30 to -.39

Moderate negative relationship

-.40 to -.69

Strong negative relationship

-.70 or higher

Very strong negative relationship

In order to visually understand what some of these relationships would look like on a scatter plot, check out these sample charts from Laerd Statistics.

These scatter plots show three types of correlations: positive, negative, and no correlation. Positive correlations have data plots that move up and to the right. Negative correlations move down and to the right. No correlation has data that follows no linear pattern

And here are some examples of charts with their correlating PCC scores (r):

These scatter plots show what different PCC values look like visually. The tighter the grouping of data around the regression line, the higher the PCC value.

The closer the numbers cluster towards the regression line in either a positive or negative slope, the stronger the relationship.

That was the tough part – you still with me? Great, now let’s look at each tool’s results.

Test 1: The Pearson Correlation Coefficient

Now that we’ve all had our statistics refresher course, we will take a look at the results, in order of performance. We will evaluate each tool’s PCC score, the statistical significance of the data (P-val), the strength of the relationship, and the percentage of keywords the tool was able to find and report keyword difficulty values for.

In order of performance:

#1: Moz

This image shows a scatter plot for Moz's keyword difficulty scores versus our keyword rankings. In general, the data clusters fairly tight around the regression line.

Revisiting Moz’s scatter plot, we observe a tight grouping of results relative to the regression line with few moderate outliers.

Moz Organic Difficulty Predictability




.003 (P<0.05)



% Keywords Matched


Moz came in first with the highest PCC of .412. As an added bonus, Moz grabs data on keyword difficulty in real time, rather than from a fixed database. This means that you can get any keyword difficulty score for any keyword.

In other words, Moz was able to generate keyword difficulty scores for 100% of the 50 keywords studied.

#2: SpyFu

This image shows a scatter plot for SpyFu's keyword difficulty scores versus our keyword rankings. The plot is similar looking to Moz's, with a few larger outliers.

Visually, SpyFu shows a fairly tight clustering amongst low difficulty keywords, and a couple moderate outliers amongst the higher difficulty keywords.

SpyFu Organic Difficulty Predictability




.01 (P<0.05)



% Keywords Matched


SpyFu came in right under Moz with 1.7% weaker PCC (.405). However, the tool ran into the largest issue with keyword matching, with only 40 of 50 keywords producing keyword difficulty scores.

#3: SEMrush

This image shows a scatter plot for SEMrush's keyword difficulty scores versus our keyword rankings. The data has a significant amount of outliers relative to the regression line.

SEMrush would certainly benefit from a couple mulligans (a second chance to perform an action). The Correlation Coefficient is very sensitive to outliers, which pushed SEMrush’s score down to third (.364).

SEMrush Organic Difficulty Predictability




.01 (P<0.05)



% Keywords Matched


Further complicating the research process, only 46 of 50 keywords had keyword difficulty scores associated with them, and many of those had to be found through SEMrush’s “phrase match” feature individually, rather than through the difficulty tool.

The process was more laborious to dig around for data.

#4: KW Finder

This image shows a scatter plot for KW Finder's keyword difficulty scores versus our keyword rankings. The data also has a significant amount of outliers relative to the regression line.

KW Finder definitely could have benefitted from more than a few mulligans with numerous strong outliers, coming in right behind SEMrush with a score of .360.

KW Finder Organic Difficulty Predictability




.01 (P<0.05)



% Keywords Matched


Fortunately, the KW Finder tool had a 100% match rate without any trouble digging around for the data.

#5: Ahrefs

This image shows a scatter plot for AHREF's keyword difficulty scores versus our keyword rankings. The data shows tight clustering amongst low difficulty score keywords, and a wide distribution amongst higher difficulty scores.

Ahrefs comes in fifth by a large margin at .316, barely passing the “weak relationship” threshold.

Ahrefs Organic Difficulty Predictability




.03 (P<0.05)



% Keywords Matched


On a positive note, the tool seems to be very reliable with low difficulty scores (notice the tight clustering for low difficulty scores), and matched all 50 keywords.

#6: Google Keyword Planner Tool

This image shows a scatter plot for Google Keyword Planner Tool's keyword difficulty scores versus our keyword rankings. The data shows randomly distributed plots with no linear relationship.

Before you ask, yes, SEO companies still use the paid competition figures from Google’s Keyword Planner Tool (and other tools) to assess organic ranking potential. As you can see from the scatter plot, there is in fact no linear relationship between the two variables.

Google Keyword Planner Tool Organic Difficulty Predictability




Statistically insignificant/no linear relationship



% Keywords Matched


SEO agencies still using KPT for organic research (you know who you are!) — let this serve as a warning: You need to evolve.

Test 1 summary

For scoring, we will use a ten-point scale and score every tool relative to the highest-scoring competitor. For example, if the second highest score is 98% of the highest score, the tool will receive a 9.8. As a reminder, here are the results from the PCC test:

This bar chart shows the final PCC values for the first test, summarized.

And the resulting scores are as follows:


PCC Test







KW Finder






Moz takes the top position for the first test, followed closely by SpyFu (with an 80% match rate caveat).

Test 2: Adjusted Pearson Correlation Coefficient

Let’s call this the “Mulligan Round.” In this round, assuming sometimes things just go haywire and a tool just flat-out misses, we will remove the three most egregious outliers to each tool’s score.

Here are the adjusted results for the handicap round:

Adjusted Scores (3 Outliers removed)


Difference (+/-)
















Keyword Planner Tool



As noted in the original PCC test, some of these tools really took a big hit with major outliers. Specifically, Ahrefs and SEMrush benefitted the most from their outliers being removed, gaining .162 and .150 respectively to their scores, while Moz benefited the least from the adjustments.

For those of you crying out, “But this is real life, you don’t get mulligans with SEO!”, never fear, we will make adjustments for reliability at the end.

Here are the updated scores at the end of round two:


PCC Test

Adjusted PCC














KW Finder












SpyFu takes the lead! Now let’s jump into the final round of statistical tests.

Test 3: Resampling

Being that there has never been a study performed on keyword research tools at this scale, we wanted to ensure that we explored multiple ways of looking at the data.

Big thanks to Russ Jones, who put together an entirely different model that answers the question: “What is the likelihood that the keyword difficulty of two randomly selected keywords will correctly predict the relative position of rankings?”

He randomly selected 2 keywords from the list and their associated difficulty scores.

Let’s assume one tool says that the difficulties are 30 and 60, respectively. What is the likelihood that the article written for a score of 30 ranks higher than the article written on 60? Then, he performed the same test 1,000 times.

He also threw out examples where the two randomly selected keywords shared the same rankings, or data points were missing. Here was the outcome:


% Guessed correctly







Keyword Finder






As you can see, this tool was particularly critical on each of the tools. As we are starting to see, no one tool is a silver bullet, so it is our job to see how much each tool helps make more educated decisions than guessing.

Most tools stayed pretty consistent with their levels of performance from the previous tests, except SpyFu, which struggled mightily with this test.

In order to score this test, we need to use 50% as the baseline (equivalent of a coin flip, or zero points), and scale each tool relative to how much better it performed over a coin flip, with the top scorer receiving ten points.

For example, Ahrefs scored 11.2% better than flipping a coin, which is 8.2% less than Moz which scored 12.2% better than flipping a coin, giving AHREFs a score of 9.2.

The updated scores are as follows:


PCC Test

Adjusted PCC


















KW Finder















So after the last statistical accuracy test, we have Moz consistently performing alone in the top tier. SEMrush, Ahrefs, and KW Finder all turn in respectable scores in the second tier, followed by the unique case of SpyFu, which performed outstanding in the first two tests (albeit, only returning results on 80% of the tested keywords), then falling flat on the final test.

Finally, we need to make some usability adjustments.

Usability Adjustment 1: Keyword Matching

A keyword research tool doesn’t do you much good if it can’t provide results for the keywords you are researching. Plain and simple, we can’t treat two tools as equals if they don’t have the same level of practical functionality.

To explain in practical terms, if a tool doesn’t have data on a particular keyword, one of two things will happen:

  1. You have to use another tool to get the data, which devalues the entire point of using the original tool.
  2. You miss an opportunity to rank for a high-value keyword.

Neither scenario is good, therefore we developed a penalty system. For each 10% match rate under 100%, we deducted a single point from the final score, with a maximum deduction of 5 points. For example, if a tool matched 92% of the keywords, we would deduct .8 points from the final score.

One may argue that this penalty is actually too lenient considering the significance of the two unideal scenarios outlined above.

The penalties are as follows:


Match Rate


KW Finder












Keyword Planner Tool






Please note we gave SEMrush a lot of leniency, in that technically, many of the keywords evaluated were not found in its keyword difficulty tool, but rather through manually digging through the phrase match tool. We will give them a pass, but with a stern warning!

Usability Adjustment 2: Reliability

I told you we would come back to this! Revisiting the second test in which we threw away the three strongest outliers that negatively impacted each tool’s score, we will now make adjustments.

In real life, there are no mulligans. In real life, each of those three blog posts that were thrown out represented a significant monetary and time investment. Therefore, when a tool has a major blunder, the result can be a total waste of time and resources.

For that reason, we will impose a slight penalty on those tools that benefited the most from their handicap.

We will use the level of PCC improvement to evaluate how much a tool benefitted from removing their outliers. In doing so, we will be rewarding the tools that were the most consistently reliable. As a reminder, the amounts each tool benefitted were as follows:


Difference (+/-)





Keyword Planner Tool








In calculating the penalty, we scored each of the tools relative to the top performer, giving the top performer zero penalty and imposing penalties based on how much additional benefit the tools received over the most reliable tool, on a scale of 0–100%, with a maximum deduction of 5 points.

So if a tool received twice the benefit of the top performing tool, it would have had a 100% benefit, receiving the maximum deduction of 5 points. If another tool received a 20% benefit over of the most reliable tool, it would get a 1-point deduction. And so on.


% Benefit








Keyword Planner Tool






KW Finder







All told, our penalties were fairly mild, with a slight shuffling in the middle tier. The final scores are as follows:


Total Score

Stars (5 max)




KW Finder
















Using any organic keyword difficulty tool will give you an advantage over not doing so. While none of the tools are a crystal ball, providing perfect predictability, they will certainly give you an edge. Further, if you record enough data on your own blogs’ performance, you will get a clearer picture of the keyword difficulty scores you should target in order to rank on the first page.

For example, we know the following about how we should target keywords with each tool:


Average KD ranking ≤10

Average KD ranking ≥ 11
















This is pretty powerful information! It’s either first page or bust, so we now know the threshold for each tool that we should set when selecting keywords.

Stay tuned, because we made a lot more correlations between word count, days live, total keywords ranking, and all kinds of other juicy stuff. Tune in again in early September for updates!

We hope you found this test useful, and feel free to reach out with any questions on our math!

Disclaimer: These results are estimates based on 50 ranking keywords from 50 blog posts and keyword research data pulled from a single moment in time. Search is a shifting landscape, and these results have certainly changed since the data was pulled. In other words, this is about as accurate as we can get from analyzing a moving target.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Moz Blog

Posted in IM NewsComments Off

Rewriting the Beginner’s Guide to SEO, Chapter 3: Keyword Research

Posted by BritneyMuller

Welcome to the draft of Chapter Three of the new and improved Beginner’s Guide to SEO! So far you’ve been generous and energizing with your feedback for our outline, Chapter One, and Chapter Two. We’re asking for a little more of your time as we debut the our third chapter on keyword research. Please let us know what you think in the comments!

Chapter 3: Keyword Research

Understand what your audience wants to find.

Now that you’ve learned how to show up in search results, let’s determine which strategic keywords to target in your website’s content, and how to craft that content to satisfy both users and search engines.

The power of keyword research lies in better understanding your target market and how they are searching for your content, services, or products.

Keyword research provides you with specific search data that can help you answer questions like:

  • What are people searching for?
  • How many people are searching for it?
  • In what format do they want that information?

In this chapter, you’ll get tools and strategies for uncovering that information, as well as learn tactics that’ll help you avoid keyword research foibles and build strong content. Once you uncover how your target audience is searching for your content, you begin to uncover a whole new world of strategic SEO!

What terms are people searching for?

You may know what you do, but how do people search for the product, service, or information you provide? Answering this question is a crucial first step in the keyword research process.

Discovering keywords

You likely have a few keywords in mind that you would like to rank for. These will be things like your products, services, or other topics your website addresses, and they are great seed keywords for your research, so start there! You can enter those keywords into a keyword research tool to discover average monthly search volume and similar keywords. We’ll get into search volume in greater depth in the next section, but during the discovery phase, it can help you determine which variations of your keywords are most popular amongst searchers.

Once you enter in your seed keywords into a keyword research tool, you will begin to discover other keywords, common questions, and topics for your content that you might have otherwise missed.

Let’s use the example of a florist that specializes in weddings.

Typing “wedding” and “florist” into a keyword research tool, you may discover highly relevant, highly searched for related terms such as:

  • Wedding bouquets
  • Bridal flowers
  • Wedding flower shop

In the process of discovering relevant keywords for your content, you will likely notice that the search volume of those keywords varies greatly. While you definitely want to target terms that your audience is searching for, in some cases, it may be more advantageous to target terms with lower search volume because they’re far less competitive.

Since both high- and low-competition keywords can be advantageous for your website, learning more about search volume can help you prioritize keywords and pick the ones that will give your website the biggest strategic advantage.

Pro tip: Diversify!

It’s important to note that entire websites don’t rank for keywords, pages do. With big brands, we often see the homepage ranking for many keywords, but for most websites, this isn’t usually the case. Many websites receive more organic traffic to pages other than the homepage, which is why it’s so important to diversify your website’s pages by optimizing each for uniquely valuable keywords.

How often are those terms searched?

Uncovering search volume

The higher the search volume for a given keyword or keyword phrase, the more work is typically required to achieve higher rankings. This is often referred to as keyword difficulty and occasionally incorporates SERP features; for example, if many SERP features (like featured snippets, knowledge graph, carousels, etc) are clogging up a keyword’s result page, difficulty will increase. Big brands often take up the top 10 results for high-volume keywords, so if you’re just starting out on the web and going after the same keywords, the uphill battle for ranking can take years of effort.

Typically, the higher the search volume, the greater the competition and effort required to achieve organic ranking success. Go too low, though, and you risk not drawing any searchers to your site. In many cases, it may be most advantageous to target highly specific, lower competition search terms. In SEO, we call those long-tail keywords.

Understanding the long tail

It would be great to rank #1 for the keyword “shoes”… or would it?

It’s wonderful to deal with keywords that have 50,000 searches a month, or even 5,000 searches a month, but in reality, these popular search terms only make up a fraction of all searches performed on the web. In fact, keywords with very high search volumes may even indicate ambiguous intent, which, if you target these terms, it could put you at risk for drawing visitors to your site whose goals don’t match the content your page provides.

Does the searcher want to know the nutritional value of pizza? Order a pizza? Find a restaurant to take their family? Google doesn’t know, so they offer these features to help you refine. Targeting “pizza” means that you’re likely casting too wide a net.

The remaining 75% lie in the “chunky middle” and “long tail” of search.

Don’t underestimate these less popular keywords. Long tail keywords with lower search volume often convert better, because searchers are more specific and intentional in their searches. For example, a person searching for “shoes” is probably just browsing. Whereas, someone searching for “best price red womens size 7 running shoe,” practically has their wallet out!

Pro tip: Questions are SEO gold!

Discovering what questions people are asking in your space, and adding those questions and their answers to an FAQ page, can yield incredible organic traffic for your website.

Getting strategic with search volume

Now that you’ve discovered relevant search terms for your site and their corresponding search volumes, you can get even more strategic by looking at your competitors and figuring out how searches might differ by season or location.

Keywords by competitor

You’ll likely compile a lot of keywords. How do you know which to tackle first? It could be a good idea to prioritize high-volume keywords that your competitors are not currently ranking for. On the flip side, you could also see which keywords from your list your competitors are already ranking for and prioritize those. The former is great when you want to take advantage of your competitors’ missed opportunities, while the latter is an aggressive strategy that sets you up to compete for keywords your competitors are already performing well for.

Keywords by season

Knowing about seasonal trends can be advantageous in setting a content strategy. For example, if you know that “christmas box” starts to spike in October through December in the United Kingdom, you can prepare content months in advance and give it a big push around those months.

Keywords by region

You can more strategically target a specific location by narrowing down your keyword research to specific towns, counties, or states in the Google Keyword Planner, or evaluate “interest by subregion” in Google Trends. Geo-specific research can help make your content more relevant to your target audience. For example, you might find out that in Texas, the preferred term for a large truck is “big rig,” while in New York, “tractor trailer” is the preferred terminology.

Which format best suits the searcher’s intent?

In Chapter 2, we learned about SERP features. That background is going to help us understand how searchers want to consume information for a particular keyword. The format in which Google chooses to display search results depends on intent, and every query has a unique one. While there are thousands of of possible search types, there are five major categories to be aware of:

1. Informational queries: The searcher needs information, such as the name of a band or the height of the Empire State Building.

2. Navigational queries: The searcher wants to go to a particular place on the Internet, such as Facebook or the homepage of the NFL.

3. Transactional queries: The searcher wants to do something, such as buy a plane ticket or listen to a song.

4. Commercial investigation: The searcher wants to compare products and find the best one for their specific needs.

5. Local queries: The searcher wants to find something locally, such as a nearby coffee shop, doctor, or music venue.

An important step in the keyword research process is surveying the SERP landscape for the keyword you want to target in order to get a better gauge of searcher intent. If you want to know what type of content your target audience wants, look to the SERPs!

Google has closely evaluated the behavior of trillions of searches in an attempt to provide the most desired content for each specific keyword search.

Take the search “dresses,” for example:

By the shopping carousel, you can infer that Google has determined many people who search for “dresses” want to shop for dresses online.

There is also a Local Pack feature for this keyword, indicating Google’s desire to help searchers who may be looking for local dress retailers.

If the query is ambiguous, Google will also sometimes include the “refine by” feature to help searchers specify what they’re looking for further. By doing so, the search engine can provide results that better help the searcher accomplish their task.

Google has a wide array of result types it can serve up depending on the query, so if you’re going to target a keyword, look to the SERP to understand what type of content you need to create.

Tools for determining the value of a keyword

How much value would a keyword add to your website? These tools can help you answer that question, so they’d make great additions to your keyword research arsenal:

  • Moz Keyword Explorer – Our own Moz Keyword Explorer tool extracts accurate search volume data, keyword difficulty, and keyword opportunity metrics by using live clickstream data. To learn more about how we’re producing our keyword data, check out Announcing Keyword Explorer.
  • Google Keyword Planner – Google’s AdWords Keyword Planner has historically been the most common starting point for SEO keyword research. However, Keyword Planner does restrict search volume data by lumping keywords together into large search volume range buckets. To learn more, check out Google Keyword Planner’s Dirty Secrets.
  • Google Trends – Google’s keyword trend tool is great for finding seasonal keyword fluctuations. For example, “funny halloween costume ideas” will peak in the weeks before Halloween.
  • AnswerThePublic – This free tool populates commonly searched for questions around a specific keyword. Bonus! You can use this tool in tandem with another free tool, Keywords Everywhere, to prioritize ATP’s suggestions by search volume.
  • SpyFu Keyword Research Tool – Provides some really neat competitive keyword data.

Download our free keyword research template!

Keyword research can yield a ton of data. Stay organized by downloading our free keyword research template. Customize the template to fit your unique needs. Happy keyword researching!

Now that you know how to uncover what your target audience is searching for and how often, it’s time to move onto the next step: crafting pages in a way that users will love and search engines can understand.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!

Moz Blog

Posted in IM NewsComments Off

Measuring the quality of popular keyword research tools

Contributor JR Oakes measures the quality of popular keyword research tools against data found in Google search results and performing page data from Google Search Console.

Please visit Search Engine Land for the full article.

Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in IM NewsComments Off