
Advanced Linkbuilding: How to Find the Absolute Best Publishers and Writers to Pitch

Posted by KristinTynski

In my last post, I explained how using network visualization tools can help you massively improve your content marketing PR/outreach strategy — understanding which news outlets have the largest syndication networks empowers your outreach team to prioritize publications with wide syndication footprints over those with smaller ones. The result? The content you are pitching enjoys significantly more widespread link pickups.

Today, I’m going to take you a little deeper — we’ll be looking at a few techniques for forming an even better understanding of the publisher syndication networks in your particular niche. I’ve broken this technique into two parts:

  • Technique One — Leveraging BuzzSumo influencer data and Twitter scraping to find the most influential journalists writing about any topic.
  • Technique Two — Leveraging the GDELT dataset to reveal deep story syndication networks between publishers using in-context links.

Why do this at all?

If you are interested in generating high-value links at scale, these techniques provide an undeniable competitive advantage — they help you to deeply understand how writers and news publications connect and syndicate to each other.

In our opinion at Fractl, creating data-driven content stories with strong news hooks, finding the writers and publications who would find that content compelling, and pitching them effectively is the single highest-ROI SEO activity possible. Done correctly, it is entirely possible to generate dozens, sometimes even hundreds or thousands, of high-authority links with one or a handful of content campaigns.

Let’s dive in.

Using BuzzSumo to understand journalist influencer networks on any topic

First, you want to figure out who the top influencers are for your topic. A very handy feature of BuzzSumo is its “Influencers” tool. You can locate it on the Influencers tab, then follow these steps:

  • Select only “Journalists.” This will limit the results to only the Twitter accounts of those known to be reporters and journalists at major publications. Bloggers and lower-authority publishers will be excluded.
  • Search using a topical keyword. If it is straightforward, one or two searches should be fine. If it is more complex, create a few related queries and collate the Twitter accounts that appear in all of them (a sketch of this collation step follows this list). Alternatively, use Boolean “and/or” operators in your search to narrow your results. It is critical to be sure your search results return journalists who match your target criteria as closely as possible.
  • Ideally, you want at least 100 results. More is generally better, so long as you are sure the results represent your target criteria well.
  • Once you are happy with your search results, click Export to grab a CSV.
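If you ran several related queries, a short script can handle the collation. Here is a minimal sketch, assuming each BuzzSumo export is a CSV with a "twitter_handle" column (that column name is an assumption; adjust it to match your actual export):

```python
# Collate journalist handles across several BuzzSumo exports.
# The "twitter_handle" column name is an assumption about the CSV layout.
import csv
from functools import reduce

def handles(path):
    with open(path, newline="", encoding="utf-8") as f:
        return {row["twitter_handle"].strip().lower().lstrip("@")
                for row in csv.DictReader(f)}

exports = ["export_query1.csv", "export_query2.csv", "export_query3.csv"]
handle_sets = [handles(p) for p in exports]

common = reduce(set.intersection, handle_sets)  # appears in every search
combined = reduce(set.union, handle_sets)       # the full candidate pool

print(f"{len(common)} handles appear in all exports; {len(combined)} total")

# Save the pool for the scraping step below, one handle per line.
with open("handles.csv", "w", newline="", encoding="utf-8") as f:
    f.write("\n".join(sorted(combined)))
```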

The next step is to grab all of the people each of these known journalist influencers follows — the goal is to understand which of these 100 or so influencers impacts the others the most. Additionally, we want to find people outside of this group whom many of these 100 follow in common.

To do so, we leveraged Twint, a handy Twitter scraper available on GitHub, to pull all of the people each of these journalist influencers follows. Using our scraped data, we built an edge list, which allowed us to visualize the result in Gephi.
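Here is a minimal sketch of that scraping step, assuming Twint's Python API (twint.run.Following with Store_object, which I understand collects results on twint.output.follows_list) and the handles.csv file from the previous step:

```python
# Pull the "following" list for each journalist and write a Gephi-ready
# edge list. Assumes handles.csv has one Twitter handle per line.
import csv
import twint

with open("handles.csv", newline="", encoding="utf-8") as f:
    handles = [row[0].strip().lstrip("@") for row in csv.reader(f) if row]

edges = []
for handle in handles:
    c = twint.Config()
    c.Username = handle
    c.Store_object = True   # collect results in memory
    c.Hide_output = True
    twint.run.Following(c)
    # twint.output.follows_list holds the accounts this user follows
    for followed in twint.output.follows_list:
        edges.append((handle, followed))
    twint.output.follows_list = []  # reset between users

with open("edges.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target"])  # Source follows Target
    writer.writerows(edges)
```

Gephi can import edges.csv directly (File > Import Spreadsheet) and will build the directed graph from the Source/Target columns.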

Here is an interactive version for you to explore, and here is a screenshot of what it looks like:

This graph shows us which nodes (influencers) have the most In-Degree links. In other words: it tells us who, of our media influencers, is most followed. 

These are the top 10 nodes:

• Maia Szalavitz (@maiasz), Neuroscience Journalist, VICE and TIME
• Radley Balko (@radleybalko), Opinion Journalist, Washington Post
• Johann Hari (@johannhari101), New York Times best-selling author
• David Kroll (@davidkroll), Freelance Healthcare Writer, Forbes Health
• Max Daly (@Narcomania), Global Drugs Editor, VICE
• Dana Milbank (@milbank), Columnist, Washington Post
• Sam Quinones (@samquinones7), Author
• Felice Freyer (@felicejfreyer), Reporter, Mental Health and Addiction, Boston Globe
• Jeanne Whalen (@jeannewhalen), Business Reporter, Washington Post
• Eric Bolling (@ericbolling), New York Times best-selling author

    Who is the most influential?

Using the “Betweenness Centrality” score given by Gephi, we get a rough understanding of which nodes (influencers) in the network act as hubs of information transfer. Those with the highest “Betweenness Centrality” can be thought of as the “connectors” of the network. These are the top 10 influencers:

• Maia Szalavitz (@maiasz), Neuroscience Journalist, VICE and TIME
• David Kroll (@davidkroll), Freelance Healthcare Writer, Forbes Health
• Jeanne Whalen (@jeannewhalen), Business Reporter, Washington Post
• Travis Lupick (@tlupick), Journalist, Author
• Johann Hari (@johannhari101), New York Times best-selling author
• Radley Balko (@radleybalko), Opinion Journalist, Washington Post
• Sam Quinones (@samquinones7), Author
• Eric Bolling (@ericbolling), New York Times best-selling author
• Dana Milbank (@milbank), Columnist, Washington Post
• Mike Riggs (@mikeriggs), Writer and Editor, Reason

@maiasz, @davidkroll, and @johannhari101 are standouts. There’s considerable overlap between the winners in “In-Degree” and “Betweenness Centrality,” but the two lists are still quite different.
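If you would rather script these metrics than compute them in Gephi's statistics panel, networkx produces the same two rankings. A sketch, assuming the edges.csv edge list built earlier:

```python
# Rank influencers by in-degree ("most followed") and by betweenness
# centrality ("connectors"), mirroring Gephi's statistics panel.
import csv
import networkx as nx

G = nx.DiGraph()
with open("edges.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        G.add_edge(row["Source"], row["Target"])

# "Most followed": nodes with the most incoming edges
top_followed = sorted(G.in_degree(), key=lambda x: x[1], reverse=True)[:10]

# "Connectors": nodes that sit on the most shortest paths
betweenness = nx.betweenness_centrality(G)
top_connectors = sorted(betweenness.items(), key=lambda x: x[1], reverse=True)[:10]

print("Top 10 by In-Degree:", top_followed)
print("Top 10 by Betweenness Centrality:", top_connectors)
```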

            What else can we learn?

The middle of the visualization holds many of the largest nodes. The nodes in this view are sized by “In-Degree.” The large, centrally located nodes are disproportionately followed by other members of the graph and enjoy popularity across the board (from many of the other influential nodes). These are journalists commonly followed by everyone else. Sifting through these centrally located nodes will surface many journalists who act as influencers of the group initially pulled from BuzzSumo.

So, if you had a campaign about a niche topic, you could consider pitching to an influencer surfaced from this data — according to the visualization, an article shared in their network would have the most reach and potential ROI.

Using GDELT to find the most influential websites on a topic with in-context link analysis

The first example was a great way to find the best journalists in a niche to pitch, but top journalists are often the most pitched-to overall. Oftentimes, it can be easier to get a pickup from lesser-known writers at major publications. For this reason, understanding which major publishers are most influential, and which enjoy the widest syndication on a specific theme, topic, or beat, can be extremely helpful.

By using GDELT’s massive and comprehensive database of digital news stories, along with Google BigQuery and Gephi, it is possible to dig even deeper to yield important strategic information that will help you prioritize your content pitching.

We pulled all of the articles in GDELT’s database that are known to be about a specific theme within a given timeframe. In this case (as with the previous example) we looked at “behavioral health.” For each article we found in GDELT’s database that matched our criteria, we also grabbed the links found within the body of the article itself (in-context links).

              Here is how it is done:

• Connect to GDELT on Google BigQuery — you can find a tutorial here.
• Pull data from GDELT. You can use this command: SELECT DocumentIdentifier, V2Themes, Extras, SourceCommonName, DATE FROM [gdelt-bq:gdeltv2.gkg] WHERE (V2Themes LIKE '%Your Theme%'). A scripted version of this pull follows this list.
• You can use any theme found here — just replace the part between the percent signs.
• Extract the links found in each article and build an edge file. This can be done with a relatively simple Python script that pulls all of the <PAGE_LINKS> out of the results of the query, cleans the links to show only their root domain (not the full URL), and puts them into an edge file format (see the sketch after the example edge list below).
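For reference, here is a sketch of that pull using the google-cloud-bigquery Python client rather than the BigQuery console. It is written in BigQuery standard SQL (backtick table reference) rather than the legacy syntax shown above, it assumes your Google Cloud credentials are configured, and the date filter is a hypothetical timeframe you should adjust or drop:

```python
# Query GDELT's GKG table on BigQuery and export the rows to CSV.
import csv
from google.cloud import bigquery

client = bigquery.Client()  # uses your configured GCP credentials

query = """
    SELECT DocumentIdentifier, V2Themes, Extras, SourceCommonName, DATE
    FROM `gdelt-bq.gdeltv2.gkg`
    WHERE V2Themes LIKE '%Your Theme%'   -- replace with your theme
      AND DATE >= 20190101000000         -- hypothetical timeframe (YYYYMMDDHHMMSS)
"""

rows = client.query(query).result()

with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["DocumentIdentifier", "V2Themes", "Extras",
                     "SourceCommonName", "DATE"])
    for row in rows:
        writer.writerow([row.DocumentIdentifier, row.V2Themes, row.Extras,
                         row.SourceCommonName, row.DATE])
```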

Note: The edge file is made up of Source –> Target pairs. The Source is the article and the Targets are the links found within the article. The edge list will look like this:

              • Article 1, First link found in the article.
              • Article 1, Second link found in the article.
              • Article 2, First link found in the article.
              • Article 2, Second link found in the article.
              • Article 2, Third link found in the article.
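Here is a minimal sketch of the extraction script, assuming the results.csv export described above and that, per GDELT's GKG format, the Extras field carries a semicolon-delimited <PAGE_LINKS> block:

```python
# Build a root-domain edge list from GDELT's <PAGE_LINKS> data.
import csv
import re
from urllib.parse import urlparse

link_re = re.compile(r"<PAGE_LINKS>(.*?)</PAGE_LINKS>", re.DOTALL)

def root_domain(url):
    # https://www.cnn.com/story -> cnn.com
    netloc = urlparse(url if "://" in url else "http://" + url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc

edges = set()
with open("results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        source = root_domain(row["DocumentIdentifier"])
        match = link_re.search(row.get("Extras") or "")
        if not match:
            continue
        for link in match.group(1).split(";"):  # links are ;-delimited
            target = root_domain(link.strip())
            if target and target != source:
                edges.add((source, target))

with open("edges.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target"])
    writer.writerows(sorted(edges))
```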

From here, the edge file can be used to build a network visualization where the nodes are publishers and the edges between them represent the in-context links found in our GDELT data pull around whatever topic we desired.

              This final visualization is a network representation of the publishers who have written stories about addiction, and where those stories link to.

                What can we learn from this graph?

This tells us which nodes (publisher websites) have the most In-Degree links. In other words: who is most linked-to. We can see that the most linked-to domains for this topic are:

                • tmz.com
                • people.com
                • cdc.gov
                • cnn.com
                • go.com
                • nih.gov
                • ap.org
                • latimes.com
                • jamanetwork.com
                • nytimes.com

                Which publisher is most influential? 

                Using the “Betweenness Centrality” score given by Gephi, we get a rough understanding of which nodes (publishers) in the network act as hubs of information transfer. The nodes with the highest “Betweenness Centrality” can be thought of as the “connectors” of the network. Getting pickups from these high-betweenness centrality nodes gives a much greater likelihood of syndication for that specific topic/theme. 

• dailymail.co.uk
• nytimes.com
• people.com
• cnn.com
• latimes.com
• washingtonpost.com
• usatoday.com
• cbslocal.com
• huffingtonpost.com
• sfgate.com

                What else can we learn?

Similar to the first example, the higher a node’s betweenness centrality, the more In-Degree links it has, and the more centrally it is located in the graph, the more “important” that node can generally be said to be. Using this as a guide, the most important pitching targets can be easily identified.

Examining some of the edge clusters gives additional insight into other potential opportunities, including a few clusters specific to regional or state-level local news and a few foreign-language publication clusters.

                  Wrapping up

I’ve outlined two different techniques we use at Fractl to understand the influence networks around specific topical areas, both in terms of publications and the writers at those publications. The visualization techniques described are not turnkey guides; rather, they are tools for combing through large amounts of data and surfacing hidden information. Use these techniques to unearth new opportunities and prioritize as you get ready to find the best places to pitch the content you’ve worked so hard to create.

                  Do you have any similar ideas or tactics to ensure you’re pitching the best writers and publishers with your content? Comment below!


                    Should I Use Relative or Absolute URLs? – Whiteboard Friday

                    Posted by RuthBurrReedy

                    It was once commonplace for developers to code relative URLs into a site. There are a number of reasons why that might not be the best idea for SEO, and in today’s Whiteboard Friday, Ruth Burr Reedy is here to tell you all about why.

                    Relative vs Absolute URLs Whiteboard

                    For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

                    Let’s discuss some non-philosophical absolutes and relatives

                    Howdy, Moz fans. My name is Ruth Burr Reedy. You may recognize me from such projects as when I used to be the Head of SEO at Moz. I’m now the Senior SEO Manager at BigWing Interactive in Oklahoma City. Today we’re going to talk about relative versus absolute URLs and why they are important.

                    At any given time, your website can have several different configurations that might be causing duplicate content issues. You could have just a standard http://www.example.com. That’s a pretty standard format for a website.

But the main sources of domain-level duplicate content that we see are when the non-www.example.com does not redirect to the www or vice-versa, and when the HTTPS versions of your URLs are not forced to resolve to the HTTP versions or, again, vice-versa. What this can mean is, if all of these scenarios are true — if all four of these URLs (HTTP and HTTPS, each with and without the www) resolve without being forced to resolve to a canonical version — you can, in essence, have four versions of your website out on the Internet. This may or may not be a problem.

                    It’s not ideal for a couple of reasons. Number one, duplicate content is a problem because some people think that duplicate content is going to give you a penalty. Duplicate content is not going to get your website penalized in the same way that you might see a spammy link penalty from Penguin. There’s no actual penalty involved. You won’t be punished for having duplicate content.

                    The problem with duplicate content is that you’re basically relying on Google to figure out what the real version of your website is. Google is seeing the URL from all four versions of your website. They’re going to try to figure out which URL is the real URL and just rank that one. The problem with that is you’re basically leaving that decision up to Google when it’s something that you could take control of for yourself.

                    There are a couple of other reasons that we’ll go into a little bit later for why duplicate content can be a problem. But in short, duplicate content is no good.

                    However, just having these URLs not resolve to each other may or may not be a huge problem. When it really becomes a serious issue is when that problem is combined with injudicious use of relative URLs in internal links. So let’s talk a little bit about the difference between a relative URL and an absolute URL when it comes to internal linking.

                    With an absolute URL, you are putting the entire web address of the page that you are linking to in the link. You’re putting your full domain, everything in the link, including /page. That’s an absolute URL.

                    However, when coding a website, it’s a fairly common web development practice to instead code internal links with what’s called a relative URL. A relative URL is just /page. Basically what that does is it relies on your browser to understand, “Okay, this link is pointing to a page that’s on the same domain that we’re already on. I’m just going to assume that that is the case and go there.”
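This resolution behavior is easy to see with Python's standard library, which applies the same rules a browser does:

```python
# How a root-relative link resolves against whatever page it sits on.
from urllib.parse import urljoin

print(urljoin("https://www.example.com/section/index.html", "/page"))
# -> https://www.example.com/page (scheme and host come from the current page)

print(urljoin("http://examplestaging.com/section/index.html", "/page"))
# -> http://examplestaging.com/page (same link, different host; this is why
#    relative links "just work" on staging, and for scrapers too)
```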

                    There are a couple of really good reasons to code relative URLs

                    1) It is much easier and faster to code.

When you are a web developer and you’re building a site with thousands of pages, coding relative versus absolute URLs is a way to be more efficient. You’ll see it happen a lot.

                    2) Staging environments

Another reason why you might see relative versus absolute URLs is some content management systems — and SharePoint is a great example of this — have a staging environment that’s on its own domain. Instead of being example.com, it will be examplestaging.com. The entire website will basically be replicated on that staging domain. Having relative versus absolute URLs means that the same website can exist on staging and on production, or the live accessible version of your website, without having to go back in and recode all of those URLs. Again, it’s more efficient for your web development team. Those are really perfectly valid reasons to do those things. So don’t yell at your web dev team if they’ve coded relative URLs, because from their perspective it is a better solution.

                    Relative URLs will also cause your page to load slightly faster. However, in my experience, the SEO benefits of having absolute versus relative URLs in your website far outweigh the teeny-tiny bit longer that it will take the page to load. It’s very negligible. If you have a really, really long page load time, there’s going to be a whole boatload of things that you can change that will make a bigger difference than coding your URLs as relative versus absolute.

Page load time, in my opinion, is not a concern here. However, it is something that your web dev team may bring up when you try to address with them the fact that, from an SEO perspective, coding your website with relative versus absolute URLs, especially in the nav, is not a good solution.

                    There are even better reasons to use absolute URLs

                    1) Scrapers

                    If you have all of your internal links as relative URLs, it would be very, very, very easy for a scraper to simply scrape your whole website and put it up on a new domain, and the whole website would just work. That sucks for you, and it’s great for that scraper. But unless you are out there doing public services for scrapers, for some reason, that’s probably not something that you want happening with your beautiful, hardworking, handcrafted website. That’s one reason. There is a scraper risk.

                    2) Preventing duplicate content issues

                    But the other reason why it’s very important to have absolute versus relative URLs is that it really mitigates the duplicate content risk that can be presented when you don’t have all of these versions of your website resolving to one version. Google could potentially enter your site on any one of these four pages, which they’re the same page to you. They’re four different pages to Google. They’re the same domain to you. They are four different domains to Google.

But they could enter your site, and if all of your URLs are relative, they can then crawl and index your entire domain using whatever format those URLs happen to be in. Whereas if you have absolute links coded, even if Google enters your site on www. and that resolves, once they crawl to another page that you’ve got coded without the www., Google is not going to assume that all of the other internal link juice and all of the other pages on your website live at the www. version. That really cuts down on different versions of each page of your website. If you have relative URLs throughout, you basically have four different websites if you haven’t fixed this problem.

                    Again, it’s not always a huge issue. Duplicate content, it’s not ideal. However, Google has gotten pretty good at figuring out what the real version of your website is.

                    You do want to think about internal linking, when you’re thinking about this. If you have basically four different versions of any URL that anybody could just copy and paste when they want to link to you or when they want to share something that you’ve built, you’re diluting your internal links by four, which is not great. You basically would have to build four times as many links in order to get the same authority. So that’s one reason.

                    3) Crawl Budget

The other reason why it’s pretty important not to do this is crawl budget.

When we talk about crawl budget, basically what that is, is every time Google crawls your website, there is a finite depth to which they will crawl. There’s a finite number of URLs that they will crawl before they decide, “Okay, I’m done.” That’s based on a few different things. Your site authority is one of them. Your actual PageRank — not toolbar PageRank, but how good Google actually thinks your website is — is a big part of that. But also how complex your site is, how often it’s updated, and things like that are going to contribute to how often and how deep Google will crawl your site.

It’s important to remember when we think about crawl budget that, for Google, crawling costs actual dollars. One of Google’s biggest expenditures as a company is the money and the bandwidth it takes to crawl and index the Web. All of that energy that goes into crawling and indexing the Web lives on servers. That bandwidth comes from servers, and that means that using bandwidth costs Google real dollars.

So Google is incentivized to crawl as efficiently as possible, because when they crawl inefficiently, it costs them money. If your site is not efficient to crawl, Google is going to save itself some money by crawling it less frequently and crawling fewer pages per crawl. That can mean that if you have a site that’s updated frequently, your site may not be updating in the index as frequently as you’re updating it. It may also mean that Google, while it’s crawling and indexing, may be crawling and indexing a version of your website that isn’t the version that you really want it to crawl and index.

                    So having four different versions of your website, all of which are completely crawlable to the last page, because you’ve got relative URLs and you haven’t fixed this duplicate content problem, means that Google has to spend four times as much money in order to really crawl and understand your website. Over time they’re going to do that less and less frequently, especially if you don’t have a really high authority website. If you’re a small website, if you’re just starting out, if you’ve only got a medium number of inbound links, over time you’re going to see your crawl rate and frequency impacted, and that’s bad. We don’t want that. We want Google to come back all the time, see all our pages. They’re beautiful. Put them up in the index. Rank them well. That’s what we want. So that’s what we should do.

There are a couple of ways to fix your relative versus absolute URL problem

                    1) Fix what is happening on the server side of your website

                    You have to make sure that you are forcing all of these different versions of your domain to resolve to one version of your domain. For me, I’m pretty agnostic as to which version you pick. You should probably already have a pretty good idea of which version of your website is the real version, whether that’s www, non-www, HTTPS, or HTTP. From my view, what’s most important is that all four of these versions resolve to one version.

From an SEO standpoint, there is evidence to suggest, and Google has certainly said, that HTTPS is a little bit better than HTTP. From a URL-length perspective, I like to leave the www. out because it doesn’t really do anything. It just makes your URLs four characters longer. If you don’t know which one to pick, I would pick this one: HTTPS, no W’s. But whichever one you pick, what’s really most important is that all of them resolve to one version. You can do that on the server side, and that’s usually pretty easy for your dev team to fix once you tell them that it needs to happen.
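A quick way to audit this is to request all four versions and see where each one lands. A minimal sketch using the requests library (example.com is a placeholder for your own domain):

```python
# Check which versions of a domain resolve and where they redirect.
import requests

DOMAIN = "example.com"  # placeholder; use your own domain
versions = [
    f"http://{DOMAIN}/",
    f"http://www.{DOMAIN}/",
    f"https://{DOMAIN}/",
    f"https://www.{DOMAIN}/",
]

for url in versions:
    try:
        r = requests.get(url, allow_redirects=False, timeout=10)
        location = r.headers.get("Location", "(no redirect)")
        print(f"{url} -> {r.status_code} {location}")
    except requests.RequestException as e:
        print(f"{url} -> error: {e}")

# Healthy output: three versions return 301s pointing at the fourth.
```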

                    2) Fix your internal links

Great. So you fixed it on your server side. Now you need to fix your internal links, and you need to recode them from being relative to being absolute. This is something that your dev team is not going to want to do because it is time consuming and, from a web dev perspective, not that important. However, you should use resources like this Whiteboard Friday to explain to them, from an SEO perspective, both from the scraper risk and from a duplicate content standpoint, that having those absolute URLs is a high priority and something that should get done.

                    You’ll need to fix those, especially in your navigational elements. But once you’ve got your nav fixed, also pull out your database or run a Screaming Frog crawl or however you want to discover internal links that aren’t part of your nav, and make sure you’re updating those to be absolute as well.
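If you would rather script that discovery step, a page-level check is only a few lines. A sketch using requests and BeautifulSoup, with a placeholder start page:

```python
# Flag relative (schemeless, hostless) hrefs on a page.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

page = "https://www.example.com/"  # placeholder start page
soup = BeautifulSoup(requests.get(page, timeout=10).text, "html.parser")

for a in soup.find_all("a", href=True):
    href = a["href"]
    parsed = urlparse(href)
    # No scheme and no host means the link is relative
    if not parsed.scheme and not parsed.netloc and not href.startswith("#"):
        print(f"Relative link: {href!r} (text: {a.get_text(strip=True)!r})")
```

Run it across your URL list (from your database or a crawl export) to build a punch list of templates and pages that still need absolute links.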

                    Then you’ll do some education with everybody who touches your website saying, “Hey, when you link internally, make sure you’re using the absolute URL and make sure it’s in our preferred format,” because that’s really going to give you the most bang for your buck per internal link. So do some education. Fix your internal links.

Sometimes your dev team is going to say, “No, we can’t do that. We’re not going to recode the whole nav. It’s not a good use of our time,” and sometimes they are right. The dev team has more important things to do. That’s okay.

                    3) Canonicalize it!

If you can’t get your internal links fixed or if they’re not going to get fixed anytime in the near future, a stopgap or a Band-Aid that you can kind of put on this problem is to canonicalize all of your pages. As you’re changing your server to force all of these different versions of your domain to resolve to one, at the same time you should be implementing the canonical tag on all of the pages of your website to self-canonicalize. On every page, you have a canonical page tag saying, “This page right here that they were already on is the canonical version of this page.” Or if there’s another page that’s the canonical version, then obviously you point to that instead.

But having each page self-canonicalize will mitigate both the risk of duplicate content internally and some of the risk posed by scrapers, because when they scrape, if they are scraping your website and slapping it up somewhere else, those canonical tags will often stay in place, and that lets Google know this is not the real version of the website.
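To verify the tags actually made it onto every page, a self-canonicalization check is easy to script. A sketch, again with placeholder URLs:

```python
# Confirm each page's canonical tag points back at itself.
import requests
from bs4 import BeautifulSoup

urls = [
    "https://example.com/",      # placeholder URLs; use your own
    "https://example.com/page",
]

for url in urls:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    tag = soup.find("link", rel="canonical")
    canonical = tag["href"] if tag else None
    status = "OK" if canonical == url else f"MISMATCH ({canonical})"
    print(f"{url}: {status}")
```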

                    In conclusion, relative links, not as good. Absolute links, those are the way to go. Make sure that you’re fixing these very common domain level duplicate content problems. If your dev team tries to tell you that they don’t want to do this, just tell them I sent you. Thanks guys.

                    Video transcription by Speechpad.com
