Tag Archive | "Bots"

AI-Powered Conversational Bots Are Changing the Game, Says LivePerson CEO

“T-Mobile literally pulled the hold technology out,” says LivePerson CEO Robert Locascio. “Millions of customers at T-Mobile don’t have to be put on hold. There’s no press one or press two. They go straight to a person. You are messaging them. You are doing what’s natural to you. That’s really what we see as changing the game. We made a big pivot two years ago and launched a whole new platform. There’s a bigger future in bots and AI than there was in chat.”

In an interview with Jim Cramer on CNBC, Robert Locascio, CEO of LivePerson, discusses how AI-powered conversational bots are being deployed and are “changing the game” for customers of the thousands of businesses that use the company’s Maven technology:

AI-Powered Conversational Bots Are Changing the Game

We made a big pivot two years ago and launched a whole new platform and I said, “There’s a bigger future in bots and AI than there was in chat that I invented many years ago.” We went for it and as you can see the performance has been really great. I brought Alex Spinelli in about a year ago and he was running the core development team for Alexa. We brought in a lot of people from that group. The difference between us and Alexa is that we have thousands of brands (for our team) to work on. The Deltas of the world and the T-Mobiles of the world, instead of just one brand with Amazon.

We do human interactions also, but we know a lot of those interactions can be automated. Just look at Delta. In a couple of weeks, when your flight is late, instead of making a call on the phone and getting put on hold, you are going to be able to message a contact center and talk to a bot in real time to get what you want and change your flight. All of that will happen without you being on hold. That’s really why these brands are gravitating toward us. We are messaging with our friends and family. We are not calling people anymore. So why call a brand?

T-Mobile literally pulled the hold technology out. Millions of customers at T-Mobile don’t have to be put on hold. There’s no press one or press two. They go straight to a person. You are messaging them. You are doing what’s natural to you. That’s really what we see as changing the game. Right now, Apple just opened up iMessage to businesses. Every business in the world is going to have to be on iMessage through our platform. Facebook Messenger too.

Maven, LivePerson’s Conversational Engine

We now have this thing called Brew to You, where right from your seat (in a stadium) you can have a beer and a hot dog delivered to you. But now we have something really cool which is out of the Cosmopolitan Hotel in Las Vegas. There is a bot called Rose when you check in. She tells you everything about the hotel. She can help you cut the line at Marquee which is their cool club. This is all about people engaging with the brand and talking to this bot that’s just there for you. And after people leave the hotel they keep talking to Rose!

What we are finding is that we take our technology which is called Maven, we enable the contact center reps to create the bots, deploy them, and own the bots. For example, we have a contact center down in the Dominican Republic and there’s a woman there named Laura that created a bot for GrillMaster, which is one of our customers. They deployed it and sold millions of dollars worth of grills. She was empowered to basically create that bot, deploy it, and change her life. She doubled her salary. That’s the power of this thing.

AI Has Got to be Democratized

EqualAI is a nonprofit we set up a couple of months ago. I started to realize that AI has got to be out there in the hands of many. It’s got to be democratized. It can’t just be with the big tech companies. What we want to do is take all the technology that we have (and make it available). It started with watching my two-year-old watching me command Alexa. Alexa turn on the lights. Alexa play music. She’s seen me command this AI and it’s a woman’s voice. I think what we are seeing now is that children are being affected by this. They are going to school, making demands, and following this.

We have to change the way that we deploy AI and how we manage it. I wanted to bring the best practices into a nonprofit. We now have other people and brands who are joining us and taking part in this. One of the best practices that we are looking at is why do we have a woman’s voice with Alexa? It could be any voice. It could be a man. We have to think about these things before we deploy them to millions of people and we affect their lives.


The post AI-Powered Conversational Bots Are Changing the Game, Says LivePerson CEO appeared first on WebProNews.


WebProNews

Posted in IM News

Democrats Created Fake Russian Twitter Bots to Influence Election

The New York Times reported that Democrats created fake Russian Twitter bots in a disinformation campaign, also known as fake news, in order to influence voters in the Senate election in Alabama.

The campaign was reportedly funded by liberal billionaire Reid Hoffman and included the creation of more than a thousand Russian-language accounts that followed Republican Senate candidate Roy Moore’s Twitter account. This was picked up by major media outlets in order to fool the public into thinking the Russians supported Moore.

Robert Siciliano, security expert and CEO of Safr.me, recently discussed the Democrats’ Russian disinformation campaign on Fox Business:

Democrats Created Fake Russian Twitter Profiles

What it seems like is a number of Twitter profiles were created looking like they were of Russian descent and they were following Roy Moore. Some in the media picked it up at the time. I believe it was USA Today and The Alabamian. They pointed it out and it made news designed to make Roy Moore look bad.

It seems with the political climate that we are in right now that any relation with Russia is a bad one and if you are supported by the Russians you must be bad as well.

Disinformation Campaign Via Fake Russian Twitter Bots

Creating a Twitter bot is not too difficult to do. Anyone with a computer can make it happen. You can do a quick Google search of ‘how to create a Twitter bot’ and anyone can engage in that process.

This methodology of thwarting a potential election can and will be used again in the future simply as a disinformation campaign. The disinformation that social media allows makes it very easy for this type of information to soil someone’s reputation.

There is an All-Out Assault Online Today

There is an all-out assault online today. Anyone connected to the web and with a presence needs to monitor their online reputation and how it’s being manipulated in a number of different ways.

The post Democrats Created Fake Russian Twitter Bots to Influence Election appeared first on WebProNews.


WebProNews


The SEO Cyborg: How to Resonate with Users & Make Sense to Search Bots

Posted by alexis-sanders

SEO is about understanding how search bots and users react to an online experience. As search professionals, we’re required to bridge gaps between online experiences, search engine bots, and users. We need to know where to insert ourselves (or our teams) to ensure the best experience for both users and bots. In other words, we strive for experiences that resonate with humans and make sense to search engine bots.

This article seeks to answer the following questions:

  • How do we drive sustainable growth for our clients?
  • What are the building blocks of an organic search strategy?

What is the SEO cyborg?

A cyborg (or cybernetic organism) is defined as “a being with both organic and biomechatronic body parts, whose physical abilities are extended beyond normal human limitations by mechanical elements.”

With the ability to relate between humans, search bots, and our site experiences, the SEO cyborg is an SEO (or team) that is able to work seamlessly across both technical and content initiatives (whose skills are extended beyond normal human limitations) to drive organic search performance. An SEO cyborg is able to strategically pinpoint where to place organic search efforts to maximize performance.

So, how do we do this?

The SEO model

Like so many classic triads (think: primary colors, the Three Musketeers, Destiny’s Child [the canonical version, of course]) the traditional SEO model, known as the crawl-index-rank method, packages SEO into three distinct steps. At the same time, however, this model fails to capture the breadth of work that we SEOs are expected to do on a daily basis, and not having a functioning model can be limiting. We need to expand this model without reinventing the wheel.

The enhanced model involves adding in a rendering, signaling, and connection phase.

You might be wondering why we need these:

  • Rendering: There is increased prevalence of JavaScript, CSS, imagery, and personalization.
  • Signaling: HTML <link> tags, status codes, and even GSC signals are powerful indicators that tell search engines how to process and understand the page, determine its intent, and ultimately rank it. In the previous model, it didn’t feel as if these powerful elements really had a place.
  • Connecting: People are a critical component of search. The ultimate goal of search engines is to identify and rank content that resonates with people. In the previous model, “rank” felt cold, hierarchical, and indifferent towards the end user.

All of this brings us to the question: how do we find success in each stage of this model?

Note: When using this piece, I recommend skimming ahead and leveraging those sections of the enhanced model that are most applicable to your business’ current search program.

The enhanced SEO model

Crawling

Technical SEO starts with the search engine’s ability to find a site’s webpages (hopefully efficiently).

Finding pages

Initially finding pages can happen a few ways, via:

  • Links (internal or external)
  • Redirected pages
  • Sitemaps (XML, RSS 2.0, Atom 1.0, or .txt)
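To make the sitemap route concrete, here is a minimal sketch that lists the URLs an XML sitemap exposes to a crawler. It assumes the standard sitemaps.org namespace; the function name and the sample sitemap are illustrative only and do not come from the article.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> values listed in an XML sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_urls(EXAMPLE_SITEMAP):
        print(url)
```

Comparing this list against the URLs that show up in a site crawl is one quick way to spot pages that are being discovered by some other route.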

Side note: This information (although at first pretty straightforward) can be really useful. For example, if you’re seeing weird pages popping up in site crawls or performing in search, try checking:

  • Backlink reports
  • Internal links to URL
  • Redirected into URL

Obtaining resources

The second component of crawling relates to the ability to obtain resources (which later becomes critical for rendering a page’s experience).

This typically relates to two elements:

  1. Appropriate robots.txt declarations
  2. Proper HTTP status code (namely 200 HTTP status codes)
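The two checks above can be sketched with Python's standard library. This is only an illustration: the helper names and the sample robots.txt are assumptions, and a real audit would fetch the live robots.txt and the real response codes.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, user_agent, url):
    """Check whether robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def resource_obtainable(robots_txt, user_agent, url, status_code):
    """A resource is obtainable if it is not disallowed and returns 200."""
    return is_crawlable(robots_txt, user_agent, url) and status_code == 200


# Hypothetical robots.txt, for illustration only.
EXAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_crawlable(EXAMPLE_ROBOTS, "Googlebot", "https://example.com/blog/"))
    print(is_crawlable(EXAMPLE_ROBOTS, "Googlebot", "https://example.com/private/page"))
```

Running the same check against script and stylesheet URLs is worthwhile, since blocked resources can affect how a page renders later on.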

Crawl efficiency

Finally, there’s the idea of how efficiently a search engine bot can traverse your site’s most critical experiences.

Action items:

  • Is the site’s main navigation simple, clear, and useful?
  • Are there relevant on-page links?
  • Is internal linking clear and crawlable (i.e., <a href="/">)?
  • Is an HTML sitemap available?
    • Side note: Make sure to check the HTML sitemap’s next page flow (or behavior flow reports) to find where those users are going. This may help to inform the main navigation.
  • Do footer links contain tertiary content?
  • Are important pages close to root?
  • Are there no crawl traps?
  • Are there no orphan pages?
  • Are pages consolidated?
  • Do all pages have purpose?
  • Has duplicate content been resolved?
  • Have redirects been consolidated?
  • Are canonical tags on point?
  • Are parameters well defined?
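Two of the questions above (are important pages close to root, and are there orphan pages) can be answered from an internal link graph. The sketch below assumes you already have such a graph from a crawl, represented as a dictionary of page to linked pages; the function names and sample data are hypothetical.

```python
from collections import deque


def click_depths(links, root):
    """Breadth-first search: minimum number of clicks from root to each page."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(links, root, known_pages):
    """Known pages (e.g., from a sitemap) that no path from root reaches."""
    reachable = click_depths(links, root)
    return sorted(set(known_pages) - set(reachable))


# Hypothetical internal link graph, for illustration only.
EXAMPLE_LINKS = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget"],
    "/old-landing": ["/"],
}
EXAMPLE_PAGES = ["/", "/products", "/about", "/products/widget", "/old-landing"]

if __name__ == "__main__":
    print(click_depths(EXAMPLE_LINKS, "/"))
    print(orphan_pages(EXAMPLE_LINKS, "/", EXAMPLE_PAGES))
```

In this sample, "/old-landing" links out but nothing links to it, so it is reported as an orphan.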

Information architecture

The organization of information extends past the bots, requiring an in-depth understanding of how users engage with a site.

Some seed questions to begin research include:

  • What trends appear in search volume (by location, device)? What are common questions users have?
  • Which pages get the most traffic?
  • What are common user journeys?
  • What are users’ traffic behaviors and flow?
  • How do users leverage site features (e.g., internal site search)?

Rendering

Rendering a page relates to search engines’ ability to capture the page’s desired essence.

JavaScript

The big kahuna in the rendering section is JavaScript. For Google, rendering of JavaScript occurs during a second wave of indexing and the content is queued and rendered as resources become available.

Image based off of Google I/O ’18 presentation by Tom Greenaway and John Mueller, Deliver search-friendly JavaScript-powered websites

As an SEO, it’s critical that we be able to answer the question — are search engines rendering my content?

Action items:

  • Are direct “quotes” from content indexed?
  • Is the site using <a href="/"> links (not onclick();)?
  • Is the same content being served to search engine bots (user-agent)?
  • Is the content present within the DOM?
  • What does Google’s Mobile-Friendly Testing Tool’s JavaScript console (click “view details”) say?
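As one way to spot-check the link question above, the following sketch scans a page's HTML and separates anchors with a real href from elements that rely on onclick alone. The class and function names are made up for this example, and it inspects only the HTML it is given, not a rendered DOM.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect crawlable hrefs and tags that navigate only via onclick."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.script_only = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        href = (attr_map.get("href") or "").strip()
        if tag == "a" and href and not href.lower().startswith("javascript:"):
            self.crawlable.append(href)
        elif "onclick" in attr_map:
            self.script_only.append(tag)


def audit_links(html):
    """Return (crawlable hrefs, tags that depend on onclick)."""
    parser = LinkAudit()
    parser.feed(html)
    return parser.crawlable, parser.script_only


# Hypothetical markup, for illustration only.
EXAMPLE_HTML = (
    '<a href="/pricing">Pricing</a>'
    '<span onclick="go(\'/docs\')">Docs</span>'
    '<a href="javascript:void(0)" onclick="go(\'/blog\')">Blog</a>'
)

if __name__ == "__main__":
    print(audit_links(EXAMPLE_HTML))
```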

Infinite scroll and lazy loading

Another hot topic relating to JavaScript is infinite scroll (and lazy load for imagery). Since search engine bots are lazy users, they won’t scroll to attain content.

Action items:

Ask ourselves – should all of the content really be indexed? Is it content that provides value to users?

  • Infinite scroll: a user experience (and occasionally a performance optimizing) tactic to load content when the user hits a certain point in the UI; typically the content is exhaustive.

Solution one (updating AJAX):

1. Break out content into separate sections

  • Note: The breakout of pages can be /page-1, /page-2, etc.; however, it would be best to delineate meaningful divides (e.g., /voltron, /optimus-prime, etc.)

2. Implement History API (pushState(), replaceState()) to update URLs as a user scrolls (i.e., push/update the URL into the URL bar)

3. Add the <link> tag’s rel="next" and rel="prev" on relevant pages
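For step 3, a small sketch of how the tags for each page in a series might be generated. It assumes the simple /page-N URL pattern mentioned in the note above; the function name and URLs are hypothetical.

```python
def pagination_link_tags(base_url, page, total_pages):
    """Build rel="prev"/rel="next" <link> tags for one page of a series."""
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")
    tags = []
    if page > 1:  # every page except the first points back
        tags.append(f'<link rel="prev" href="{base_url}/page-{page - 1}">')
    if page < total_pages:  # every page except the last points forward
        tags.append(f'<link rel="next" href="{base_url}/page-{page + 1}">')
    return tags


if __name__ == "__main__":
    for tag in pagination_link_tags("https://example.com/robots", 2, 3):
        print(tag)
```

The first page carries only a "next" tag and the last page only a "prev" tag, so the chain has clear endpoints.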

Solution two (create a view-all page)
Note: This is not recommended for large amounts of content.

1. If it’s possible (i.e., there’s not a ton of content within the infinite scroll), create one page encompassing all content

2. Site latency/page load should be considered

  • Lazy load imagery is a web performance optimization tactic in which images load as the user scrolls (the idea is to save time by downloading images only when they’re needed). To help search engines find lazy-loaded images:
    • Add <img> tags in <noscript> tags
    • Use JSON-LD structured data
      • Schema.org “image” attributes nested in appropriate item types
      • Schema.org ImageObject item type
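The two lazy-load fallbacks can be sketched as small template helpers. The property names (contentUrl, name, width, height) are standard Schema.org ImageObject properties; the helper names and example values are assumptions for illustration.

```python
import html
import json


def noscript_img(src, alt):
    """Plain <img> fallback wrapped in <noscript> for a lazy-loaded image."""
    return (
        f'<noscript><img src="{html.escape(src, quote=True)}" '
        f'alt="{html.escape(alt, quote=True)}"></noscript>'
    )


def image_object_jsonld(content_url, name, width, height):
    """JSON-LD <script> block describing an image as a Schema.org ImageObject."""
    data = {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "contentUrl": content_url,
        "name": name,
        "width": width,
        "height": height,
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


if __name__ == "__main__":
    print(noscript_img("/img/voltron.jpg", "Voltron"))
    print(image_object_jsonld("https://example.com/img/voltron.jpg", "Voltron", 1200, 800))
```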

CSS

I only have a few elements relating to the rendering of CSS.

Action items:

  • CSS background images are not picked up in image search, so don’t count on them for important imagery
  • CSS animations are not interpreted, so make sure to add surrounding textual content
  • Page layouts are important (use responsive mobile layouts; avoid excessive ads)

Personalization

Although a trend exists in the broader digital space to create 1:1, people-based marketing, Google doesn’t save cookies across sessions and thus will not interpret personalization based on cookies. This means there must be an average, base-user, default experience. The data from other digital channels can be exceptionally useful when building out audience segments and gaining a deeper understanding of the base user.

Action item:

  • Ensure there is a base-user, unauthenticated, default experience

Technology

Google’s rendering engine is leveraging Chrome 41. Canary (Chrome’s testing browser) is currently operating on Chrome 69. Using CanIUse.com, we can infer that this affects Google’s abilities relating to HTTP/2, service workers (think: PWAs), certain JavaScript, specific advanced image formats, resource hints, and new encoding methods. That said, this does not mean we shouldn’t progress our sites and experiences for users — we just must ensure that we use progressive development (i.e., there’s a fallback for less advanced browsers [and Google too ☺]).

Action items:

  • Ensure there’s a fallback for less advanced browsers

Indexing

Getting pages into Google’s databases is what indexing is all about. From what I’ve experienced, this process is straightforward for most sites.

Action items:

  • Ensure URLs are able to be crawled and rendered
  • Ensure nothing is preventing indexing (e.g., robots meta tag)
  • Submit sitemap in Google Search Console
  • Fetch as Google in Google Search Console
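To illustrate the "nothing is preventing indexing" check, here is a minimal sketch that looks for a blocking robots meta tag in a page's HTML. It covers only the meta tag case (not the X-Robots-Tag HTTP header), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr_map.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def blocks_indexing(html):
    """True if a robots meta tag carries noindex (or none, which implies it)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"noindex", "none"})


if __name__ == "__main__":
    print(blocks_indexing('<meta name="robots" content="noindex, follow">'))
```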

Signaling

A site should strive to send clear signals to search engines. Unnecessarily confusing search engines can significantly impact a site’s performance. Signaling relates to suggesting best representation and status of a page. All this means is that we’re ensuring the following elements are sending appropriate signals.

Action items:

  • <link> tag: This represents the relationship between documents in HTML.
    • rel="canonical": This represents appreciably similar content.
      • Are canonicals a secondary solution to 301-redirecting experiences?
      • Are canonicals pointing to end-state URLs?
      • Is the content appreciably similar?
        • Since Google maintains prerogative over determining end-state URL, it’s important that the canonical tags represent duplicates (and/or duplicate content).
      • Are all canonicals in HTML?
      • Is there safeguarding against incorrect canonical tags?
    • rel="next" and rel="prev": These represent a collective series and are not considered duplicate content, which means that all URLs can be indexed. That said, typically the first page in the chain is the most authoritative, so usually it will be the one to rank.
    • rel="alternate"
      • media: typically used for separate mobile experiences.
      • hreflang: indicate appropriate language/country
        • The hreflang is quite unforgiving and it’s very easy to make errors.
        • Ensure the documentation is followed closely.
        • Check GSC International Target reports to ensure tags are populating.
  • HTTP status codes can also be signals, particularly the 304, 404, 410, and 503 status codes.
    • 304 (Not Modified) – a valid page that simply hasn’t been modified
    • 404 (Not Found) – file not found
    • 410 (Gone) – file not found (and it is gone, forever and always)
    • 503 (Service Unavailable) – server temporarily unavailable (e.g., for maintenance)

  • Google Search Console settings: Make sure the following reports are all sending clear signals. Occasionally Google decides to honor these signals.
    • International Targeting
    • URL Parameters
    • Data Highlighter
    • Remove URLs
    • Sitemaps
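The status codes listed above can be turned into a quick lookup for crawl reports. This sketch uses the standard reason phrases from Python's http module; the notes attached to each code paraphrase this article, and the function name is an assumption.

```python
from http import HTTPStatus

# Notes paraphrased from the list above; not an exhaustive mapping.
STATUS_SIGNALS = {
    HTTPStatus.NOT_MODIFIED: "valid page, unchanged since the last fetch",
    HTTPStatus.NOT_FOUND: "file not found",
    HTTPStatus.GONE: "file not found and intentionally gone for good",
    HTTPStatus.SERVICE_UNAVAILABLE: "server temporarily unavailable",
}


def describe_status(code):
    """Return '<code> <phrase>: <signal note>' for an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    note = STATUS_SIGNALS.get(status, "no special signal noted here")
    return f"{status.value} {status.phrase}: {note}"


if __name__ == "__main__":
    for code in (304, 404, 410, 503):
        print(describe_status(code))
```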

Rank

Rank relates to how search engines arrange web experiences, stacking them against each other to see who ends up on top for each individual query (taking into account numerous data points surrounding the query).

Two critical questions recur often when trying to understand how pages rank:

  • Does or could your page have the best response?
  • Are you or could you become semantically known (on the Internet and in the minds of users) for the topics? (i.e., are you worthy of receiving links and people traversing the web to land on your experience?)

On-page optimizations

These are the elements webmasters control. Off-page is a critical component to achieving success in search; however, in an idyllic world, we shouldn’t have to worry about links and/or mentions – they should come naturally.

Action items:

  • Textual content:
    • Make content both people and bots can understand
    • Answer questions directly
    • Write short, logical, simple sentences
    • Ensure subjects are clear (not to be inferred)
    • Create scannable content (i.e., make sure <h#> tags are an outline, use bullets/lists, use tables, charts, and visuals to delineate content, etc.)
    • Define any uncommon vocabulary or link to a glossary
  • Multimedia (images, videos, engaging elements):
    • Use imagery, videos, engaging content where applicable
    • Ensure that image optimization best practices are followed
  • Meta elements (<title> tags, meta descriptions, OGP, Twitter cards, etc.)
  • Structured data

Image courtesy of @abbynhamilton

  • Is content accessible?
    • Is there keyboard functionality?
    • Are there text alternatives for non-text media? Example:
      • Transcripts for audio
      • Images with alt text
      • In-text descriptions of visuals
    • Is there adequate color contrast?
    • Is text resizable?
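One of the accessibility questions above, whether images carry alt text, is easy to spot-check. The sketch below flags only <img> tags with no alt attribute at all, since an empty alt can be a deliberate choice for decorative images; the names and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Record the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        if "alt" not in attr_map:
            self.missing_alt.append(attr_map.get("src") or "")


def images_missing_alt(html):
    """Return the src values of images lacking an alt attribute."""
    parser = AltTextAudit()
    parser.feed(html)
    return parser.missing_alt


if __name__ == "__main__":
    sample = '<img src="/a.png" alt="Traffic chart"><img src="/b.png">'
    print(images_missing_alt(sample))
```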

Finding interesting content

Researching and identifying useful content happens in three formats:

  • Keyword and search landscape research
  • On-site analytic deep dives
  • User research

Visual modified from @smrvl via @DannyProl

Audience research

When looking for audiences, we need to concentrate on high percentages (super high index rates are great, but not required). Push channels (particularly ones with strong targeting capabilities) do better with high index rates. This makes sense: we need to know that 80% of our customers have certain leanings (because we’re looking for the base case), not that five users over-index on a niche topic (these five niche-topic lovers are perfect for targeted ads).
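The difference between a high percentage and a high index can be shown with a little arithmetic. The sketch assumes the common definition of an audience index (a trait's share among your customers divided by its share in a baseline population, times 100); that definition and all numbers below are illustrative assumptions, not figures from the article.

```python
def index_rate(audience_share, baseline_share):
    """Audience index: 100 means the trait is as common as in the baseline."""
    return round(audience_share / baseline_share * 100)


if __name__ == "__main__":
    # Hypothetical broad trait: held by 80% of customers, 70% of the baseline.
    print(index_rate(0.80, 0.70))
    # Hypothetical niche trait: held by 0.5% of customers, 0.1% of the baseline.
    print(index_rate(0.005, 0.001))
```

The broad trait has a modest index but describes most customers, which is what a base-case search experience needs. The niche trait has a far higher index yet covers a tiny group, which suits targeted push channels better.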

Some seed research questions:

  • Who are users?
  • Where are they?
  • Why do they buy?
  • How do they buy?
  • What do they want?
  • Are they new or existing users?
  • What do they value?
  • What are their motivators?
  • What is their relationship w/ tech?
  • What do they do online?
  • Are users engaging with other brands?
    • Is there an opportunity for synergy?
  • What can we borrow from other channels?
    • Digital presents a wealth of data, in which 1:1, closed-loop, people-based marketing exists. Leverage any data you can get and find useful.

Content journey maps

All of this data can then go into creating a map of the user journey and overlaying relevant content. Below are a few types of mappings that are useful.

Illustrative user journey map

Sometimes when trying to process complex problems, it’s easier to break them down into smaller pieces. Illustrative user journeys can help with this! Take a single user’s journey and map it out, aligning relevant content experiences.

Funnel content mapping

This chart is deceptively simple; however, working through this graph can help sites to understand how each stage in the funnel affects users (note: the stages can be modified). This matrix can help with mapping who writers are talking to, their needs, and how to push them to the next stage in the funnel.

Content matrix

Mapping out content by intent and branding helps to visualize conversion potential. I find these extremely useful for prioritizing top-converting content initiatives (i.e., start with ensuring branded, transactional content is delivering the best experience, then move towards more generic, higher-funnel terms).

Overviews

Regardless of how the data is broken down, it’s vital to have a high-level view on the audience’s core attributes, opportunities to improve content, and strategy for closing the gap.

Connecting

Connecting is all about resonating with humans. Connecting is about understanding that customers are human (and we have certain constraints). Our mind is constantly filtering, managing, multitasking, processing, coordinating, organizing, and storing information. It is literally in our mind’s best interest to not remember 99% of the information and sensations that surround us (think of the lights, sounds, tangible objects, people surrounding you, and you’re still able to focus on reading the words on your screen — pretty incredible!).

To become psychologically sticky, we must:

  1. Get past the mind’s natural filter. A positive aspect of being a pull marketing channel is that individuals are already seeking out information, making it possible to intersect their user journey in a micro-moment.
  2. From there we must be memorable. The brain tends to hold onto what’s relevant, useful, or interesting. Luckily, the searcher’s interest is already piqued (even if they aren’t consciously aware of why they searched for a particular topic).

This means we have a unique opportunity to “be there” for people. This leads to a very simple, abstract philosophy: a great brand is like a great friend.

We have similar relationship stages, we interweave throughout each other’s lives, and we have the ability to impact happiness. This comes down to the question: Do your online customers use adjectives they would use for a friend to describe your brand?

Action items:

  • Is all content either relevant, useful, or interesting?
  • Does the content honor your user’s questions?
  • Does your brand have a personality that aligns with reality?
  • Are you treating users as you would a friend?
  • Do your users use friend-like adjectives to describe your brand and/or site?
  • Do the brand’s actions align with overarching goals?
  • Is your experience trust-inspiring?
  • Is the site served over https://?
  • Are ads limited in the layout?
  • Does the site have proof of claims?
  • Does the site use relevant reviews and testimonials?
  • Is contact information available and easily findable?
  • Is relevant information intuitively available to users?
  • Is it as easy to buy/subscribe as it is to return/cancel?
  • Is integrity visible throughout the entire conversion process and experience?
  • Does the site have a credible reputation across the web?

Ultimately, being able to strategically, seamlessly create compelling user experiences which make sense to bots is what the SEO cyborg is all about. ☺

tl;dr

  • Ensure site = crawlable, renderable, and indexable
  • Ensure all signals = clear, aligned
  • Answer related, semantically salient questions
  • Research keywords, the search landscape, site performance, and develop audience segments
  • Use audience segments to map content and prioritize initiatives
  • Ensure content is relevant, useful, or interesting
  • Treat users as friends; be worthy of their trust

This article is based off of my MozCon talk (with a few slides from the Appendix pulled forward). The full deck is available on Slideshare, and the official videos can be purchased here. Please feel free to reach out with any questions in the comments below or via Twitter @AlexisKSanders.



Moz Blog


Trust Your Data: How to Efficiently Filter Spam, Bots, & Other Junk Traffic in Google Analytics

Posted by Carlosesal

There is no doubt that Google Analytics is one of the most important tools you could use to understand your users’ behavior and measure the performance of your site. There’s a reason it’s used by millions across the world.

But despite being such an essential part of the decision-making process for many businesses and blogs, I often find sites (of all sizes) that do little or no data filtering after installing the tracking code, which is a huge mistake.

Think of a Google Analytics property without filtered data as one of those styrofoam cakes with edible parts. It may seem genuine from the top, and it may even feel right when you cut a slice, but as you go deeper and deeper you find that much of it is artificial.

If you’re one of those that haven’t properly configured their Google Analytics and you only pay attention to the summary reports, you probably won’t notice that there’s all sorts of bogus information mixed in with your real user data.

And as a consequence, you won’t realize that your efforts are being wasted on analyzing data that doesn’t represent the actual performance of your site.

To make sure you’re getting only the real ingredients and prevent you from eating that slice of styrofoam, I’ll show you how to use the tools that GA provides to eliminate all the artificial excess that inflates your reports and corrupts your data.

Common Google Analytics threats

As most of the people I’ve worked with know, I’ve always been obsessed with the accuracy of data, mainly because as a marketer/analyst there’s nothing worse than realizing that you’ve made a wrong decision because your data wasn’t accurate. That’s why I’m continually exploring new ways of improving it.

As a result of that research, I wrote my first Moz post about the importance of filtering in Analytics, specifically about ghost spam, which was a significant problem at that time and still is (although to a lesser extent).

While the methods described there are still quite useful, I’ve since been researching solutions for other types of Google Analytics spam and a few other threats that might not be as annoying, but that are equally or even more harmful to your Analytics.

Let’s review, one by one.

Ghosts, crawlers, and other types of spam

The GA team has done a pretty good job handling ghost spam. The amount of it has been dramatically reduced over the last year, compared to the outbreak in 2015/2017.

However, the millions of current users and the thousands of new, unaware users that join every day, plus the majority’s curiosity to discover why someone is linking to their site, make Google Analytics too attractive a target for the spammers to just leave it alone.

The same logic can be applied to any widely used tool: no matter what security measures it has, there will always be people trying to abuse its reach for their own interest. Thus, it’s wise to add an extra security layer.

Take, for example, the most popular CMS: WordPress. Despite having some built-in security measures, if you don’t take additional steps to protect it (like setting a strong username and password or installing a security plugin), you run the risk of being hacked.

The same happens to Google Analytics, but instead of plugins, you use filters to protect it.

In which reports can you look for spam?

Spam traffic will usually show as a Referral, but it can appear in any part of your reports, even in unexpected places like a language or page title.

Sometimes spammers will try to fool you by using misleading URLs that are very similar to known websites, or they may try to get your attention by using unusual characters and emojis in the source name.

Independently of the type of spam, there are 3 things you always should do when you think you found one in your reports:

  1. Never visit the suspicious URL. Most of the time they’ll try to sell you something or promote their service, but some spammers might have some malicious scripts on their site.
  2. This goes without saying, but never install scripts from unknown sites; if for some reason you did, remove them immediately and scan your site for malware.
  3. Filter out the spam in your Google Analytics to keep your data clean (more on that below).

If you’re not sure whether an entry on your report is real, try searching for the URL in quotes (“example.com”). Your browser won’t open the site, but instead will show you the search results; if it is spam, you’ll usually see posts or forums complaining about it.

If you still can’t find information about that particular entry, give me a shout — I might have some knowledge for you.

Bot traffic

A bot is a piece of software that runs automated scripts over the Internet for different purposes.

There are all kinds of bots. Some have good intentions, like the bots used to check copyrighted content or the ones that index your site for search engines, and others not so much, like the ones scraping your content to clone it.

2016 bot traffic report. Source: Incapsula

In either case, this type of traffic is not useful for your reporting and might be even more damaging than spam both because of the amount and because it’s harder to identify (and therefore to filter it out).

It’s worth mentioning that bots can be blocked from your server to stop them from accessing your site completely, but this usually involves editing sensitive files that require high technical knowledge, and as I said before, there are good bots too.

So, unless you’re receiving a direct attack that’s straining your server resources, I recommend you just filter them in Google Analytics.

In which reports can you look for bot traffic?

Bots will usually show as Direct traffic in Google Analytics, so you’ll need to look for patterns in other dimensions to be able to filter it out. For example, large companies that use bots to navigate the Internet will usually have a unique service provider.

I’ll go into more detail on this below.

Internal traffic

Most users get worried and anxious about spam, which is normal — nobody likes weird URLs showing up in their reports. However, spam isn’t the biggest threat to your Google Analytics.

You are!

The traffic generated by people (and bots) working on the site is often overlooked despite the huge negative impact it has. The main reason it’s so damaging is that in contrast to spam, internal traffic is difficult to identify once it hits your Analytics, and it can easily get mixed in with your real user data.

There are different types of internal traffic and different ways of dealing with it.

Direct internal traffic

Testers, developers, marketing team, support, outsourcing… the list goes on. Any member of the team that visits the company website or blog for any purpose could be contributing.

In which reports can you look for direct internal traffic?

Unless your company uses a private ISP domain, this traffic is tough to identify once it hits you, and will usually show as Direct in Google Analytics.

Third-party sites/tools

This type of internal traffic includes traffic generated directly by you or your team when using tools to work on the site; for example, management tools like Trello or Asana.

It also includes traffic coming from bots doing automatic work for you; for example, services used to monitor the performance of your site, like Pingdom or GTmetrix.

Some types of tools you should consider:

  • Project management
  • Social media management
  • Performance/uptime monitoring services
  • SEO tools

In which reports can you look for internal third-party tools traffic?

This traffic will usually show as Referral in Google Analytics.

Development/staging environments

Some websites use a test environment to make changes before applying them to the main site. Normally, these staging environments have the same tracking code as the production site, so if you don’t filter it out, all the testing will be recorded in Google Analytics.

In which reports can you look for development/staging environments?

This traffic will usually show as Direct in Google Analytics, but you can find it under its own hostname (more on this later).

Web archive sites and cache services

Archive sites like the Wayback Machine offer historical views of websites. The reason you can see those visits on your Analytics — even if they are not hosted on your site — is that the tracking code was installed on your site when the Wayback Machine bot copied your content to its archive.

One thing is for certain: when someone goes to check how your site looked in 2015, they don’t have any intention of buying anything from your site — they’re simply doing it out of curiosity, so this traffic is not useful.

In which reports can you look for traffic from web archive sites and cache services?

You can also identify this traffic on the hostname report.

A basic understanding of filters

The solutions described below use Google Analytics filters, so to avoid problems and confusion, you’ll need some basic understanding of how they work and check some prerequisites.

Things to consider before using filters:

1. Create an unfiltered view.

Before you do anything, it’s highly recommended that you create an unfiltered view; it will help you track the efficacy of your filters. Plus, it works as a backup in case something goes wrong.

2. Make sure you have the correct permissions.

You will need edit permissions at the account level to create filters; edit permissions at view or property level won’t work.

3. Filters don’t work retroactively.

In GA, aggregated historical data can’t be deleted, at least not permanently. That’s why the sooner you apply the filters to your data, the better.

4. The changes made by filters are permanent!

If your filter is not correctly configured because you didn’t enter the correct expression (missing relevant entries, a typo, an extra space, etc.), you run the risk of losing valuable data FOREVER; there is no way of recovering filtered data.

But don’t worry — if you follow the recommendations below, you shouldn’t have a problem.

5. Wait for it.

Most of the time you can see the effect of the filter within minutes or even seconds after applying it; however, officially it can take up to twenty-four hours, so be patient.

Types of filters

There are two main types of filters: predefined and custom.

Predefined filters are very limited, so I rarely use them. I prefer to use the custom ones because they allow regular expressions, which makes them a lot more flexible.

Within the custom filters, there are five types: exclude, include, lowercase/uppercase, search and replace, and advanced.

Here we will use the first two: exclude and include. We’ll save the rest for another occasion.

Essentials of regular expressions

If you already know how to work with regular expressions, you can jump to the next section.

REGEX (short for regular expressions) are text strings prepared to match patterns with the use of some special characters. These characters help match multiple entries in a single filter.

Don’t worry if you don’t know anything about them. We will use only the basics, and for some filters, you will just have to COPY-PASTE the expressions I pre-built.

REGEX special characters

There are many special characters in REGEX, but for basic GA expressions we can focus on three:

  • ^ The caret: used to indicate the beginning of a pattern,
  • $ The dollar sign: used to indicate the end of a pattern,
  • | The pipe or bar: means “OR,” and it is used to indicate that you are starting a new pattern.

When using the pipe character, you should never ever:

  • Put it at the beginning of the expression,
  • Put it at the end of the expression,
  • Put 2 or more together.

Any of those will mess up your filter and probably your Analytics.
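
To see why a misplaced pipe is so risky, here is a minimal sketch in Python (my choice for illustration; GA’s own engine may differ in small details). An empty alternative matches the empty string, and the empty string can be found in every value, so the expression ends up matching everything. The source values below are made up.

```python
import re

# Hypothetical source values, as they might appear in a report
sources = ["google", "spam1.example", "newsletter"]

def matched(pattern, values):
    """Return the values in which the pattern is found (partial match)."""
    return [v for v in values if re.search(pattern, v)]

good = matched(r"spam1|spam2", sources)        # correct expression
trailing = matched(r"spam1|spam2|", sources)   # pipe at the end
leading = matched(r"|spam1|spam2", sources)    # pipe at the beginning
double = matched(r"spam1||spam2", sources)     # two pipes together

print(good)      # only the spam entry
print(trailing)  # every entry, because the empty alternative matches anything
```

In an exclude filter, an expression like the last three would throw away all of your traffic, not just the spam.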

A simple example of REGEX usage

Let’s say I go to a restaurant that has an automatic machine that makes fruit salad, and to choose the fruit, you should use regular expressions.

This super machine has the following fruits to choose from: strawberry, orange, blueberry, apple, pineapple, and watermelon.

To make a salad with my favorite fruits (strawberry, blueberry, apple, and watermelon), I have to create a REGEX that matches all of them. Easy! Since the pipe character “|” means OR I could do this:

  • REGEX 1: strawberry|blueberry|apple|watermelon

The problem with that expression is that REGEX also considers partial matches, and since pineapple also contains “apple,” it would be selected as well… and I don’t like pineapple!

To avoid that, I can use the other two special characters I mentioned before to make an exact match for apple: the caret “^” (begins here) and the dollar sign “$” (ends here). It will look like this:

  • REGEX 2: strawberry|blueberry|^apple$|watermelon

The expression will select precisely the fruits I want.

But let’s say for demonstration’s sake that the fewer characters you use, the cheaper the salad will be. To optimize the expression, I can use the ability for partial matches in REGEX.

Since strawberry and blueberry both contain “berry,” and no other fruit in the list does, I can rewrite my expression like this:

  • Optimized REGEX: berry|^apple$|watermelon

That’s it — now I can get my fruit salad with the right ingredients, and at a lower price.
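
If you prefer to see the fruit salad machine in action, the three expressions can be checked with a few lines of Python. This is only a sketch of the idea: the `pick` helper is a name I made up, and it uses a partial-match search, which is how the expressions above are meant to behave.

```python
import re

fruits = ["strawberry", "orange", "blueberry", "apple", "pineapple", "watermelon"]

def pick(pattern):
    """Return the fruits in which the pattern is found (partial match)."""
    return [f for f in fruits if re.search(pattern, f)]

regex_1 = pick(r"strawberry|blueberry|apple|watermelon")
regex_2 = pick(r"strawberry|blueberry|^apple$|watermelon")
optimized = pick(r"berry|^apple$|watermelon")

print(regex_1)    # pineapple sneaks in because it contains "apple"
print(regex_2)    # the anchors keep pineapple out
print(optimized)  # same salad, fewer characters
```

Note that there must be no space between the dollar sign and the pipe; a space would be treated as a literal character and the expression would stop matching.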

3 ways of testing your filter expression

As I mentioned before, filter changes are permanent, so you have to make sure your filters and REGEX are correct. There are 3 ways of testing them:

  • Right from the filter window; just click on “Verify this filter,” quick and easy. However, it’s not the most accurate since it only takes a small sample of data.

  • Using an online REGEX tester; very accurate and colorful, you can also learn a lot from these, since they show you exactly the matching parts and give you a brief explanation of why.

  • Using an in-table temporary filter in GA; you can test your filter against all your historical data. This is the most precise way of making sure you don’t miss anything.

If you’re doing a simple filter or you have plenty of experience, you can use the built-in filter verification. However, if you want to be 100% sure that your REGEX is ok, I recommend you build the expression on the online tester and then recheck it using an in-table filter.

Quick REGEX challenge

Here’s a small exercise to get you started. Go to this premade example with the optimized expression from the fruit salad case and test the first 2 REGEX I made. You’ll see live how the expressions impact the list.

Now make your own expression to pay as little as possible for the salad.

Remember:

  • We only want strawberry, blueberry, apple, and watermelon;
  • The fewer characters you use, the less you pay;
  • You can do small partial matches, as long as they don’t include the forbidden fruits.

Tip: You can do it with as few as 6 characters.

Now that you know the basics of REGEX, we can continue with the filters below. But I encourage you to put “learn more about REGEX” on your to-do list — they can be incredibly useful not only for GA, but for many tools that allow them.

How to create filters to stop spam, bots, and internal traffic in Google Analytics

Back to our main event: the filters!

Where to start: To avoid being repetitive when describing the filters below, here are the standard steps you need to follow to create them:

  1. Go to the admin section in your Google Analytics (the gear icon at the bottom left corner),
  2. Under the View column (master view), click the button “Filters” (don’t click on “All filters” in the Account column),
  3. Click the red button “+Add Filter” (if you don’t see it or you can only apply/remove already created filters, then you don’t have edit permissions at the account level; ask your admin to create them or give you the permissions),
  4. Then follow the specific configuration for each of the filters below.

The filter window is your best partner for improving the quality of your Analytics data, so it will be a good idea to get familiar with it.

Valid hostname filter (ghost spam, dev environments)

Prevents traffic from:

  • Ghost spam
  • Development hostnames
  • Scraping sites
  • Cache and archive sites

This filter may be the single most effective solution against spam. In contrast with other commonly shared solutions, the hostname filter is preventative, and it rarely needs to be updated.

Ghost spam earns its name because it never really visits your site. It’s sent directly to the Google Analytics servers using a feature called Measurement Protocol, a tool that under normal circumstances allows tracking from devices that you wouldn’t imagine that could be traced, like coffee machines or refrigerators.

Real users pass through your server, then the data is sent to GA; hence it leaves valid information. Ghost spam is sent directly to GA servers, without knowing your site URL; therefore all data left is fake. Source: carloseo.com

The spammer abuses this feature to simulate visits to your site, most likely using automated scripts to send traffic to randomly generated tracking codes (UA-0000000-1).

Since these hits are random, the spammers don’t know who they’re hitting; for that reason ghost spam will always leave a fake or (not set) host. Using that logic, by creating a filter that only includes valid hostnames all ghost spam will be left out.

Where to find your hostnames

Now here comes the “tricky” part. To create this filter, you will need to make a list of your valid hostnames.

A list of what!?

Essentially, a hostname is any place where your GA tracking code is present. You can get this information from the hostname report:

  • Go to Audience > Select Network > At the top of the table change the primary dimension to Hostname.

If your Analytics is active, you should see at least one: your domain name. If you see more, scan through them and make a list of all the ones that are valid for you.

Types of hostname you can find

The good ones:

Type | Example
Your domain and subdomains | yourdomain.com
Tools connected to your Analytics | YouTube, MailChimp
Payment gateways | Shopify, booking systems
Translation services | Google Translate
Mobile speed-up services | Google weblight

The bad ones (by bad, I mean not useful for your reports):

Type | Example/Description
Staging/development environments | staging.yourdomain.com
Internet archive sites | web.archive.org
Scraping sites that don’t bother to trim the content | The URL of the scraper
Spam | Most of the time they will show their URL, but sometimes they may use the name of a known website to try to fool you. If you see a URL that you don’t recognize, just think, “do I manage it?” If the answer is no, then it isn’t your hostname.
(not set) hostname | It usually comes from spam. On rare occasions it’s related to tracking code issues.

Below is an example of my hostname report, taken from the unfiltered view, of course; the master view is squeaky clean.

Now with the list of your good hostnames, make a regular expression. If you only have your domain, then that is your expression; if you have more, create an expression with all of them as we did in the fruit salad example:

Hostname REGEX (example)


yourdomain.com|hostname2|hostname3|hostname4

Important! You cannot create more than one “Include hostname filter”; if you do, you will exclude all data. So try to fit all your hostnames into one expression (you have 255 characters).

The “valid hostname filter” configuration:

  • Filter Name: Include valid hostnames
  • Filter Type: Custom > Include
  • Filter Field: Hostname
  • Filter Pattern: [hostname REGEX you created]
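
If your list of hostnames is long, a small script can put the expression together and check it against the 255-character limit before you paste it into GA. The sketch below is just one way to do it: the function names and the sample hostnames are my own placeholders, and escaping the dots (so they match a literal dot only) is a precaution I added.

```python
import re

GA_PATTERN_LIMIT = 255  # character limit mentioned above

def build_include_pattern(hostnames):
    """Join hostnames with pipes, escaping dots so they match literally."""
    pattern = "|".join(re.escape(h) for h in hostnames)
    if len(pattern) > GA_PATTERN_LIMIT:
        raise ValueError(f"Pattern has {len(pattern)} characters; the limit is {GA_PATTERN_LIMIT}")
    return pattern

# Placeholder list; use the valid hostnames from your own report
valid_hosts = ["yourdomain.com", "translate.googleusercontent.com"]
pattern = build_include_pattern(valid_hosts)

def is_valid_host(hostname):
    """True if the include expression is found anywhere in the hostname."""
    return re.search(pattern, hostname) is not None

print(pattern)
print(is_valid_host("www.yourdomain.com"))      # subdomains pass through partial matching
print(is_valid_host("(not set)"))               # ghost spam is left out
print(is_valid_host("staging.yourdomain.com"))  # careful: this one passes too
```

Because the match is partial, a staging subdomain of your main domain also gets through. If you want to keep it out, one option is to anchor the expression with a caret (for example, ^yourdomain\.com|^www\.yourdomain\.com).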

Campaign source filter (Crawler spam, internal sources)

Prevents traffic from:

  • Crawler spam
  • Internal third-party tools (Trello, Asana, Pingdom)

Important note: Even if these hits are shown as a referral, the field you should use in the filter is “Campaign source” — the field “Referral” won’t work.

Filter for crawler spam

The second most common type of spam is crawler spam. Crawlers also pretend to be a valid visit by leaving a fake source URL, but in contrast with ghost spam, they do access your site. Therefore, they leave a correct hostname.

You will need to create an expression the same way as the hostname filter, but this time, you will put together the source/URLs of the spammy traffic. The difference is that you can create multiple exclude filters.

Crawler REGEX (example)


spam1|spam2|spam3|spam4

Crawler REGEX (pre-built)


As I promised, here are the latest pre-built crawler expressions that you just need to copy/paste.

The “crawler spam filter” configuration:

  • Filter Name: Exclude crawler spam 1
  • Filter Type: Custom > Exclude
  • Filter Field: Campaign source
  • Filter Pattern: [crawler REGEX]
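
When your list of spammy sources no longer fits in one expression, you can split it across several exclude filters (“Exclude crawler spam 1,” “2,” and so on). Here is a rough sketch of how that splitting could be automated. It assumes the same 255-character limit applies to each filter pattern, and the spam sources are invented placeholders.

```python
GA_PATTERN_LIMIT = 255  # assumed limit per filter pattern

def split_into_patterns(sources, limit=GA_PATTERN_LIMIT):
    """Group sources into pipe-joined expressions no longer than the limit."""
    patterns, current = [], ""
    for source in sources:
        candidate = source if not current else current + "|" + source
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                patterns.append(current)
            current = source  # start a new expression with this source
    if current:
        patterns.append(current)
    return patterns

# Invented spam sources; real ones come from your referral report
spam_sources = [f"spam{i}\\.example" for i in range(1, 41)]
patterns = split_into_patterns(spam_sources)

for number, expression in enumerate(patterns, start=1):
    print(f"Exclude crawler spam {number}: {len(expression)} characters")
```

Joining with the pipe only between entries also guarantees that no expression starts or ends with a pipe.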

Filter for internal third-party tools

Although you can combine your crawler spam filter with internal third-party tools, I like to have them separated, to keep them organized and more accessible for updates.

The “internal tools filter” configuration:

  • Filter Name: Exclude internal tool sources
  • Filter Type: Custom > Exclude
  • Filter Field: Campaign source
  • Filter Pattern: [tool source REGEX]

Internal Tools REGEX (example)


trello|asana|redmine

If one of the tools that you use internally also sends you traffic from real visitors, don’t filter it. Instead, use the “Exclude Internal URL Query” below.

For example, I use Trello, but since I share analytics guides on my site, some people link them from their Trello accounts.

Filters for language spam and other types of spam

The previous two filters will stop most of the spam; however, some spammers use different methods to bypass the previous solutions.

For example, they try to confuse you by showing one of your valid hostnames combined with a well-known source like Apple, Google, or Moz. Even my site has been a target (not saying that everyone knows my site; it just looks like the spammers don’t agree with my guides).

However, even if the source and host look fine, the spammer injects their message in another part of your reports like the keyword, page title, and even as a language.

In those cases, you will have to take the dimension/report where you find the spam and choose that name in the filter. It’s important to consider that the name of the report doesn’t always match the name in the filter field:

Report name | Filter field
Language | Language settings
Referral | Campaign source
Organic Keyword | Search term
Service Provider | ISP Organization
Network Domain | ISP Domain

Here are a couple of examples.

The “language spam/bot filter” configuration:

  • Filter Name: Exclude language spam
  • Filter Type: Custom > Exclude
  • Filter Field: Language settings
  • Filter Pattern: [Language REGEX]

Language Spam REGEX (Prebuilt)


\s[^\s]*\s|.{15,}|\.|,|^c$

The expression above excludes fake languages that don’t meet the required format. For example, take these weird messages appearing instead of regular languages like en-us or es-es:

Examples of language spam
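
To get a feel for what the prebuilt language expression catches, you can run it against a few values in Python. This is only an illustration: the spammy values are made up, and I’m assuming a partial-match search, like the one used in the fruit salad example.

```python
import re

# The prebuilt language spam expression from above
LANGUAGE_SPAM = re.compile(r"\s[^\s]*\s|.{15,}|\.|,|^c$")

def is_language_spam(value):
    """True if any part of the value matches the spam expression."""
    return LANGUAGE_SPAM.search(value) is not None

# Regular languages followed by made-up spam values
values = ["en-us", "es-es", "c", "example.com", "visit my site now", "free, fast"]
flagged = [v for v in values if is_language_spam(v)]

print(flagged)
```

Each alternative targets one symptom: two or more spaces in the value, 15 or more characters, a dot, a comma, or the lone value “c.” Regular language codes like en-us have none of these, so they pass untouched.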

The organic/keyword spam filter configuration:

  • Filter Name: Exclude organic spam
  • Filter Type: Custom > Exclude
  • Filter Field: Search term
  • Filter Pattern: [keyword REGEX]

Filters for direct bot traffic

Bot traffic is a little trickier to filter because it doesn’t leave a source like spam, but it can still be filtered with a bit of patience.

The first thing you should do is enable bot filtering. In my opinion, it should be enabled by default.

Go to the Admin section of your Analytics and click on View Settings. You will find the option “Exclude all hits from known bots and spiders” below the currency selector:

It would be wonderful if this would take care of every bot — a dream come true. However, there’s a catch: the key here is the word “known.” This option only takes care of known bots included in the “IAB known bots and spiders list.” That’s a good start, but far from enough.

There are a lot of “unknown” bots out there that are not included in that list, so you’ll have to play detective and search for patterns of direct bot traffic through different reports until you find something that can be safely filtered without risking your real user data.

To start your bot trail search, click on the Segment box at the top of any report, and select the “Direct traffic” segment.

Then navigate through different reports to see if you find anything suspicious.

Some reports to start with:

  • Service provider
  • Browser version
  • Network domain
  • Screen resolution
  • Flash version
  • Country/City

Signs of bot traffic

Although bots are hard to detect, there are some signals you can follow:

  • An unnatural increase of direct traffic
  • Old versions (browsers, OS, Flash)
  • They visit the home page only (usually represented by a slash “/” in GA)
  • Extreme metrics:
    • Bounce rate close to 100%,
    • Session time close to 0 seconds,
    • 1 page per session,
    • 100% new users.

Important! If you find traffic that checks off many of these signals, it is likely bot traffic. However, not all entries with these characteristics are bots, and not all bots match these patterns, so be cautious.
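
If you export a report to check it outside of GA, the extreme-metric signals above can be turned into a quick checklist. The sketch below is hypothetical from top to bottom: the row structure, the thresholds (95%, 1 second, 1.05 pages), and the sample rows are my own assumptions, so adjust them to your data.

```python
from dataclasses import dataclass

@dataclass
class ReportRow:
    """One row of a hypothetical export, e.g. from the Service Provider report."""
    label: str
    bounce_rate: float          # 0.0 to 1.0
    avg_session_seconds: float
    pages_per_session: float
    new_user_share: float       # 0.0 to 1.0

def bot_signals(row):
    """List which of the extreme-metric signals this row shows."""
    signals = []
    if row.bounce_rate >= 0.95:
        signals.append("bounce rate close to 100%")
    if row.avg_session_seconds <= 1.0:
        signals.append("session time close to 0 seconds")
    if row.pages_per_session <= 1.05:
        signals.append("about 1 page per session")
    if row.new_user_share >= 0.95:
        signals.append("close to 100% new users")
    return signals

def worth_a_closer_look(row, min_signals=3):
    """Flag rows showing several signals at once; this is a hint, not proof."""
    return len(bot_signals(row)) >= min_signals

rows = [
    ReportRow("example hosting corp", 0.99, 0.4, 1.0, 1.0),
    ReportRow("example home broadband", 0.62, 95.0, 2.4, 0.71),
]
suspects = [r.label for r in rows if worth_a_closer_look(r)]
print(suspects)
```

A flagged row only tells you where to look; as noted above, check it manually before you filter anything.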

Perhaps the most useful report that has helped me identify bot traffic is the “Service Provider” report. Large corporations frequently use their own Internet service provider name.

I also have a pre-built expression for ISP bots, similar to the crawler expressions.

The bot ISP filter configuration:

  • Filter Name: Exclude bots by ISP
  • Filter Type: Custom > Exclude
  • Filter Field: ISP organization
  • Filter Pattern: [ISP provider REGEX]

ISP provider bots REGEX (prebuilt)


hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\.

Latest ISP bot expression
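
Here is a small test of the ISP expression in Python, so you can see what the anchors are doing. The provider names in the list are sample values I made up for the test, and ignoring upper/lowercase is an assumption on my side about how the filter compares values.

```python
import re

ISP_BOTS = re.compile(
    r"hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\.",
    re.IGNORECASE,  # assumption: the comparison is not case sensitive
)

# Sample service provider values (made up for this test)
providers = [
    "google llc",
    "Google Inc.",
    "google fiber inc.",
    "hubspot inc",
    "example telecom",
]
flagged = [p for p in providers if ISP_BOTS.search(p)]

print(flagged)
```

Thanks to the caret and the dollar sign, only the exact “google llc” and “google inc.” entries are excluded, while a provider such as “google fiber inc.” (which could carry real users) is left alone.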

IP filter for internal traffic

We already covered different types of internal traffic, the one from test sites (with the hostname filter), and the one from third-party tools (with the campaign source filter).

Now it’s time to look at the most common and damaging of all: the traffic generated directly by you or any member of your team while working on any task for the site.

To deal with this, the standard solution is to create a filter that excludes the public IP (not private) of all locations used to work on the site.

Examples of places/people that should be filtered

  • Office
  • Support
  • Home
  • Developers
  • Hotel
  • Coffee shop
  • Bar
  • Mall
  • Any place that is regularly used to work on your site

To find the public IP of the location you are working at, simply search for “my IP” in Google. You will see one of these versions:

IP version | Example
Short IPv4 | 1.23.45.678
Long IPv6 | 2001:0db8:85a3:0000:0000:8a2e:0370:7334

No matter which version you see, make a list with the IP of each place and put them together with a REGEX, the same way we did with other filters.

  • IP address expression: IP1|IP2|IP3|IP4 and so on.

The static IP filter configuration:

  • Filter Name: Exclude internal traffic (IP)
  • Filter Type: Custom > Exclude
  • Filter Field: IP Address
  • Filter Pattern: [The IP expression]
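
One detail to watch with IP expressions: a dot means “any character” in REGEX, and partial matches count, so an unescaped, unanchored IP can exclude more addresses than you intended. The sketch below shows one possible way to build a stricter expression. The escaping and the anchors are my own additions, the function names are placeholders, and the addresses come from ranges reserved for documentation.

```python
import re

def build_ip_pattern(ips):
    """Escape each address and anchor it so only exact matches are excluded."""
    return "|".join("^" + re.escape(ip) + "$" for ip in ips)

# Stand-in addresses; replace them with the public IPs of your locations
office_ips = ["203.0.113.7", "198.51.100.24", "2001:db8:85a3::8a2e:370:7334"]
pattern = build_ip_pattern(office_ips)

def is_internal(ip):
    """True if the address is one of the listed locations."""
    return re.search(pattern, ip) is not None

# Without escaping and anchors, a different address is caught by accident
loose_match = re.search(r"203.0.113.7", "203.0.113.70") is not None

print(pattern)
print(is_internal("203.0.113.7"))   # listed office address
print(is_internal("203.0.113.70"))  # someone else's address
print(loose_match)                  # the loose expression matches it anyway
```

Keep an eye on the length of the final expression; if it grows too long, split it across more than one exclude filter.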

Cases when this filter won’t be optimal:

There are some cases in which the IP filter won’t be as efficient as it used to be:

  • You use IP anonymization (required by the GDPR regulation). When you anonymize the IP in GA, the last part of the IP is changed to 0. This means that if you have 1.23.45.678, GA will pass it as 1.23.45.0, so you need to put it like that in your filter. The problem is that you might be excluding other IPs that are not yours.
  • Your Internet provider changes your IP frequently (Dynamic IP). This has become a common issue lately, especially if you have the long version (IPv6).
  • Your team works from multiple locations. The way of working is changing; now, not all companies operate from a central office. It’s often the case that some will work from home, others from the train, in a coffee shop, etc. You can still filter those places; however, maintaining the list of IPs to exclude can be a nightmare.
  • You or your team travel frequently. Similar to the previous scenario, if you or your team travels constantly, there’s no way you can keep up with the IP filters.
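
To put a number on the anonymization problem from the first point, here is a short sketch using Python’s standard ipaddress module. It only covers the IPv4 case described above, and the address is a stand-in from a documentation range.

```python
import ipaddress

def anonymize_ipv4(ip):
    """Set the last part of an IPv4 address to 0, as described above."""
    octets = ip.split(".")
    octets[-1] = "0"
    return ".".join(octets)

my_ip = "203.0.113.7"  # stand-in for your public IP
masked = anonymize_ipv4(my_ip)

# Every address in this block ends up with the same anonymized value
block = ipaddress.ip_network(masked + "/24")
shared = block.num_addresses

print(masked)  # the value you would have to put in the filter
print(shared)  # how many addresses that single value stands for
```

In other words, a filter on the anonymized value excludes your IP plus up to 255 others that share the first three parts, which may include real visitors.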

If one or more of these scenarios applies to you, then this filter is not optimal; I recommend you try the “Advanced internal URL query filter” below.

URL query filter for internal traffic

If there are dozens or hundreds of employees in the company, it’s extremely difficult to exclude them when they’re traveling, accessing the site from their personal locations, or mobile networks.

Here’s where the URL query comes to the rescue. To use this filter you just need to add a query parameter. I add “?internal” to any link your team uses to access your site:

  • Internal newsletters
  • Management tools (Trello, Redmine)
  • Emails to colleagues
  • Also works by directly adding it in the browser address bar

Basic internal URL query filter

The basic version of this solution is to create a filter to exclude any URL that contains the query “?internal”.

  • Filter Name: Exclude Internal Traffic (URL Query)
  • Filter Type: Custom > Exclude
  • Filter Field: Request URI
  • Filter Pattern: \?internal

This solution is perfect for instances where the user will most likely stay on the landing page, for example, when sending a newsletter to all employees to check a new post.

If the user will likely visit more than the landing page, then the subsequent pages will be recorded.
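
The limitation of the basic version is easy to see with a quick test. In this sketch the page paths are made up, and the filter is simulated with a partial-match search on each request URI. The backslash in the pattern is needed because the question mark is a special character in REGEX.

```python
import re

INTERNAL_QUERY = re.compile(r"\?internal")

# Made-up pages visited by one employee after clicking a newsletter link
session = ["/blog/new-post?internal", "/blog/", "/pricing"]

excluded = [uri for uri in session if INTERNAL_QUERY.search(uri)]
recorded = [uri for uri in session if not INTERNAL_QUERY.search(uri)]

print(excluded)  # only the landing page carries the query
print(recorded)  # the following pages are still recorded
```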

Advanced internal URL query filter

This solution is the champion of all internal traffic filters!

It’s a more comprehensive version of the previous solution and works by filtering internal traffic dynamically using Google Tag Manager, a GA custom dimension, and cookies.

Although this solution is a bit more complicated to set up, once it’s in place:

  • It doesn’t need maintenance
  • Any team member can use it, no need to explain techy stuff
  • Can be used from any location
  • Can be used from any device, and any browser

To activate the filter, you just have to add the text “?internal” to any URL of the website.

That will insert a small cookie in the browser that will tell GA not to record the visits from that browser.

And the best of it is that the cookie will stay there for a year (unless it is manually removed), so the user doesn’t have to add “?internal” every time.

Bonus filter: Include only internal traffic

On some occasions, it’s interesting to know the traffic generated internally by employees, maybe because you want to measure the success of an internal campaign or just because you’re a curious person.

In that case, you should create an additional view, call it “Internal Traffic Only,” and use one of the internal filters above. Just one! Because if you have multiple include filters, the hit will need to match all of them to be counted.
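
The “just one” rule is easier to picture with a tiny simulation. This is not how GA is implemented, only a sketch of the behavior described above; the field names, the hits, and the `passes` helper are all hypothetical.

```python
import re

def passes(hit, include_filters):
    """A hit is kept only if every include filter finds a match in its field."""
    return all(re.search(pattern, hit.get(field, "")) for field, pattern in include_filters)

# Made-up hits
hits = [
    {"hostname": "yourdomain.com", "request_uri": "/blog/?internal"},
    {"hostname": "yourdomain.com", "request_uri": "/pricing"},
    {"hostname": "shop.example", "request_uri": "/cart?internal"},
]

one_filter = [("request_uri", r"\?internal")]
two_filters = [("request_uri", r"\?internal"), ("hostname", r"yourdomain\.com")]

kept_with_one = [h["request_uri"] for h in hits if passes(h, one_filter)]
kept_with_two = [h["request_uri"] for h in hits if passes(h, two_filters)]

print(kept_with_one)  # both internal hits
print(kept_with_two)  # the second include filter drops one of them
```

Each extra include filter can only shrink the view, so stacking them is an easy way to lose hits you wanted to keep.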

If you configured the “Advanced internal URL query” filter, use that one. If not, choose one of the others.

The configuration is exactly the same; you only need to change “Exclude” to “Include.”

Cleaning historical data

The filters will prevent future hits from junk traffic.

But what about past affected data?

I know I told you that deleting aggregated historical data is not possible in GA. However, there’s still a way to temporarily clean up at least some of the nasty traffic that has already polluted your reports.

For this, we’ll use an advanced segment (a subset of your Analytics data). There are built-in segments like “Organic” or “Mobile,” but you can also build one using your own set of rules.

To clean our historical data, we will build a segment using all the expressions from the filters above as conditions (except the ones from the IP filter, because IPs are not stored in GA; hence, they can’t be segmented).

To help you get started, you can import this segment template.

You just need to follow the instructions on that page and replace the placeholders. Here is how it looks:

In the actual template, all text is black; the colors are just to help you visualize the conditions.

After importing it, to select the segment:

  1. Click on the box that says “All users” at the top of any of your reports
  2. From your list of segments, check the one that says “0. All Users – Clean”
  3. Lastly, uncheck the “All Users”

Now you can navigate through your reports and all the junk traffic included in the segment will be removed.

A few things to consider when using this segment:

  • Segments have to be selected each time. A way of having it selected by default is by adding a bookmark when the segment is selected.
  • You can remove or add conditions if you need to.
  • You can edit the segment at any time to update it or add conditions (open the list of segments, then click “Actions” then “Edit”).
  • The hostname expression and third-party tools expression are different for each site.
  • If your site has a large volume of traffic, segments may sample your data when selected, so if you see the little shield icon at the top of your reports go yellow (normally it is green), try choosing a shorter period (e.g., 1 year, 6 months, one month).

Conclusion: Which cake would you eat?

Having real and accurate data is essential for your Google Analytics to report as you would expect.

But if you haven’t filtered it properly, it’s almost certain that it will be filled with all sorts of junk and artificial information.

And the worst part is that if you don’t realize that your reports contain bogus data, you will likely make wrong or poor decisions when deciding on the next steps for your site or business.

The filters I shared above will help you prevent the three most harmful threats that pollute your Google Analytics and keep you from getting a clear view of the actual performance of your site: spam, bots, and internal traffic.

Once these filters are in place, you can rest assured that your efforts (and money!) won’t be wasted on analyzing deceptive Google Analytics data, and your decisions will be based on solid information.

And the benefits don’t stop there. If you’re using other tools that import data from GA, for example, WordPress plugins like GADWP, Excel add-ins like AnalyticsEdge, or SEO suites like Moz Pro, the benefits will trickle down to all of them as well.

Besides highlighting the importance of filters in GA (which I hope I’ve made clear by now), I would also love for the work of preparing these filters to spark your curiosity and give you a basis for creating others that will let you do all sorts of remarkable things with your data.

Remember, filters not only allow you to keep away junk, you can also use them to rearrange your real user information — but more on that on another occasion.


That’s it! I hope these tips help you make more sense of your data and make accurate decisions.

Have any questions, feedback, experiences? Let me know in the comments, or reach me on Twitter @carlosesal.


Moz Blog

Posted in IM News

Facebook is Now Teaching Bots How to Negotiate…and Lie

Facebook has built a massive pool of bots that reside in their messenger app. These bots were trained to answer basic questions and perform basic commands based on text to dialogue recognition. While the bots held a lot of promise, it was obvious that they still had a lot to learn before they could converse fluently with humans.

After recognizing the flaws with the initial roll out, the company reinstated the menu option and went back to the drawing board. Facebook then played around with a combination of native language learning and machine learning that are more commonly used in the gaming industry. As a result, they were able to develop an AI that can negotiate on behalf of and with humans.

How did Facebook achieve such a feat? They built two bots and presented them with several objects: two books, one hat, and three balls. Each bot was programmed with a hidden preference, and their goal was to compromise with each other until they reached a point where both could walk away with what they wanted.

This experiment was conducted by the Facebook Artificial Intelligence Research (FAIR) group in collaboration with the Georgia Institute of Technology. The group now claims that they have the code that will teach bots how to negotiate.

While this technology holds a lot of promise, there’s no denying that there is still a lot of room for improvement since the experiment is only in its initial stages. According to some experts, while the code did teach bots how to negotiate, it also taught them to lie (by putting false emphasis on an object) in order to achieve their goal. If a business owner decides to integrate these cunning bots into their current business model, they are bound to encounter some problems.

It also showed signs of excessive willingness to concede just to achieve a decent gain, which may result in bad business decisions.

Despite their flaws, the bots built for this exercise showcased extensive conversational skills that were far superior to those of any bots in operation today. They showed a capability to construct complex sentences and form a deep understanding of the messages being delivered to them.

The researchers at FAIR said that they will continue to improve the bot’s ability to form more competitive reasoning strategies while further broadening their understanding of the native language. This means that we are bound to see more eloquent bots capable of negotiating deals in the future.

The post Facebook is Now Teaching Bots How to Negotiate…and Lie appeared first on WebProNews.


WebProNews

Posted in IM NewsComments Off

How to Stop Spam Bots from Ruining Your Analytics Referral Data

Posted by jaredgardner

A few months back, my agency started seeing a referral traffic spike in our Google Analytics account. At first, I got excited. Someone is linking to us and people are clicking. Hooray!

Wrong! How very, very wrong. As I dug deeper, I saw that most of this referral traffic was sent from spammers, and mostly from one spammer named Vitaly Popov (or, as I like to call him, “the most recent pain in my ass”). 

The domains he owns have been giving our company’s site and most of our clients’ sites a few hundred sessions per month, enough to throw off the analytics data in many cases.

His sites aren’t the only ones I’ll cover in this how-to, but his spam network has been the biggest nuisance lately. If you’re getting spam referrers in your analytics, you should be able to follow the same steps to stop these data-skewing nincompoops from spoiling your data, too.

Why do I need to worry about blocking and filtering these sites?

There are two main reasons I’m motivated to block these on all sites that I work with. First: corrupt analytics data. A few hundred hits a month on a site like Moz.com isn’t going to move the needle when compared to the sheer volume of sessions they have daily. However, on a small site for a local plumber that gets 30 sessions per day, 70% of those sessions are likely to be spam referral traffic, suffocating the remaining legitimate traffic and making marketing analysis a frustrating endeavor.
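
To see how bad the skew is on a given site, it helps to put a number on it. The sketch below is only an illustration: the session counts are made up to match the plumber example above, and `spam_share` is a hypothetical helper, not part of any analytics tool.

```python
def spam_share(sessions_by_referrer, spam_domains):
    """Fraction of referral sessions sent by a listed spam domain or one of its subdomains."""

    def is_spam(host):
        host = host.lower()
        return any(host == d or host.endswith("." + d) for d in spam_domains)

    total = sum(sessions_by_referrer.values())
    if total == 0:
        return 0.0
    spam = sum(n for host, n in sessions_by_referrer.items() if is_spam(host))
    return spam / total


# Made-up daily referral sessions for a small local site (30 in total).
daily_sessions = {
    "semalt.com": 12,
    "forum.darodar.com": 9,
    "example-directory.com": 6,
    "partner.example.org": 3,
}

share = spam_share(daily_sessions, {"semalt.com", "darodar.com"})
print(f"{share:.0%} of referral sessions came from spam domains")
```

Matching on the registered domain plus any subdomain is a design choice here, since several of these networks send traffic from subdomains.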

Second: server load and security. I didn’t ask them to crawl or visit my site. Their visits are using my server resources for something that I don’t want or need. An overloaded server means slower load times, which translate to higher bounce rates and lower rankings. On top of that, who knows what else they’re doing on my site while they’re there. They could easily be looking for WordPress, plugin and server vulnerabilities.

Popular referral spam domains

Using WHOIS.net, I found that Mr. Popov’s spam network includes these domains:

  • darodar.com (and various subdomains)
  • econom.co
  • ilovevitaly.co (and other TLD variations)

Other spammers plaguing the web include:

  • semalt.com (and various subdomains)
  • buttons-for-website.com
  • see-your-website-here.com

Many other sites have come and gone. These are just the sites that have been active lately.

Why are they hitting my site?

Why are people going through so much effort to crawl the web without blocking themselves from analytics? Spam! So much spam, it still blows me away. I looked into a few of the sites listed above. Three of the most prolific ones are doing it for very different reasons. 

See-your-website-here.com

This site takes the cake for being the most frustrating. It uses referrer spam as a form of lead generation. What is their product, you ask? Web spam. You can pay see-your-website-here.com to perform web spam for your company as a form of lead generation. The owner of this domain was kind enough to make his WHOIS information public. His name is Ben Sykes and he’s from London.

Semalt.com

Semalt.com and I have had a tumultuous relationship at best. Semalt is an SEO product that’s designed to give on- and off-page analysis such as keyword usage and link metrics. Their products seem to be somewhat legit. However, their business practices are not. Semalt uses a bot to crawl the web and index webpage data, but they don’t disable analytics tracking like most respectable bots do. They have a form to remove your site from being crawled at http://semalt.com/project_crawler.php, which is ever so nice of them. Of course, I tried this months ago and they still crawled our site. I ended up talking with a representative from Semalt.com via Twitter after I wrote this article: How to Stop Semalt.com from Plaguing Your Google Analytics Data. I’ve documented our interactions and the outcome of that project in the article.

Darodar.com, econom.co, and ilovevitaly.com

This network appears to exist for the purpose of directing affiliate traffic to shopping sites such as AliExpress.com and eBay.com. I am guessing that the site won’t pay out to the affiliate unless the traffic results in a purchase, which seems unlikely. The subdomain shopping.ilovevitaly.com used to redirect to aliexpress.com directly, but now it goes to a landing page that links to a variety of online retailers.

How to stop spam bots

Block via .htaccess

The best way to block referrers from accessing your site at all is to block them in your .htaccess file in the root directory of your domain. You can copy and paste the following code into your .htaccess file, assuming you’re on an Apache server. I like this method better than just blocking the domain in analytics because it prevents spam bots from hitting your server altogether. If you want to get creative, you can redirect the traffic back to their site.

# Block Russian Referrer Spam
# Each RewriteCond tests the Referer header; [NC] ignores case, [OR] chains the conditions.
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^https?://.*ilovevitaly\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*ilovevitaly\.ru/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*ilovevitaly\.org/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*ilovevitaly\.info/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*iloveitaly\.ru/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*econom\.co/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*savetubevideo\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*kambasoft\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*buttons\-for\-website\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*semalt\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*darodar\.com/ [NC]
# Return 403 Forbidden for any request that matched a condition above.
RewriteRule ^(.*)$ - [F,L]

Warning: .htaccess is a very powerful file that dictates how your server behaves. If you upload an .htaccess file with one character out of place, you will likely take down the whole site. Before you make any changes to the file, I would suggest making a backup. If you don’t feel comfortable making these edits, see the WordPress plugin option below.
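
Since one stray character can take the site down, one option is to generate the rules from a plain list of domains instead of typing each line by hand. The sketch below is a hedged illustration, not a drop-in tool: `pattern_for` and `build_referrer_block` are hypothetical helper names, and the pattern is deliberately stricter than the `.*` used above, matching only the domain itself or one of its subdomains. It also assumes that these simple patterns behave the same in Python’s `re` module as in Apache’s regex engine, so test on a staging copy first.

```python
import re


def pattern_for(domain):
    """Regex matching a referrer URL on `domain` or on any of its subdomains."""
    return r"^https?://([^/]+\.)?" + re.escape(domain) + r"(/|$)"


def build_referrer_block(domains):
    """Build mod_rewrite lines that answer 403 Forbidden to the given referrers."""
    if not domains:
        raise ValueError("need at least one domain")
    lines = ["# Block referrer spam", "RewriteEngine on"]
    for i, domain in enumerate(domains):
        # Every condition except the last needs [OR] so that any one match is enough.
        flags = "[NC]" if i == len(domains) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern_for(domain)} {flags}")
    lines.append("RewriteRule ^(.*)$ - [F,L]")
    return "\n".join(lines)


block = build_referrer_block(["semalt.com", "darodar.com", "econom.co"])
print(block)

# Preview which referrers a pattern would catch before touching the server.
blocked = re.search(pattern_for("semalt.com"), "http://semalt.semalt.com/crawler.php", re.I)
allowed = re.search(pattern_for("semalt.com"), "http://notsemalt.com/", re.I)
```

Paste the printed block into .htaccess only after backing up the existing file.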

Analytics filters

By itself, .htaccess won’t solve all of your problems. It will only protect you from future sessions, and it won’t affect the sessions that have already happened. I like to set up filters by country in analytics to remove the historical data, as well as to help filter out any other bots we might find from select countries in the future. Of course this wouldn’t be a good idea if you expect to get legitimate traffic from countries like Russia, Brazil, or Indonesia, but many U.S.-based companies can safely block these countries without losing potential customers. Follow the steps below to set up the filters.

First, click on the “Admin” tab at the top of the page. In the view column you will want to create a new view so that you still have an unadulterated report of all traffic in Google Analytics. I named mine “Filter Bots.” After you have your new view selected, click into the “Filters” section, then select the “+New Filter” button.

Setting up filters is pretty simple if you know which settings to use. I like to filter out all traffic from Russia, Brazil, and Indonesia. These are just the countries that have been giving us issues lately. You can add more filters as you need them.

The filter name is just an arbitrary label. I usually just type “block .” Next, choose the filter type “custom.” Choose “country” from the “Filter Field” drop-down. The “Filter Pattern” field is where you actually define which countries you are filtering, so make sure you spell them correctly. You can double-check your filters by using the “Verify This Filter” button. A graph will pop up and show you how many sessions would have been removed from the last seven days.
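
If you want to sanity-check a pattern before saving it, you can try it against a few sample rows offline. The sketch below assumes the filter pattern is treated as a regular expression and that the filter excludes matching sessions; both are assumptions for illustration, and the sample sessions and the `apply_exclude_filter` helper are made up.

```python
import re

# Hypothetical filter pattern: one regular expression naming the countries to exclude.
FILTER_PATTERN = re.compile(r"^(Russia|Brazil|Indonesia)$")


def apply_exclude_filter(sessions, pattern=FILTER_PATTERN):
    """Keep only the sessions whose country does not match the pattern."""
    return [s for s in sessions if not pattern.search(s["country"])]


# Made-up sample rows, shaped like a simple analytics export.
sessions = [
    {"country": "United States", "referrer": "google.com"},
    {"country": "Russia", "referrer": "darodar.com"},
    {"country": "Brazil", "referrer": "semalt.com"},
]

kept = apply_exclude_filter(sessions)
print(len(sessions) - len(kept), "of", len(sessions), "sessions would be removed")

# A misspelled country silently matches nothing, which is why spelling matters.
typo_kept = apply_exclude_filter(sessions, re.compile(r"^(Rusia)$"))
```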

I would recommend selecting the “Bot Filtering” check box that is found in “View Settings” within the “Admin” tab. I haven’t seen a change in my data using this feature yet, but it doesn’t hurt to set it up since it’s really easy and maybe Google will decide to block some of these spammers.

Using WordPress? Don’t want to edit your .htaccess file?

I’ve used the plugin Wp-Ban before, and it makes it easy to block unwanted visitors. Wp-Ban gives you the ability to ban users by IP, IP range, host name, user agent, and referrer URL from visiting your WordPress blog, all from within the WordPress admin panel. This is a great option for people who don’t want to edit their .htaccess file or don’t feel comfortable doing so.
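
To make those ban criteria concrete, here is a rough sketch of how checks by IP, IP range, user agent, and referrer could be combined. It is not the plugin’s code or API: the `is_banned` function, its parameters, and the sample addresses (taken from the reserved documentation ranges) are all hypothetical.

```python
import ipaddress


def is_banned(visitor, banned_ips=(), banned_ranges=(), banned_referrers=(), banned_agents=()):
    """Return True if the visitor matches any of the supplied ban lists."""
    ip = ipaddress.ip_address(visitor["ip"])
    if any(ip == ipaddress.ip_address(b) for b in banned_ips):
        return True
    if any(ip in ipaddress.ip_network(r) for r in banned_ranges):
        return True
    referrer = visitor.get("referrer", "").lower()
    if any(domain.lower() in referrer for domain in banned_referrers):
        return True
    agent = visitor.get("user_agent", "").lower()
    return any(a.lower() in agent for a in banned_agents)


spam_visit = {
    "ip": "203.0.113.25",
    "referrer": "http://buttons-for-website.com/",
    "user_agent": "Mozilla/5.0",
}
real_visit = {
    "ip": "198.51.100.7",
    "referrer": "https://www.google.com/",
    "user_agent": "Mozilla/5.0",
}

rules = {
    "banned_ranges": ["203.0.113.0/24"],
    "banned_referrers": ["buttons-for-website.com", "semalt.com"],
}
print(is_banned(spam_visit, **rules), is_banned(real_visit, **rules))
```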

Additional resources

There are a few other great posts you can refer to if you’re looking for more info on dealing with referrer spam:

  1. http://www.optimizesmart.com/geek-guide-removing-referrer-spam-google-analytics/
  2. https://megalytic.com/blog/how-to-filter-out-fake-referrals-and-other-google-analytics-spam
  3. http://blog.raventools.com/stop-referrer-spam/
  4. http://www.analyticsedge.com/2014/12/removing-referral-spam-google-analytics/

Conclusion

I hope this helps you block all the pesky spammers out there. There are definitely different ways you can solve this problem, and these are just the ones that have helped me protect analytics data. I’d love to hear how you have dealt with spam bots. Share your stories with me on Twitter or in the comments below.


Moz Blog
