Tag Archive | "Data"

50 Million Google+ Accounts Compromised in Latest Data Breach, Platform to Shut Down Earlier Than Planned

The discovery of another privacy flaw has pushed Google to shut down Google+ much earlier than expected.

Google announced on December 10 that in November it had discovered a security issue that potentially left more than 50 million accounts vulnerable. The revelation came on the heels of an earlier admission that a security lapse in March had affected hundreds of thousands of users. Because of this, the company says Google+ will be shut down by April 2019.

Google initially planned to sunset the platform by August 2019. The company made the announcement to close its Google+ network in October, after it admitted that an earlier breach affected 500,000 users.

The latest bug was reportedly introduced by a software update that Google rolled out in November. It gave third-party apps access to users’ profile data, including information that had not been made public. It took the company six days to notice the problem and fix it.

In a blog post, Google’s Vice President of Product Management, David Thacker, shared that “No third party compromised our systems, and we have no evidence that the app developers that inadvertently had this access for six days were aware of it or misused it in any way.”

However, the bug made it possible for apps with which users had willingly shared their Google+ data to also access the profiles of their friends and of other people who had shared data with them. Google assured users, though, that the bug did not expose any passwords, financial data, or other sensitive details that could be used for identity theft.

The Alphabet-owned company had also suffered a security lapse in March. That particular bug placed the personal information of hundreds of thousands of users at risk, and the company waited half a year before admitting to regulators and the public that there was a problem. The breach happened around the time Facebook was embroiled in the Cambridge Analytica controversy, and reports stated that Google delayed revealing the problem partly to avoid regulatory scrutiny.

The admission that there was another security issue couldn’t have come at a worse time for the company. Google’s CEO, Sundar Pichai, was set to appear before the House Judiciary Committee on December 11 to be grilled about the company’s data practices.

Google+ will be shutting down all its APIs for developers within three months. However, the platform’s enterprise version will remain functional. Google also acknowledged on Monday that Google+ had a low number of users and that there were major obstacles to turning it into a successful product.




WebProNews


AdStage’s new Join automatically shows Google Analytics, Salesforce data for search, social campaigns

Customers will have full-funnel visibility into how their search and social campaigns are driving sales outcomes, without URL tagging.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing


AWS CEO Announces Textract to Extract Data Without Machine Learning Skills

AWS CEO Andy Jassy announced Amazon Textract at the AWS re:Invent 2018 conference. Textract allows AWS customers to automatically extract formatted data from documents without losing the structure of the data. Best of all, there are no machine learning skills required to use Textract. It’s something that many data-intensive enterprises have been requesting for many years.

Amazon Launches Textract to Easily Extract Usable Data

Our customers are frustrated that they can’t get more of the text and data that’s locked up in documents into the cloud, so they can actually do machine learning on top of it. So we worked with our customers, we thought about what might solve these problems, and I’m excited to announce the launch of Amazon Textract. This is an OCR++ service to easily extract text and data from virtually any document, and there is no machine learning experience required.

This is important: you don’t need any machine learning experience to be able to use Textract. Here’s how it generally works. Take a pretty typical document: it has a couple of columns, and there’s a table in the middle of the left column.

When you use plain OCR, it basically captures all of that information in a single row, so what you end up with is useless gobbledygook. That’s typically what happens.

Let’s go through what Textract does. Textract is intelligent. It can tell that there are two columns, so when you get the data back, the language reads the way it’s supposed to be read. It can identify that there’s a table and lay out what that table should look like, so you can actually read and use that data in whatever you’re trying to do on the analytics and machine learning side. That’s a very different equation.

Textract Works Great with Forms

What happens with most of these forms is that OCR can’t really read them or make them coherent at all. Template-based approaches will sometimes effectively memorize that a particular box holds a particular piece of data. Textract is going to work across legal forms, financial forms, tax forms, and healthcare forms, and we will keep adding more and more of these.

But these forms also change every few years, and when they do, something you thought was a Social Security number in a given box may now turn out to be a date of birth. We have built Textract to recognize what certain data items or objects are, so it can tell that this set of characters is a Social Security number, this set of characters is a date of birth, and this set of characters is an address.

Not only can we apply it to many more forms, but if those forms change, Textract doesn’t miss a beat. That is a pretty significant change in your ability to extract and digitally use the data that lives in documents.
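Textract is consumed as an API. As a rough illustration only (Textract was announced in preview, so availability and details may vary, and the file name below is a placeholder), calling it from Python with boto3 looks something like this:

import boto3

textract = boto3.client("textract")  # assumes AWS credentials and a supported region are configured

with open("sample-form.png", "rb") as document:  # hypothetical single-page document image
    document_bytes = document.read()

# Ask for tables and form key/value pairs, not just raw OCR text
response = textract.analyze_document(
    Document={"Bytes": document_bytes},
    FeatureTypes=["TABLES", "FORMS"],
)

# The response is a list of Block objects (PAGE, LINE, WORD, TABLE, CELL, KEY_VALUE_SET, ...)
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])

The TABLE and KEY_VALUE_SET blocks carry the relationships that preserve the document’s structure, which is exactly the part plain OCR throws away.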



WebProNews


My Five Greatest Mistakes as A Leader: 30 years of painful data (that might help you)

For the leader, sometimes the most important data is derived from a source that evades our metrics platforms. Indeed, such data can only be gleaned through brutal self-confrontation.
MarketingSherpa Blog


How to Create a Local Marketing Results Dashboard in Google Data Studio – Whiteboard Friday

Posted by DiTomaso

Showing clients that you’re making them money is one of the most important things you can communicate to them, but it’s tough to know how to present your results in a way they can easily understand. That’s where Google Data Studio comes in. In this week’s edition of Whiteboard Friday, our friend Dana DiTomaso shares how to create a client-friendly local marketing results dashboard in Google Data Studio from start to finish.


Video Transcription

Hi, Moz fans. My name is Dana DiTomaso. I’m President and partner of Kick Point. We’re a digital marketing agency way up in the frozen north of Edmonton, Alberta. We work with a lot of local businesses, both in Edmonton and around the world, and small local businesses usually have the same questions when it comes to reporting.

Are we making money?

What I’m going to share with you today is our local marketing dashboard that we share with clients. We build this in Google Data Studio because we love Google Data Studio. If you haven’t watched my Whiteboard Friday yet on how to do formulas in Google Data Studio, I recommend you hit Pause right now, go back and watch that, and then come back to this because I am going to talk about what happened there a little bit in this video.

The Google Data Studio dashboard

This is a Google Data Studio dashboard which I’ve tried to represent in the medium of whiteboard as best as I could. Picture it being a little bit better design than my left-handedness can represent on a whiteboard, but you get the idea. Every local business wants to know, “Are we making money?” This is the big thing that people care about, and really every business cares about making money. Even charities, for example: money is important obviously because that’s what keeps the lights on, but there’s also perhaps a mission that they have.

But they still want to know: Are people filling out our donation form? Are people contacting us? These are important things for every business, organization, not-for-profit, whatever to understand and know. What we’ve tried to do in this dashboard is really boil it down to the absolute basics, one thing you can look at, see a couple of data points, know whether things are good or things are bad.

Are people contacting you?

Let’s start with this up here. The first thing is: Are people contacting you? Now you can break this out into separate columns. You can do phone calls and emails for example. Some of our clients prefer that. Some clients just want one mashed up number. So we’ll take the number of calls that people are getting.

If you’re using a call tracking tool, such as CallRail, you can import this in here. Emails, for example, or forms, just add it all together and then you have one single number of the number of times people contacted you. Usually this is a way bigger number than people think it is, which is also kind of cool.
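If those contact actions (calls, emails, form fills) end up as metrics in the same data source, for example as Google Analytics goals or pulled into one source via a Google Sheets import, the combined number is just a calculated field that adds them together. The field names here are placeholders for whatever your setup actually uses:

Phone Calls + Emails + Form Submissions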

Are people taking the action you want them to take?

The next thing is: Are people doing the thing that you want them to do? This really depends on what’s meaningful to the client.

For example, if you have a client, again thinking about a charity, how many people filled out your donation form, your online donation form? For a psychologist client of ours, how many people booked an appointment? For a client of ours who offers property management, how many people booked a viewing of a property? What is the thing you want them to do? If they have online e-commerce, for example, then maybe this is how many sales did you have.

Maybe this will be two different things: people walking into the store versus sales. We’ve also represented in this field that if a business has a people counter in their store, we would pull that people counter data in here. Usually we can get the people counter data into a Google Sheet and then pull it into Data Studio. It’s not the prettiest thing in the world, but it certainly represents all their data in one place, which is really the whole point of why we do these dashboards.

Where did visitors come from, and where are your customers coming from?

People contacting you, people doing the thing you want them to do, those are the two major metrics. Then we do have a little bit deeper further down. On this side here we start with: Where did visitors come from, and where are your customers coming from? Because they’re really two different things, right? Not every visitor to the website is going to become a customer. We all know that. No one has a 100% conversion rate, and if you do, you should just retire.

Filling out the dashboard

We really need to differentiate between the two. In this case we’re looking at channel, and there probably is a better word for channel. We’re always trying to think about, “What would clients call this?” But I feel like clients are kind of aware of the word “channel” and that’s how they’re getting there. But then the next column, by default this would be called users or sessions. Both of those are kind of cruddy. You can rename fields in Data Studio, and we can call this the number of people, for example, because that’s what it is.

Then you would use the users as the metric, and you would just call it number of people instead of users, because personally I hate the word “users.” It really boils down the humanity of a person to a user metric. Users are terrible. Call them people or visitors at least. Then unfortunately, in Data Studio, when you do a comparison field, you cannot rename and call it comparison. It does this nice percentage delta, which I hate.

It’s just like a programmer clearly came up with this. But for now, we have to deal with it. Although by the time this video comes out, maybe it will be something better, and then I can go back and correct myself in the comments. But for now it’s percentage delta. Then goal percentage and then again delta. They can sort by any of these columns in Data Studio, and it’s real live data.

Put a time period on this, and people can pick whatever time period they want and then they can look at this data as much as they want, which is delightful. If you’re not delivering great results, it may be a little terrifying for you, but really you shouldn’t be hiding that anyway, right? Like if things aren’t going well, be honest about it. That’s another talk for another time. But start with this kind of chart. Then on the other side, are you showing up on Google Maps?

We use the Supermetrics Google My Business plug-in to grab this kind of information. We hook it into the customer’s Google Maps account. Then we’re looking at branded searches and unbranded searches and how many times they came up in the map pack. Usually we’ll have a little explanation here. This is how many times you came up in the map pack and search results as well as Google Maps searches, because it’s all mashed in together.

Then what happens when they find you? So number of direction requests, number of website visits, number of phone calls. Now the tricky thing is phone calls here may be captured in phone calls here. You may not want to add these two pieces of data or just keep this off on its own separately, depending upon how your setup is. You could be using a tracking number, for example, in your Google My Business listing and that therefore would be captured up here.

Really just try to be honest about where that data comes from instead of double counting. You don’t want to have that happen. The last thing is if a client has messages set up, then you can pull that message information as well.

Tell your clients what to do

Then at the very bottom of the report we have a couple of columns, and usually this is a longer chart and this is shorter, so we have room down here to do this. Obviously, my drawing skills are not as good as aligning things in Data Studio, so forgive me.

But we tell them what to do. Usually when we work with local clients, they can’t necessarily afford a monthly retainer for us to do everything for them forever. Instead, we tell them, “Here’s what you have to do this month. Here’s what you have to do next month. Hey, did you remember you’re supposed to be blogging?” That sort of thing. Just put it in here, because clients look at results, but they often forget the things that may get them those results. This is a really nice reminder that if you’re not happy with these numbers, maybe you should do these things.

Tell your clients how to use the report

Then the next thing is how to use the report. This is a good reference, because if they only open it, say, once every couple of months, they’ve probably forgotten how to do things in this report, or even basics like setting the time period up at the top. This is a good reminder of how to do that as well.

Because the report is totally editable by you at any time, you can always go in and change stuff later, and because the client can view the report at any time, they have a dashboard that is extremely useful to them and they don’t need to bug you every single time they want to see a report. It saves you time and money. It saves them time and money. Everybody is happy. Everybody is saving money. I really recommend setting up a really simple dashboard like this for your clients, and I bet you they’ll be impressed.

Thanks so much.

Video transcription by Speechpad.com



Moz Blog


New report from MarTech Today: Enterprise Customer Data Platforms: A Marketer’s Guide

Learn everything you need to know about enterprise customer data platforms.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing


Google Collaborates With News Organizations to Show Tabular Data in Search Results

Google just announced a new type of formatting that lets news organizations present their data directly in Search results, making it easier for users to find the information they need. The move to make data more accessible is part of the Google News Initiative, a project through which the company works with the journalism and news industry to build a stronger and more dynamic future for news.

So far, Google has collaborated with 30 expert data journalists to develop ways to better present key data. The partnership has resulted in a format well-suited for search results. Users will now see structured data in a tabular form displayed directly in the top search result. The format makes it more readable and easier for users to locate the information they need. It has also made it simpler to add structured data to a website’s existing code.

Google has surfaced structured data in Search before, but this is the first time news organizations have been included.

 

ProPublica has already adopted this improved method of showing search results. The news organization’s Deputy Managing Editor, Scott Klein, commented on the enhanced presentation and said the real-world impact news organizations have makes it vital that people are given the information they need when they need it.

“If we can make the data we’ve worked hard to collect and prepare available to people at the very moment when they’re researching a big life decision, and thereby help them make the best decision they can, it’s an absolute no-brainer for us,” Klein said. He added that the extra code required is trivial.

Google has been on a mission to improve its products, with reports of a Chrome makeover and an improved image search floating around for months now. But changing the format of how data is presented in search results, so that it’s comprehensive and easier to understand, is a step in the right direction.




WebProNews


Calculated Fields in Google Data Studio – Whiteboard Friday

Posted by DiTomaso

Google Data Studio is a powerful tool to have in your SEO kit. Knowing how to get the most out of its power begins with understanding how to use calculated fields to apply good old-fashioned math to your data. In this week’s Whiteboard Friday, we’re delighted to welcome guest host Dana DiTomaso as she takes us through how to use calculated fields in Google Data Studio to uncover more value in your data and improve your reports.



Video Transcription

Hi, Moz fans. I’m Dana DiTomaso, President and partner at Kick Point, and we love Google Data Studio at Kick Point. You may not love Google Data Studio yet, but after you watch this I think you probably will.

One of the first things that you think about Google Data Studio is: Why would I use this? It’s just charts. It’s the same thing I can get in Analytics or a billion other dashboarding tools out there. But one of the things that I really like about Google Data Studio is math. You can do lots of different stuff in Data Studio, and I’m going to go through four of the basic types in Data Studio and then how you can use that to improve your reports, just as you sort of dip your toes into the Google Data Studio pool. What I’ve done here is I have written out a lot of the formulas that you’re going to be using.

The types

It’s a lot of obviously written out formulas, but when you get into Data Studio, you should be able to type these in and they’ll work. Let’s start at the beginning with the types.

  1. Basic math. This is pretty obvious. 1 + 1 = 2. Phone calls plus emails equals this, for example. You can add together different fields.
  2. Transforms. Let’s say people are really bad at writing some things upper case and some things lower case. You have a problem with URLs being written a couple of different ways. You can use a transform to transform upper case into lower case. That’s pretty nice.
  3. Formulas. Formulas are where you’re saying: only show this subset of the data, or tell me how often this happens. That could be things like the Count function, so count how many times this occurs, for example, and present that as a totally separate metric. This can be really useful when you want to count the number of times an event occurs and then compare that against something else. It can just pull out that kind of data.
  4. Logic. This is the more complex one. If X, then Y. If this happens, then that’s going to happen. There’s a lot of really complex stuff in there. But if you’re just getting started, start with this, and then look at the Google Data Studio documentation. You’ll find some cooler stuff in there.

1. Basic math

Here are some examples of how we use this in our Google Data Studio dashboards. So basic math, one of the things that a lot of people care about is: Are people getting in touch with me?

This is the basics of the reason why we do marketing. Are people getting in touch? So, for example, you can do some basic math and say, “All right. So I know on our website in Google Tag Manager, we have a trigger that fires whenever somebody taps or clicks a MailTo link on the site.” In addition to that, we’re tracking how many people submit a form, as you should.

Instead of reporting these separately, really they’re kind of the same thing. They’re emailing one way or the other. Why don’t we just submit them as one metric? So in that case, you can say grab all the mail to form completions and then grab all the form goal completions, and now you have a total email requests or total requests or whatever you might want to call it. You can do the same thing where it’s like, well, phone calls and emails, does it really matter if they’re in separate buckets?

Just put them all in one. The same thing with the basic math. Just add it all together and then you’ve got one total metric you can present to the client. Here’s how much money we made for you. Boom. That’s a nice one. The next thing — I’m just going to flip over here — is formulas.

2. Formulas

Okay, so formulas. One of the things that I really like doing is looking at your Google Search Console data in Data Studio, using Search Console as the data source, which is a nice one. We all know Search Console data is not necessarily 100% accurate, but there’s always lots of keyword treasure to be found in there, if only it were easy to find, and the Search Console interface isn’t super great at that.

So you can make a report in Data Studio and say regex match, and so don’t be afraid of regex. I think everyone should learn it. But if you’re not super familiar with it, this is a really easy way to do it. Say, okay, every time a keyword contains why, how, can, what, for example, then those are question searches. You may change it to whatever makes sense for you.

But this is just pulling out that subset of data. Then you can see: if these are question searches, do we have content that answers those questions? No? Maybe that’s something we need to think about, because we’re getting impressions for them. You could also filter it and say only show question searches where our average rank is below 20. Maybe if we improve this content, it’s a featured snippet opportunity for us, for example. That’s a real gold mine of data you can play around with.
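Written out as a Data Studio calculated field on a Search Console data source, that idea might look something like the example below. The exact question words are up to you, and note that REGEXP_MATCH has to match the whole string, hence the “.*” on either side:

CASE
  WHEN REGEXP_MATCH(Query, ".*(why|how|can|what).*") THEN "Question search"
  ELSE "Other search"
END

You can then use that new dimension to narrow a table down to question searches, and add a filter on the average position metric to surface the below-20 opportunities mentioned above.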

3. Transforms

The third one is transforms. As I mentioned earlier, this is a really nice way to take Facebook, for example. We had a client who had Facebook in all upper case and Facebook in title case and Facebook in lower case in their sources and mediums, because they were very casual with how they used their UTM codes. We just standardized them all to go to lower, and those are nice text transforms that you can do.

It just makes things look a little bit nicer. I do recommend doing some of this, especially if you have messy data.
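A lowercasing transform like that is a one-line calculated field. For example, to standardize the source dimension you could create a new field along these lines and use it in place of the original (the same pattern works for Medium or Campaign):

LOWER(Source)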

4. Logic

Then the big one here. This is logic, and I’m just going to toss over here for a second. Now logic has a lot of different components. What I’m showing you right now is a CASE WHEN … ELSE … END statement. We use this to tidy up bad channel data.

So that client that I mentioned, who was just super casual with their UTM tags and they would just put in any old stuff, I think they had retargeting ads as a medium. You can set up channels and whatnot in Google Analytics. But I mean, really, when it comes down to it, not everybody is great at following the rules for UTMs that you’ve set up. Stuff happens.

It’s okay. You can fix it in Data Studio. Especially if you open up Google Analytics and see that you have this other channel; whenever we’ve inherited an Analytics account and taken a look at it, there’s this channel, and it’s just a big bag of crap.

You can go in there and turn that into real, useful, actual channel data that matches up with where it should go. What I’ve got here is a really simple example. This could go on for lines and line and lines. I’ve just included two lines because this whiteboard is only so big.

So you start off by saying CASE; the idea is “it is the case when.” Then the first line here says: when source equals direct and medium equals (not set) or medium equals (none), then Direct.

If the source is direct and the medium is not set or the medium is none, like if I have no data whatsoever, now it’s direct traffic. Great, that’s basically what Google Analytics does. Nothing fancy is going on here. Now here’s the next thing. In this case, I’m saying now I’m combining a regex match, which we talked about up here, with the case, and so now what I’m saying is when regex match medium, and then I’ve got this here.

Don’t be scared of this. I know it’s regex and maybe you’re not super comfortable with it, but this is pretty elementary stuff, and once you do this, you will feel like a data wizard, I guarantee. The first time I did this I stood up from my computer and said “Yes” when it worked. Just play with it. It’s going to be awesome. So you’ve got a little… what’s the thing called? You’ve got a little up arrow (a caret) there, then your very bad mediums, then a dollar sign.

What this is saying is that if you’ve got anything in there that’s sort of a weird medium, just write out all the crud that people have put in there over the years, all the weird mediums that totally don’t make any sense, and then you can toss it all into a bucket called, say, paid social. You can do the same thing with referral traffic. Or, for example, this is really useful if a client is saying, “Well, I want to know how this set of affiliate traffic compares to this other set of affiliate traffic”; then you can separate those out into different buckets.
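Written out as a calculated field, the channel cleanup described above might look roughly like this. The list of bad mediums in the second WHEN is a placeholder; substitute whatever actually shows up in your account:

CASE
  WHEN Source = "(direct)" AND (Medium = "(not set)" OR Medium = "(none)") THEN "Direct"
  WHEN REGEXP_MATCH(Medium, "^(retargeting ads|fb-paid|social_ads)$") THEN "Paid Social"
  ELSE "Other"
END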

This isn’t just for channel data. I’ve done this, for example, where we were looking at social data and we were comparing NFL teams as an example for another tool, Rival IQ. What I said was, okay, so these teams here are in the AFC East, and these teams are in the AFC West. If I’ve screwed up and I said AFC East and West, please don’t get mad at me in the comments. I promise I play fantasy football. I just don’t remember right now.

But you can combine different areas. This is great for things like sales regions, for example. So North America equals Canada plus the USA plus Mexico, if you’re feeling generous; that’s NAFTA politics. It really depends on what you want to do with those sales regions, how your data is structured, and what is meaningful for you. That’s the most important thing about this: you can change this data to be whatever you need it to be to make reporting so much easier for you.

As for the Else at the end, I’m honestly not sure what this particular expression would output there; I haven’t tried it myself. If it does something interesting, please leave a comment and let me know.

Then you end up with an End. When you’re in Data Studio making these calculated formulas, you’ll see right away whether or not they work. Just keep trying until you see it happen.

One of the great things about Data Studio is that if it’s right, you’ll see these types of colors, and I’ve used different color whiteboard markers to indicate how it should look. If you see red where you should be seeing black or green where you should be seeing black, for example, then you know you’ve typed in something wrong in your formula. For me, typically I find it’s a misplaced bracket. Just keep an eye on that.

Have fun with Data Studio. One of the great things too is that you can’t mess up your original data when doing calculated fields, so you can go hog wild and it’s not going to mess with the original data. I hope you have a great time in Data Studio. Tell me what you’ve done in the comments, please. Thank you.

Video transcription by Speechpad.com



Moz Blog


Trust Your Data: How to Efficiently Filter Spam, Bots, & Other Junk Traffic in Google Analytics

Posted by Carlosesal

There is no doubt that Google Analytics is one of the most important tools you could use to understand your users’ behavior and measure the performance of your site. There’s a reason it’s used by millions across the world.

But despite being such an essential part of the decision-making process for many businesses and blogs, I often find sites (of all sizes) that do little or no data filtering after installing the tracking code, which is a huge mistake.

Think of a Google Analytics property without filtered data as one of those styrofoam cakes with edible parts. It may seem genuine from the top, and it may even feel right when you cut a slice, but as you go deeper and deeper you find that much of it is artificial.

If you’re one of those that haven’t properly configured their Google Analytics and you only pay attention to the summary reports, you probably won’t notice that there’s all sorts of bogus information mixed in with your real user data.

And as a consequence, you won’t realize that your efforts are being wasted on analyzing data that doesn’t represent the actual performance of your site.

To make sure you’re getting only the real ingredients and prevent you from eating that slice of styrofoam, I’ll show you how to use the tools that GA provides to eliminate all the artificial excess that inflates your reports and corrupts your data.

Common Google Analytics threats

As most of the people I’ve worked with know, I’ve always been obsessed with the accuracy of data, mainly because as a marketer/analyst there’s nothing worse than realizing that you’ve made a wrong decision because your data wasn’t accurate. That’s why I’m continually exploring new ways of improving it.

As a result of that research, I wrote my first Moz post about the importance of filtering in Analytics, specifically about ghost spam, which was a significant problem at that time and still is (although to a lesser extent).

While the methods described there are still quite useful, I’ve since been researching solutions for other types of Google Analytics spam and a few other threats that might not be as annoying, but that are equally or even more harmful to your Analytics.

Let’s review, one by one.

Ghosts, crawlers, and other types of spam

The GA team has done a pretty good job handling ghost spam. The amount of it has been dramatically reduced over the last year, compared to the outbreak in 2015/2017.

However, the millions of current users and the thousands of new, unaware users that join every day, plus the majority’s curiosity to discover why someone is linking to their site, make Google Analytics too attractive a target for the spammers to just leave it alone.

The same logic can be applied to any widely used tool: no matter what security measures it has, there will always be people trying to abuse its reach for their own interest. Thus, it’s wise to add an extra security layer.

Take, for example, the most popular CMS: WordPress. Despite having some built-in security measures, if you don’t take additional steps to protect it (like setting a strong username and password or installing a security plugin), you run the risk of being hacked.

The same happens to Google Analytics, but instead of plugins, you use filters to protect it.

In which reports can you look for spam?

Spam traffic will usually show as a Referral, but it can appear in any part of your reports, even in unexpected places like a language or a page title.

Sometimes spammers will try to fool you by using misleading URLs that are very similar to those of known websites, or they may try to get your attention by using unusual characters and emojis in the source name.

Regardless of the type of spam, there are three things you should always do when you think you’ve found some in your reports:

  1. Never visit the suspicious URL. Most of the time they’ll try to sell you something or promote their service, but some spammers might have some malicious scripts on their site.
  2. This goes without saying, but never install scripts from unknown sites; if for some reason you did, remove it immediately and scan your site for malware.
  3. Filter out the spam in your Google Analytics to keep your data clean (more on that below).

If you’re not sure whether an entry on your report is real, try searching for the URL in quotes (“example.com”). Your browser won’t open the site, but instead will show you the search results; if it is spam, you’ll usually see posts or forums complaining about it.

If you still can’t find information about that particular entry, give me a shout — I might have some knowledge for you.

Bot traffic

A bot is a piece of software that runs automated scripts over the Internet for different purposes.

There are all kinds of bots. Some have good intentions, like the bots used to check copyrighted content or the ones that index your site for search engines, and others not so much, like the ones scraping your content to clone it.

2016 bot traffic report. Source: Incapsula

In either case, this type of traffic is not useful for your reporting and might be even more damaging than spam both because of the amount and because it’s harder to identify (and therefore to filter it out).

It’s worth mentioning that bots can be blocked at the server level to stop them from accessing your site completely, but this usually involves editing sensitive files that require a high level of technical knowledge, and as I said before, there are good bots too.

So, unless you’re receiving a direct attack that’s straining your resources, I recommend you just filter them in Google Analytics.

In which reports can you look for bot traffic?

Bots will usually show as Direct traffic in Google Analytics, so you’ll need to look for patterns in other dimensions to be able to filter it out. For example, large companies that use bots to navigate the Internet will usually have a unique service provider.

I’ll go into more detail on this below.

Internal traffic

Most users get worried and anxious about spam, which is normal — nobody likes weird URLs showing up in their reports. However, spam isn’t the biggest threat to your Google Analytics.

You are!

The traffic generated by people (and bots) working on the site is often overlooked despite the huge negative impact it has. The main reason it’s so damaging is that in contrast to spam, internal traffic is difficult to identify once it hits your Analytics, and it can easily get mixed in with your real user data.

There are different types of internal traffic and different ways of dealing with it.

Direct internal traffic

Testers, developers, marketing team, support, outsourcing… the list goes on. Any member of the team that visits the company website or blog for any purpose could be contributing.

In which reports can you look for direct internal traffic?

Unless your company uses a private ISP domain, this traffic is tough to identify once it hits you, and will usually show as Direct in Google Analytics.

Third-party sites/tools

This type of internal traffic includes traffic generated directly by you or your team when using tools to work on the site; for example, management tools like Trello or Asana.

It also includes traffic coming from bots doing automated work for you; for example, services used to monitor the performance of your site, like Pingdom or GTmetrix.

Some types of tools you should consider:

  • Project management
  • Social media management
  • Performance/uptime monitoring services
  • SEO tools
In which reports can you look for internal third-party tools traffic?

This traffic will usually show as Referral in Google Analytics.

Development/staging environments

Some websites use a test environment to make changes before applying them to the main site. Normally, these staging environments have the same tracking code as the production site, so if you don’t filter it out, all the testing will be recorded in Google Analytics.

In which reports can you look for development/staging environments?

This traffic will usually show as Direct in Google Analytics, but you can find it under its own hostname (more on this later).

Web archive sites and cache services

Archive sites like the Wayback Machine offer historical views of websites. The reason you can see those visits on your Analytics — even if they are not hosted on your site — is that the tracking code was installed on your site when the Wayback Machine bot copied your content to its archive.

One thing is for certain: when someone goes to check how your site looked in 2015, they don’t have any intention of buying anything from your site — they’re simply doing it out of curiosity, so this traffic is not useful.

In which reports can you look for traffic from web archive sites and cache services?

You can also identify this traffic on the hostname report.

A basic understanding of filters

The solutions described below use Google Analytics filters, so to avoid problems and confusion, you’ll need some basic understanding of how they work and check some prerequisites.

Things to consider before using filters:

1. Create an unfiltered view.

Before you do anything, it’s highly recommended that you make an unfiltered view; it will help you track the efficacy of your filters. Plus, it works as a backup in case something goes wrong.

2. Make sure you have the correct permissions.

You will need edit permissions at the account level to create filters; edit permissions at view or property level won’t work.

3. Filters don’t work retroactively.

In GA, aggregated historical data can’t be deleted, at least not permanently. That’s why the sooner you apply the filters to your data, the better.

4. The changes made by filters are permanent!

If your filter is not correctly configured because you didn’t enter the correct expression (missing relevant entries, a typo, an extra space, etc.), you run the risk of losing valuable data FOREVER; there is no way of recovering filtered data.

But don’t worry — if you follow the recommendations below, you shouldn’t have a problem.

5. Wait for it.

Most of the time you can see the effect of the filter within minutes or even seconds after applying it; however, officially it can take up to twenty-four hours, so be patient.

Types of filters

There are two main types of filters: predefined and custom.

Predefined filters are very limited, so I rarely use them. I prefer to use the custom ones because they allow regular expressions, which makes them a lot more flexible.

Within the custom filters, there are five types: exclude, include, lowercase/uppercase, search and replace, and advanced.

Here we will use the first two: exclude and include. We’ll save the rest for another occasion.

Essentials of regular expressions

If you already know how to work with regular expressions, you can jump to the next section.

REGEX (short for regular expressions) are text strings prepared to match patterns with the use of some special characters. These characters help match multiple entries in a single filter.

Don’t worry if you don’t know anything about them. We will use only the basics, and for some filters, you will just have to COPY-PASTE the expressions I pre-built.

REGEX special characters

There are many special characters in REGEX, but for basic GA expressions we can focus on three:

  • ^ The caret: used to indicate the beginning of a pattern,
  • $ The dollar sign: used to indicate the end of a pattern,
  • | The pipe or bar: means “OR,” and it is used to indicate that you are starting a new pattern.

When using the pipe character, you should never ever:

  • Put it at the beginning of the expression,
  • Put it at the end of the expression,
  • Put 2 or more together.

Any of those will mess up your filter and probably your Analytics.

A simple example of REGEX usage

Let’s say I go to a restaurant that has an automatic machine that makes fruit salad, and to choose the fruit, you should use regular expressions.

This super machine has the following fruits to choose from: strawberry, orange, blueberry, apple, pineapple, and watermelon.

To make a salad with my favorite fruits (strawberry, blueberry, apple, and watermelon), I have to create a REGEX that matches all of them. Easy! Since the pipe character “|” means OR I could do this:

  • REGEX 1: strawberry|blueberry|apple|watermelon

The problem with that expression is that REGEX also considers partial matches, and since pineapple also contains “apple,” it would be selected as well… and I don’t like pineapple!

To avoid that, I can use the other two special characters I mentioned before to make an exact match for apple: the caret “^” (begins here) and the dollar sign “$” (ends here). It will look like this:

  • REGEX 2: strawberry|blueberry|^apple$|watermelon

The expression will select precisely the fruits I want.

But let’s say for demonstration’s sake that the fewer characters you use, the cheaper the salad will be. To optimize the expression, I can use the ability for partial matches in REGEX.

Since strawberry and blueberry both contain “berry,” and no other fruit in the list does, I can rewrite my expression like this:

  • Optimized REGEX: berry|^apple$|watermelon

That’s it — now I can get my fruit salad with the right ingredients, and at a lower price.
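If you want to sanity-check these expressions before trusting them, a couple of lines of Python also work, since re.search does partial matching much like GA’s filter patterns do (purely illustrative; GA uses RE2-style syntax, but for simple expressions like these the behavior is the same):

import re

fruits = ["strawberry", "orange", "blueberry", "apple", "pineapple", "watermelon"]
pattern = re.compile(r"berry|^apple$|watermelon")  # the optimized expression

# re.search looks for the pattern anywhere in the string, i.e. a partial match
selected = [fruit for fruit in fruits if pattern.search(fruit)]
print(selected)  # ['strawberry', 'blueberry', 'apple', 'watermelon']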

3 ways of testing your filter expression

As I mentioned before, filter changes are permanent, so you have to make sure your filters and REGEX are correct. There are 3 ways of testing them:

  • Right from the filter window; just click on “Verify this filter,” quick and easy. However, it’s not the most accurate since it only takes a small sample of data.

  • Using an online REGEX tester; very accurate and colorful, you can also learn a lot from these, since they show you exactly the matching parts and give you a brief explanation of why.

  • Using an in-table temporary filter in GA; you can test your filter against all your historical data. This is the most precise way of making sure you don’t miss anything.

If you’re doing a simple filter or you have plenty of experience, you can use the built-in filter verification. However, if you want to be 100% sure that your REGEX is ok, I recommend you build the expression on the online tester and then recheck it using an in-table filter.

Quick REGEX challenge

Here’s a small exercise to get you started. Go to this premade example with the optimized expression from the fruit salad case and test the first 2 REGEX I made. You’ll see live how the expressions impact the list.

Now make your own expression to pay as little as possible for the salad.

Remember:

  • We only want strawberry, blueberry, apple, and watermelon;
  • The fewer characters you use, the less you pay;
  • You can do small partial matches, as long as they don’t include the forbidden fruits.

Tip: You can do it with as few as 6 characters.

Now that you know the basics of REGEX, we can continue with the filters below. But I encourage you to put “learn more about REGEX” on your to-do list — they can be incredibly useful not only for GA, but for many tools that allow them.

How to create filters to stop spam, bots, and internal traffic in Google Analytics

Back to our main event: the filters!

Where to start: To avoid being repetitive when describing the filters below, here are the standard steps you need to follow to create them:

  1. Go to the admin section in your Google Analytics (the gear icon at the bottom left corner),
  2. Under the View column (master view), click the button “Filters” (don’t click on “All filters“ in the Account column):
  3. Click the red button “+Add Filter” (if you don’t see it or you can only apply/remove already created filters, then you don’t have edit permissions at the account level. Ask your admin to create them or give you the permissions.):
  4. Then follow the specific configuration for each of the filters below.

The filter window is your best partner for improving the quality of your Analytics data, so it will be a good idea to get familiar with it.

Valid hostname filter (ghost spam, dev environments)

Prevents traffic from:

  • Ghost spam
  • Development hostnames
  • Scraping sites
  • Cache and archive sites

This filter may be the single most effective solution against spam. In contrast with other commonly shared solutions, the hostname filter is preventative, and it rarely needs to be updated.

Ghost spam earns its name because it never really visits your site. It’s sent directly to the Google Analytics servers using a feature called Measurement Protocol, a tool that under normal circumstances allows tracking from devices you wouldn’t imagine could be tracked, like coffee machines or refrigerators.

Real users pass through your server, then the data is sent to GA; hence it leaves valid information. Ghost spam is sent directly to GA servers, without knowing your site URL; therefore all data left is fake. Source: carloseo.com

The spammer abuses this feature to simulate visits to your site, most likely using automated scripts to send traffic to randomly generated tracking codes (UA-0000000-1).

Since these hits are random, the spammers don’t know who they’re hitting; for that reason ghost spam will always leave a fake or (not set) host. Using that logic, by creating a filter that only includes valid hostnames all ghost spam will be left out.

Where to find your hostnames

Now here comes the “tricky” part. To create this filter, you will need to make a list of your valid hostnames.

A list of what!?

Essentially, a hostname is any place where your GA tracking code is present. You can get this information from the hostname report:

  • Go to Audience > Select Network > At the top of the table change the primary dimension to Hostname.

If your Analytics is active, you should see at least one: your domain name. If you see more, scan through them and make a list of all the ones that are valid for you.

Types of hostname you can find

The good ones:

  • Your domain and subdomains: yourdomain.com
  • Tools connected to your Analytics: YouTube, MailChimp
  • Payment gateways: Shopify, booking systems
  • Translation services: Google Translate
  • Mobile speed-up services: Google weblight

The bad ones (by bad, I mean not useful for your reports):

  • Staging/development environments: staging.yourdomain.com
  • Internet archive sites: web.archive.org
  • Scraping sites that don’t bother to trim the content: the URL of the scraper
  • Spam: most of the time they will show their URL, but sometimes they may use the name of a known website to try to fool you. If you see a URL that you don’t recognize, just think, “do I manage it?” If the answer is no, then it isn’t your hostname.
  • (not set) hostname: this usually comes from spam. On rare occasions it’s related to tracking code issues.

Below is an example of my hostname report. From the unfiltered view, of course, the master view is squeaky clean.

Now with the list of your good hostnames, make a regular expression. If you only have your domain, then that is your expression; if you have more, create an expression with all of them as we did in the fruit salad example:

Hostname REGEX (example)


yourdomain.com|hostname2|hostname3|hostname4

Important! You cannot create more than one “Include hostname filter”; if you do, you will exclude all data. So try to fit all your hostnames into one expression (you have 255 characters).

The “valid hostname filter” configuration:

  • Filter Name: Include valid hostnames
  • Filter Type: Custom > Include
  • Filter Field: Hostname
  • Filter Pattern: [hostname REGEX you created]

Campaign source filter (Crawler spam, internal sources)

Prevents traffic from:

  • Crawler spam
  • Internal third-party tools (Trello, Asana, Pingdom)

Important note: Even if these hits are shown as a referral, the field you should use in the filter is “Campaign source” — the field “Referral” won’t work.

Filter for crawler spam

The second most common type of spam is crawler spam. Crawlers also pretend to be valid visits by leaving a fake source URL, but in contrast with ghost spam, they do access your site and therefore leave a correct hostname.

You will need to create an expression the same way as the hostname filter, but this time, you will put together the source/URLs of the spammy traffic. The difference is that you can create multiple exclude filters.

Crawler REGEX (example)


spam1|spam2|spam3|spam4

Crawler REGEX (pre-built)


As I promised, here are the latest pre-built crawler expressions that you just need to copy and paste.

The “crawler spam filter” configuration:

  • Filter Name: Exclude crawler spam 1
  • Filter Type: Custom > Exclude
  • Filter Field: Campaign source
  • Filter Pattern: [crawler REGEX]

Filter for internal third-party tools

Although you can combine your crawler spam filter with internal third-party tools, I like to have them separated, to keep them organized and more accessible for updates.

The “internal tools filter” configuration:

  • Filter Name: Exclude internal tool sources
  • Filter Type: Custom > Exclude
  • Filter Field: Campaign source
  • Filter Pattern: [tool source REGEX]

Internal Tools REGEX (example)


trello|asana|redmine

In case one of the tools that you use internally also sends you traffic from real visitors, don’t filter it. Instead, use the “Exclude Internal URL Query” filter described below.

For example, I use Trello, but since I share analytics guides on my site, some people link them from their Trello accounts.

Filters for language spam and other types of spam

The previous two filters will stop most of the spam; however, some spammers use different methods to bypass the previous solutions.

For example, they try to confuse you by showing one of your valid hostnames combined with a well-known source like Apple, Google, or Moz. Even my site has been a target (not saying that everyone knows my site; it just looks like the spammers don’t agree with my guides).

However, even if the source and host look fine, the spammer injects their message in another part of your reports like the keyword, page title, and even as a language.

In those cases, you will have to take the dimension/report where you find the spam and choose that name in the filter. It’s important to consider that the name of the report doesn’t always match the name in the filter field:

  • Language report: filter field “Language settings”
  • Referral report: filter field “Campaign source”
  • Organic Keyword report: filter field “Search term”
  • Service Provider report: filter field “ISP Organization”
  • Network Domain report: filter field “ISP Domain”

Here are a couple of examples.

The “language spam/bot filter” configuration:

  • Filter Name: Exclude language spam
  • Filter Type: Custom > Exclude
  • Filter Field: Language settings
  • Filter Pattern: [Language REGEX]

Language Spam REGEX (Prebuilt)


\s[^\s]*\s|.{15,}|\.|,|^c$

The expression above excludes fake languages that don’t meet the required format. For example, take these weird messages appearing instead of regular languages like en-us or es-es:

Examples of language spam

The organic/keyword spam filter configuration:

  • Filter Name: Exclude organic spam
  • Filter Type: Custom > Exclude
  • Filter Field: Search term
  • Filter Pattern: [keyword REGEX]

Filters for direct bot traffic

Bot traffic is a little trickier to filter because it doesn’t leave a source like spam, but it can still be filtered with a bit of patience.

The first thing you should do is enable bot filtering. In my opinion, it should be enabled by default.

Go to the Admin section of your Analytics and click on View Settings. You will find the option “Exclude all hits from known bots and spiders” below the currency selector:

It would be wonderful if this would take care of every bot — a dream come true. However, there’s a catch: the key here is the word “known.” This option only takes care of known bots included in the “IAB known bots and spiders list.” That’s a good start, but far from enough.

There are a lot of “unknown” bots out there that are not included in that list, so you’ll have to play detective and search for patterns of direct bot traffic through different reports until you find something that can be safely filtered without risking your real user data.

To start your bot trail search, click on the Segment box at the top of any report, and select the “Direct traffic” segment.

Then navigate through different reports to see if you find anything suspicious.

Some reports to start with:

  • Service provider
  • Browser version
  • Network domain
  • Screen resolution
  • Flash version
  • Country/City

Signs of bot traffic

Although bots are hard to detect, there are some signals you can follow:

  • An unnatural increase of direct traffic
  • Old versions (browsers, OS, Flash)
  • They visit the home page only (usually represented by a slash “/” in GA)
  • Extreme metrics:
    • Bounce rate close to 100%,
    • Session time close to 0 seconds,
    • 1 page per session,
    • 100% new users.

Important! If you find traffic that checks off many of these signals, it is likely bot traffic. However, not all entries with these characteristics are bots, and not all bots match these patterns, so be cautious.

Perhaps the most useful report that has helped me identify bot traffic is the “Service Provider” report. Large corporations frequently use their own Internet service provider name, so their bots show up under it.

I also have a pre-built expression for ISP bots, similar to the crawler expressions.

The bot ISP filter configuration:

  • Filter Name: Exclude bots by ISP
  • Filter Type: Custom > Exclude
  • Filter Field: ISP organization
  • Filter Pattern: [ISP provider REGEX]

ISP provider bots REGEX (prebuilt)


hubspot|^google\sllc$|^google\sinc\.$|alibaba\.com\sllc|ovh\shosting\sinc\.

Latest ISP bot expression

IP filter for internal traffic

We already covered different types of internal traffic, the one from test sites (with the hostname filter), and the one from third-party tools (with the campaign source filter).

Now it’s time to look at the most common and damaging of all: the traffic generated directly by you or any member of your team while working on any task for the site.

To deal with this, the standard solution is to create a filter that excludes the public IP (not private) of all locations used to work on the site.

Examples of places/people that should be filtered

  • Office
  • Support
  • Home
  • Developers
  • Hotel
  • Coffee shop
  • Bar
  • Mall
  • Any place that is regularly used to work on your site

To find the public IP of the location you are working at, simply search for “my IP” in Google. You will see one of these versions:

  • Short IPv4, for example: 1.23.45.67
  • Long IPv6, for example: 2001:0db8:85a3:0000:0000:8a2e:0370:7334

No matter which version you see, make a list with the IP of each place and put them together with a REGEX, the same way we did with other filters.

  • IP address expression: IP1|IP2|IP3|IP4 and so on.

The static IP filter configuration:

  • Filter Name: Exclude internal traffic (IP)
  • Filter Type: Custom > Exclude
  • Filter Field: IP Address
  • Filter Pattern: [The IP expression]
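One small refinement worth making when you build the expression: escape the dots, since an unescaped “.” matches any character in regex. A quick sketch with made-up addresses:

// Made-up addresses for illustration; dots are escaped so "." matches a
// literal dot instead of any character.
const internalIps = ["1.23.45.67", "98.76.54.32"];

const ipPattern = internalIps
  .map((ip) => ip.replace(/\./g, "\\."))
  .join("|");

console.log(ipPattern); // 1\.23\.45\.67|98\.76\.54\.32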

Cases when this filter won’t be optimal:

There are some cases in which the IP filter won’t be as efficient as it used to be:

  • You use IP anonymization (a common measure under the GDPR). When you anonymize the IP in GA, the last octet is set to 0. This means that if your IP is 1.23.45.67, GA will record it as 1.23.45.0, so that’s what you need to put in your filter. The problem is that you might also be excluding other IPs in the same range that are not yours.
  • Your Internet provider changes your IP frequently (Dynamic IP). This has become a common issue lately, especially if you have the long version (IPv6).
  • Your team works from multiple locations. The way of working is changing — now, not all companies operate from a central office. Some people work from home, others from the train or a coffee shop. You can still filter those places, but maintaining the list of IPs to exclude can become a nightmare.
  • You or your team travel frequently. Similar to the previous scenario, if you or your team travels constantly, there’s no way you can keep up with the IP filters.

If one or more of these scenarios applies to you, then this filter won’t be optimal; I recommend trying the “Advanced internal URL query filter” below.

URL query filter for internal traffic

If there are dozens or hundreds of employees in the company, it’s extremely difficult to exclude them when they’re traveling, accessing the site from their personal locations, or mobile networks.

Here’s where the URL query comes to the rescue. To use this filter, you just need to add a query parameter (I use “?internal”) to any link your team uses to access your site:

  • Internal newsletters
  • Management tools (Trello, Redmine)
  • Emails to colleagues
  • Also works by directly adding it in the browser address bar

Basic internal URL query filter

The basic version of this solution is to create a filter to exclude any URL that contains the query “?internal”.

  • Filter Name: Exclude Internal Traffic (URL Query)
  • Filter Type: Custom > Exclude
  • Filter Field: Request URI
  • Filter Pattern: \?internal
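To see what this pattern does and doesn’t catch, here is a quick check against a few hypothetical Request URIs (in GA, the Request URI includes the query string):

const internalPattern = /\?internal/;

// Only the tagged landing page matches; pages visited afterward in the same
// session have normal URIs and are still recorded (see the note below).
console.log(internalPattern.test("/new-post?internal")); // true  -> excluded
console.log(internalPattern.test("/new-post"));          // false -> kept
console.log(internalPattern.test("/pricing"));           // false -> kept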

This solution is perfect for instances where the user will most likely stay on the landing page, for example, when you send a newsletter to all employees to check out a new post.

If the user is likely to visit more than the landing page, however, the subsequent pageviews will still be recorded.

Advanced internal URL query filter

This solution is the champion of all internal traffic filters!

It’s a more comprehensive version of the previous solution and works by filtering internal traffic dynamically using Google Tag Manager, a GA custom dimension, and cookies.

Although this solution is a bit more complicated to set up, once it’s in place:

  • It doesn’t need maintenance
  • Any team member can use it, no need to explain techy stuff
  • Can be used from any location
  • Can be used from any device, and any browser

To activate the filter, you just have to add the text “?internal” to any URL of the website.

That will insert a small cookie in the browser that will tell GA not to record the visits from that browser.

And the best part is that the cookie will stay there for a year (unless it is manually removed), so the user doesn’t have to add “?internal” every time.
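The exact tag and variable setup isn’t reproduced here, but the core idea can be sketched in a few lines of browser code (the cookie name below is my own placeholder): detect the “?internal” marker, set a cookie with a one-year lifetime, and let Tag Manager read that cookie to either block the GA tag or write its value into the custom dimension your filter excludes.

// Sketch only: detect the "?internal" marker and remember it for a year.
const INTERNAL_COOKIE = "internal_traffic"; // hypothetical cookie name

if (window.location.search.indexOf("internal") !== -1) {
  const oneYearInSeconds = 365 * 24 * 60 * 60;
  document.cookie = INTERNAL_COOKIE + "=true; max-age=" + oneYearInSeconds + "; path=/";
}

// Tag Manager (or any on-page script) can then read the cookie and either skip
// the GA tag or expose the value as a custom dimension for a GA exclude filter.
const isInternalVisitor = (): boolean =>
  document.cookie.split("; ").some((c) => c === INTERNAL_COOKIE + "=true");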

Bonus filter: Include only internal traffic

On some occasions, it’s interesting to know the traffic generated internally by employees — maybe because you want to measure the success of an internal campaign or just because you’re a curious person.

In that case, you should create an additional view, call it “Internal Traffic Only,” and use one of the internal filters above. Just one, because if you apply multiple include filters, a hit has to match all of them to be counted.

If you configured the “Advanced internal URL query” filter, use that one. If not, choose one of the others.

The configuration is exactly the same — you only need to change “Exclude” for “Include.”

Cleaning historical data

The filters will prevent future hits from junk traffic.

But what about past affected data?

I know I told you that deleting aggregated historical data is not possible in GA. However, there’s still a way to temporarily clean up at least some of the nasty traffic that has already polluted your reports.

For this, we’ll use an advanced segment (a subset of your Analytics data). There are built-in segments like “Organic” or “Mobile,” but you can also build one using your own set of rules.

To clean our historical data, we will build a segment using all the expressions from the filters above as conditions (except the ones from the IP filter, because IPs are not stored in GA; hence, they can’t be segmented).
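Conceptually, a hit survives the segment only when every condition holds; I’m assuming here that the hostname expression is applied as an include condition (valid hostnames are whitelisted), while the spam expressions are exclusions. The patterns below are placeholders for the expressions you built earlier:

// Placeholders: reuse the expressions you built for your own filters.
const validHostnames = /yourdomain\.com/;        // used as an include condition
const crawlerSpam    = /spam-source-pattern/;    // exclude condition
const ispBots        = /hubspot|^google\sllc$/i; // exclude condition (shortened)

// A hit is kept by the "clean" segment only if every condition holds.
const isCleanHit = (hostname: string, source: string, isp: string): boolean =>
  validHostnames.test(hostname) &&
  !crawlerSpam.test(source) &&
  !ispBots.test(isp);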

To help you get started, you can import this segment template.

You just need to follow the instructions on that page and replace the placeholders. Here is how it looks:

In the actual template, all text is black; the colors are just to help you visualize the conditions.

After importing it, to select the segment:

  1. Click on the box that says “All users” at the top of any of your reports
  2. From your list of segments, check the one that says “0. All Users – Clean”
  3. Lastly, uncheck the “All Users”

Now you can navigate through your reports, and all the junk traffic included in the segment will be removed.

A few things to consider when using this segment:

  • Segments have to be selected each time you open a report. One way to have it applied by default is to bookmark the report while the segment is selected.
  • You can remove or add conditions if you need to.
  • You can edit the segment at any time to update it or add conditions (open the list of segments, then click “Actions,” then “Edit”).
  • The hostname expression and third-party tools expression are different for each site.
  • If your site has a large volume of traffic, segments may sample your data when selected, so if the little shield icon at the top of your reports turns yellow (it’s normally green), try choosing a shorter period (e.g., one year, six months, or one month).

Conclusion: Which cake would you eat?

Having real and accurate data is essential if you want your Google Analytics reports to reflect what’s actually happening on your site.

But if you haven’t filtered it properly, it’s almost certain that it will be filled with all sorts of junk and artificial information.

And the worst part is that if you don’t realize your reports contain bogus data, you will likely make poor decisions about the next steps for your site or business.

The filters I shared above will help you block the three most harmful threats polluting your Google Analytics and keeping you from getting a clear view of your site’s actual performance: spam, bots, and internal traffic.

Once these filters are in place, you can rest assured that your efforts (and money!) won’t be wasted on analyzing deceptive Google Analytics data, and your decisions will be based on solid information.

And the benefits don’t stop there. If you’re using other tools that import data from GA, for example, WordPress plugins like GADWP, Excel add-ins like AnalyticsEdge, or SEO suites like Moz Pro, the benefits will trickle down to all of them as well.

Besides highlighting the importance of filters in GA (which I hope is clear by now), I would also love for the work of building these filters to spark your curiosity and serve as a basis for creating others that let you do all sorts of remarkable things with your data.

Remember, filters not only allow you to keep away junk, you can also use them to rearrange your real user information — but more on that on another occasion.


That’s it! I hope these tips help you make more sense of your data and make accurate decisions.

Have any questions, feedback, experiences? Let me know in the comments, or reach me on Twitter @carlosesal.




Moz Blog

Posted in IM NewsComments Off

Google Chrome, Mozilla Firefox Leaked Facebook User Data Caused by Browser Vulnerability

Google Chrome and Mozilla Firefox might have inadvertently leaked the Facebook usernames, profile pictures and even the likes of their users because of a side-channel vulnerability.

A side-channel vulnerability was discovered in a CSS3 feature dubbed “mix-blend-mode.” It allowed a hacker to discover the identity of a Facebook account holder using Chrome or Firefox by getting them to visit a specially designed website.

This critical flaw was discovered in 2017 by security researchers Dario Weißer and Ruslan Habalov, and independently by researcher Max May.

The researchers created a proof-of-concept (POC) exploit to show how the vulnerability could be misused. Weißer and Habalov’s concept showed how they were able to visually harvest data like username, profile picture, and “like” status of a user. What’s more, this insidious hack could be accomplished in the background when the user visits a malicious website.

The visual leak could happen on sites using iFrames that connect to Facebook via login buttons and social plugins. Due to a security feature called the “same-origin policy,” sites can’t directly access iFrame content. But the researchers were able to get the information by placing an overlay on the cross-origin iFrame in order to work with the underlying pixels.

It took Habalov and Weißer’s POC about 20 seconds to get the username and about five minutes to create a vague copy of the profile picture. The program also took about 500 milliseconds to check the “like” status. Keep in mind, however, that for this vulnerability to work, the user must be logged into their Facebook account.

Habalov and Weißer privately notified both Google and Mozilla and steps were taken to contain the threat. Google was able to fix the flaw on their end when version 63 was released last December. On Firefox’s end, a patch was made available 14 days ago with the release of the browser’s version 60. The delay was due to the researchers’ late disclosure of their findings to Mozilla.

IE and Edge browsers weren’t exposed to the side-channel exploit as they don’t support the needed feature. Safari was also safe from the flaw.

[Featured image via Pixabay]

The post Google Chrome, Mozilla Firefox Leaked Facebook User Data Caused by Browser Vulnerability appeared first on WebProNews.


WebProNews

Posted in IM NewsComments Off
