This morning, we deployed v5.5 of the platform, which includes the rollout of a new IntelliTraffic™ algorithm that we will be using to restore your keyword visit data. This represents the end (hopefully) of a long project we began in the autumn of 2013 to help our customers deal with the whole ‘not provided’ issue.
TL;DR: we’ve launched a new IntelliTraffic™ algorithm which calculates organic keyword visit data for your site, using a variety of data sources. This is a long blog post, which explains the history, our research, possible alternative solutions and our new approach to solving the ‘not provided’ problem.
However, if you just want to try it, then simply click on the IntelliTraffic™ link you’ll see above some tables in the platform (or use the new Quicklinks link – ‘Configure IntelliTraffic options’) and just turn it on! More detailed instructions can be found here.
(not provided): a brief history
For those of you who have been asleep for the past two and a half years or have been otherwise occupied, I’ll briefly recap:
- October, 2011 – Google announced support for secure search (SSL) for those searchers who are logged into their Google accounts, meaning that this keyword visit data was encrypted and began to show in Google Analytics accounts as ‘not provided’, instead of showing visits data for each organic keyword.
- October 2011-September 2013 – Google Analytics users (and SEOs) began to see a steady increase in the level of inaccessible keyword data in their Google Analytics accounts, as more and more data began to show as ‘not provided’.
- September 2013 – Google switched secure search on by default, meaning all searches conducted through Google now show in your browser bar as https:// (not http://) and the keyword data in Google Analytics really began to dry up even quicker than before.
- Following complaints that Google was still providing keyword visit data to their PPC customers, Google announced that it would restrict this data too.
I realize that this is something of an oversimplification. There were lots of minor changes in the intervening periods, involving, among other things, changes to browsers, and the whole issue may also have had some connection to the Edward Snowden affair and the court cases Google has been involved in over the past few years. But these were the crucial milestones.
(not provided): our research
In October 2012, a year after Google’s initial announcement, we analyzed thousands of sites on the platform to see how the percentage of ‘not provided’ traffic had grown. At that time, it was averaging around 20%, but it looked like it would be around 40% by October 2013 (around the time of its two-year anniversary). Obviously, the announcement Google made in September 2013 had a significant impact on this trend and, sure enough, we saw a steep increase in the average ‘not provided’ level when we revised the ‘not provided’ research around that time. It wasn’t quite 100% and wouldn’t be for most sites, as Google Analytics, of course, does show non-Google traffic sources too.
Since that time, we have been hard at work coming up with a way to help restore that data as best we could.
In the light of today’s deployment, we revised the research once again. The overall average ‘not provided’ figure is now at 84% (based on a study of ~4,000 websites), with the majority of sites (60%) having a ‘not provided’ level of 80% or more. As you can see from the chart below, there were still some anomalies – a few sites (2.9%) with less than 5% NP traffic and only 2.3% of sites showing exactly 100% in NP traffic. This is perhaps to be expected, as some sites will be very small and have low traffic levels and some sites may depend entirely on traffic from Google.
(not provided): our solution
Obviously, losing visit data for each keyword was going to be an inconvenience for SEOs or any webmasters wanting to analyze their organic traffic and, of course, the bigger the site, the more inconvenient it was proving to be. Of the subset of data we used for the above analysis, one of the sites had 30 day organic traffic of almost 5m, 82% of which was showing up as ‘not provided’. Not knowing what keywords are generating 3.3m visits each month is a pretty big headache to have.
There have been quite a few posts from the SEO community since October 2011, providing suggestions as to how you can either get around this obstacle or replace this data using some educated guesswork.
Option 1: Using Google Analytics data (what data you were left with, that is)
Clearly, in the early days of ‘not provided’, you could take the keyword data you did have in Google Analytics and extrapolate it to fill the gap left by your ‘not provided’ percentage, if it was only hovering around, say, the 20% or 30% mark. However, as that percentage continued to grow, this became a bigger and bigger assumption. Say the tiny bit of keyword data left in GA told you that you had received 10 visits for the keyword “blue widget” in the last 30 days, and that this amounted to 10% of the data you could examine. Could you then assume that, of the 10,000 visits showing up as ‘not provided’, 1,000 must also be for “blue widget”? Of course, you could work this way, but you’d be making large assumptions about the way in which your website generated traffic.
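To make the “blue widget” arithmetic concrete, here is a minimal sketch of that kind of proportional extrapolation. The function name and the sample keywords are made up for illustration; this is the naive approach described above, not the IntelliTraffic™ algorithm.

```python
def extrapolate_not_provided(visible_visits, not_provided_total):
    """Share out 'not provided' visits in proportion to each keyword's
    share of the visits that are still visible in analytics.

    visible_visits: dict of keyword -> visits still reported per keyword
    not_provided_total: number of visits reported as 'not provided'
    """
    visible_total = sum(visible_visits.values())
    if visible_total == 0:
        # No visible data left, so there is nothing to extrapolate from.
        return {kw: 0.0 for kw in visible_visits}
    return {
        kw: not_provided_total * visits / visible_total
        for kw, visits in visible_visits.items()
    }


# Hypothetical sample: "blue widget" has 10 of the 100 visible visits (10%).
visible = {"blue widget": 10, "red widget": 40, "widget shop": 50}
hidden_estimate = extrapolate_not_provided(visible, 10_000)
print(hidden_estimate["blue widget"])  # 1000.0
```

The weakness is easy to see here: the smaller the visible sample becomes, the more every estimate hangs on a handful of visits.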
Some SEOs would simply look at branded and non-branded traffic as a whole and make a similar calculation. For example, if the keyword data you could still see showed that 65% of visits came from branded terms and 35% from non-branded terms, you could simply apply the same ratios to the number of ‘not provided’ visits. At least this would give you some idea of the breakdown of visits, but you’d still be missing that useful keyword-specific visit data. Furthermore, as the GA data began to dry up (as your ‘not provided’ percentage continued to increase), this data source would, at some point, become fairly useless for calculating keyword visit data.
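The branded/non-branded version of the same calculation might look something like the sketch below. The brand term list, the keywords and the function name are all hypothetical, and matching brand terms by simple substring is an assumption made to keep the example short.

```python
def split_by_brand(visible_visits, brand_terms, not_provided_total):
    """Apply the visible branded/non-branded ratio to 'not provided' visits.

    Returns a (branded_estimate, non_branded_estimate) tuple.
    """
    total = sum(visible_visits.values())
    if total == 0:
        return (0.0, 0.0)
    # A keyword counts as branded if it contains any of the brand terms.
    branded = sum(
        visits
        for kw, visits in visible_visits.items()
        if any(term in kw.lower() for term in brand_terms)
    )
    non_branded = total - branded
    return (
        not_provided_total * branded / total,
        not_provided_total * non_branded / total,
    )


# Hypothetical sample: 65% of visible visits are branded, 35% are not.
visible = {"acme widgets": 65, "blue widget": 35}
branded_est, non_branded_est = split_by_brand(visible, ["acme"], 10_000)
print(branded_est, non_branded_est)  # 6500.0 3500.0
```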
Option 2: Using Ranking Data + CTR assumptions + Search Volumes
A lot of the suggestions SEOs made revolved around using search volumes and assumptions about CTRs (click-through rates) for different ranking positions in order to estimate the number of visits you might have had for each keyword over a set period. Of course, there are some potential issues with using solely this approach:
- It assumes CTRs are consistent across millions of different SERPs
- It might ignore the effect of Universal Search
- It might be based on outdated CTR data
- The Search Volume data might not be “accurate”
- It ignores the difference you should get between branded and non-branded CTRs as you’d probably expect higher CTRs for the very top results for branded searches, since they indicate that users are actively searching for one of your web properties
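As a rough illustration of this approach, the sketch below multiplies a keyword’s search volume by an assumed CTR for its ranking position, with separate curves for branded and non-branded keywords. The CTR values are invented placeholders, not measured data, and the names are hypothetical.

```python
# Assumed CTR by ranking position. These numbers are placeholders for
# illustration only; real CTRs vary from one SERP to the next.
NON_BRAND_CTR = {1: 0.30, 2: 0.15, 3: 0.10}
BRAND_CTR = {1: 0.60, 2: 0.10, 3: 0.05}


def estimate_visits(search_volume, position, branded=False):
    """Estimate visits for a period as search volume x assumed CTR."""
    curve = BRAND_CTR if branded else NON_BRAND_CTR
    # Positions missing from the curve are treated as sending no traffic.
    ctr = curve.get(position, 0.0)
    return search_volume * ctr


# A non-branded keyword with 1,000 searches a month, ranking in position 1.
print(round(estimate_visits(1000, 1)))
# The same volume and position, but for a branded keyword.
print(round(estimate_visits(1000, 1, branded=True)))
```

Every issue in the list above shows up as a parameter here: the estimate is only as good as the CTR curve and the search volume you feed in.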
Option 3: Google Webmaster Tools
A lot of SEOs pointed to the data still available in GWT (Google Webmaster Tools) as this would allow you to see an estimate of the number of visits each keyword had generated, together with the ranking URL. However, relying on this alone had some potential pitfalls:
- It was limited to 2,000 keywords
- The data was only visible for the previous two months
- Some SEOs simply didn’t trust it anyway
- Low-traffic keywords only gave estimates (“<10”), although this was later updated by Google to give more specific numbers
So GWT would only really give you a limited view of your keyword data.
Option 4: Landing Page Focus
You could, of course, as a lot of SEOs pointed out, just look at your most popular landing pages and work out (based on what keywords those pages are targeting) what keywords might have generated those organic visits for the page this month. This certainly made a lot of sense, but still involved a lot of manual work and assumptions and we quickly realized that our solution would allow you to do exactly this anyway!
Each potential “solution” provided SEOs with some insights, but each had its potential issues. What if you could combine all three data sources, though, and let an algorithm work out the best option for each site and date range? That’s what we decided to do (even though it meant the algorithm would need to be sophisticated). Three data sources are always going to be more useful than one, right?
So, we now give our users the option of using one or more data sources (or all three) and the algorithm works out what’s best in the circumstances. It’s even clever enough to check landing page visit numbers so that it doesn’t “overestimate” the numbers.
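To give a feel for what a landing page check could look like, here is a minimal sketch that scales keyword estimates down whenever they add up to more than the organic visits the landing page actually received. This is purely an illustration under our own assumptions (the function name and sample numbers are hypothetical), not a description of how IntelliTraffic™ works internally.

```python
def cap_to_landing_page(keyword_estimates, landing_page_visits):
    """Scale keyword estimates so their sum never exceeds the organic
    visits recorded for the landing page they all rank with."""
    total = sum(keyword_estimates.values())
    if total == 0 or total <= landing_page_visits:
        # Estimates already fit within the page's real traffic.
        return dict(keyword_estimates)
    return {
        kw: estimate * landing_page_visits / total
        for kw, estimate in keyword_estimates.items()
    }


# Estimates add up to 500, but the page only received 400 organic visits.
estimates = {"blue widget": 300, "cheap blue widget": 200}
capped = cap_to_landing_page(estimates, 400)
print(capped)  # {'blue widget': 240.0, 'cheap blue widget': 160.0}
```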
So why would you want to use it?
Well, let me put it this way:
- It’s (almost completely) automated – just configure it and wait for it to update
- It’s configurable – we know SEOs are always arguing over things like CTRs
- It can be used for competitive research – you can put any domain in and we can work off ranking data, CTRs and Search Volume to give you an idea of what keywords your competitors are attracting traffic for
- It saves time in carrying out manual calculations and removes a large amount of guesswork
- We will store the data for you – GWT won’t, so you’d have to remember to download it every month or so
- We can go back in time!!! – we can backdate your Google Analytics imports, go back and re-analyse them all and, if we have ranking data for that period (or you send us some ranking histories we can import), we can then estimate your keyword visits for each keyword over this period
It’s a pretty simple process. You will need to make use of the new option in the Quicklinks menu or click on the IntelliTraffic™ link where you see it in the platform. This will take you to this configuration screen:
What you can do here is decide whether you want the IntelliTraffic™ algorithm to use any or all of the possible data sources. If you configure GWT access, we’ll use this by default as it will provide the most detailed data.
If you’re using ranking and CTR as a data source, you can also have the platform use more than just your primary search engine as a data source. You can also then specify your own CTR percentages for brand and non-brand keywords.
The result is that, once you’ve saved these settings and we’ve done some background calculations, you’ll see the data in a few areas transformed. Firstly, and most obviously, you’ll notice that a chart appears in the Keyword Visits module which will show you the number of visits for the period you select for each keyword group:
It would have been pretty messy providing a chart which showed visit data for every keyword, but we think using keyword groups here is quite useful and you also get the keyword specific data appearing in the table below it:
Beyond this module, you’ll also see a difference in the Monitored Keywords module where the Organic visits number for each keyword will now be based on these IntelliTraffic™ calculations:
…and in the detailed Monitored Keywords task table, you’ll see an updated Organic Visits column. Very handy if you then simply want to filter by a ranking URL to get an idea of the number of organic visits each keyword is generating for a particular page:
…and it’s also reflected in every other area where we use keyword-specific visit data (and not just top-line numbers from Google Analytics); for example, the Organic Visits module will now reflect more accurate branded and non-branded visit numbers over time:
No more guesswork… no more ‘finger in the air assumptions’… let the algo do the work for you in helping to show what Google isn’t!
Feel free to add a comment below. Do you like what we’ve created? Should we be taking into account anything else? We’d love to hear your thoughts.
p.s. If you cannot enable IntelliTraffic™, this means it is not enabled for your company. Please contact your account manager or email email@example.com to get this enabled (this may require an upgrade to your account).