How to prevent Google Analytics sampling data

We all know the dreaded notice in Google Analytics telling you a report is based on sampled data.

You want to analyze a bunch of data, but that notice tells you you'll be working with sampled (incomplete) data. So what exactly is "sampling", and how can you prevent it?

What is sampling

After Google Analytics has gathered the raw, unaggregated data collected by the tracking script, it processes it into understandable and useful visit data. From that visit data, all available standard reports are pre-calculated and stored. That means, for example, that when you open the "Top Content" report, Google Analytics can show it within seconds, because most of the calculations have already been done.

But if you request data that is not pre-calculated, so that Google Analytics has to search for it in the stored visit data, you can hit the sampling trigger. As you can imagine, this is a heavy process that costs a lot of resources. Google has decided that if your query spans more than 500,000 visits or 1,000,000 rows of data, it will sample the data to save time. That means only 500,000 visits or 1,000,000 dimension values are used to create the report you're asking for, and Google Analytics will present the result as a range.

That range indicates the lower and upper boundary of where the true value lies, at a significance level of 5%. In other words: you can be 95% sure the true value falls within the presented range.
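Google does not document exactly how it computes that range, but assuming a simple random sample of visits, the idea can be sketched with a standard 95% confidence interval scaled up to the full data set. The function name and numbers below are made up for illustration:

```python
import math

def sampled_range(sample_hits, sample_size, population, z=1.96):
    """95% confidence interval for a count extrapolated from a simple
    random sample. Illustrative only: GA's exact method is not
    documented, and this function name is invented for the sketch."""
    p = sample_hits / sample_size               # proportion observed in the sample
    se = math.sqrt(p * (1 - p) / sample_size)   # standard error of that proportion
    estimate = p * population                   # scale up to all visits
    margin = z * se * population                # 95% margin of error (z = 1.96)
    return estimate - margin, estimate + margin

# e.g. 100 matching visits found in a 500,000-visit sample of 2,000,000 visits
low, high = sampled_range(100, 500_000, 2_000_000)
print(f"{low:.0f} - {high:.0f}")
```

The wider the margin, the less precise the report; that is exactly why the sampled-data notice is worth taking seriously.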

How to prevent sampling

If you really want unsampled data in your reports while segmenting, make sure you select a date range that contains fewer than 500,000 visits. If you want to analyze a larger date range, export the numbers for the two (or more) smaller date ranges to Excel and combine them there.
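The same split-and-combine workaround can be automated. A minimal sketch, assuming a `fetch_metric(start, end)` function you would implement yourself against the GA export of your choice (it is hypothetical here, stubbed out for the example):

```python
from datetime import date, timedelta

def split_date_range(start, end, max_days):
    """Yield (chunk_start, chunk_end) pairs covering [start, end],
    each chunk spanning at most max_days days (inclusive)."""
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=max_days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)

def unsampled_total(fetch_metric, start, end, max_days=30):
    """Sum a metric over smaller sub-ranges, so each individual
    query can stay under the sampling threshold."""
    return sum(fetch_metric(s, e)
               for s, e in split_date_range(start, end, max_days))

# Stub standing in for a real export: pretend every day has 1 visit.
fake_fetch = lambda s, e: (e - s).days + 1
print(unsampled_total(fake_fetch, date(2011, 1, 1), date(2011, 3, 1)))
```

Pick `max_days` so that each chunk stays under 500,000 visits for your traffic level; summing works for additive metrics like visits and pageviews, but not for deduplicated ones like unique visitors.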

Another solution is to create multiple profiles that each track a smaller part of your site. Within those profiles you won't hit the 500,000-visit limit as soon as you would in the main profile.

The other sampling threshold, the 1,000,000 maximum on dimension values, is not reached very often. In most cases the Top Content report is the first one to hit it. Google will only retrieve 1,000,000 URLs for a given period, which works out to 1,000,000 divided by the number of days in your selected date range. For example: a 2-month range gives you roughly 1,000,000 / 60 = 16,667 unique URLs per day. So it could be that the URL you were looking for is not in the report. The solution is to select a date range in which the number of unique URLs is less than 1,000,000.
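That per-day budget is simple arithmetic, shown below as a tiny helper (the function name is made up for this sketch):

```python
URL_LIMIT = 1_000_000  # max unique URLs GA retrieves for one report

def daily_url_budget(days_in_range):
    """Roughly how many unique URLs per day fit into an unsampled
    Top Content report, per the rule of thumb above."""
    return round(URL_LIMIT / days_in_range)

print(daily_url_budget(60))  # 2-month range: about 16,667 unique URLs per day
```

If your site has more unique URLs per day than that budget, shorten the date range until the total stays under the limit.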

Final word

I know, these solutions are not the ones you really want, and for huge accounts they are useless. But in many cases, exporting per carefully chosen date range is a good solution.


 
  • Gerron Mulder

    You might also want to consider using a 3rd party application and connect it via the Google Analytics API. There are a lot of tools that can handle the heavy lifting.

    • http://andrescholten.net André

You're absolutely right; because the API can only download 10,000 rows per request, you will always get unsampled data.

      • http://www.dbi.vic.gov.au Brendan Halloran

        Hi Andre,

I just ran a report using the Data Feed Query Explorer that had two rows of data in it, and got the dreaded "This result is based on sampled data" message at the bottom, next to the "Get Data" button. It looks like the API uses exactly the same sampling method as the GA UI.

        • http://andrescholten.net André

If you queried more than 500,000 visits to get those 2 rows, you get sampled data as well.

  • Floris

Does anyone have a solution for the fast access mode?

In fast access mode the conversion data suddenly doesn't add up anymore.
Comparisons with the week before are suddenly worthless because of that, even for websites without that much traffic.

    http://www.google.com/support/forum/p/Google+Analytics/thread?tid=3d9cae4f16ee2577&hl=en

  • Floris

Thanks, I still see 'they are working on it':
    http://www.google.com/analytics/status#hl=nl

    • http://andrescholten.net André

      The problem is fixed and the data recalculated. So things should be normal today or tomorrow.

  • Paul

    All,

I agree with Brendan. The API works the same as the UI. My results show fewer than 250,000 visits, but each time I run the same query, I get a different result.

    Paul

    • http://andrescholten.net André

It's not about the results: if the data you're querying contains more than 500,000 visits, you will hit the sampling threshold. How many visits do you have in the period you want data from?

  • Max

    There is no need to prevent GA from sampling data :). You can use some free tools to get unsampled data like unsampler.io as explained in this video: https://www.youtube.com/watch?v=9fzWyBmHhn4

    • http://andrescholten.net/ André Scholten

Interesting tool; my guess is they download the data through the API in small parts (per day or per hour) to get unsampled data. I will definitely have a look at it.