For a while now I have wanted to write a post about sampling in Google Analytics. You know, the dreaded message that appears at the top of your reports.
This message shows up when you work with a dataset that contains more than 500,000 visits or more than 1,000,000 items (keywords, URLs, etc.). Above those limits, Google takes a sample of the visits and uses it to calculate the numbers in your reports. But what is acceptable? In this example Google uses 30.62% of all visits to guess what the other 70% or so did on my site...
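To get a feel for what "guessing" from a 30.62% sample means, here is a minimal sketch in Python. It is not how Google Analytics actually samples (that is not described here); it only assumes a plain random sample whose counts are scaled back up to the full dataset. The visit data, the 10% conversion rate and the function name `estimate_total` are all made up for illustration.

```python
import random


def estimate_total(visits, sample_rate, predicate, rng):
    """Estimate how many visits match `predicate` from a random sample.

    Assumption: simple random sampling without replacement, followed by
    scaling the sampled count up to the size of the full dataset.
    """
    sample_size = max(1, round(len(visits) * sample_rate))
    sample = rng.sample(visits, sample_size)
    hits_in_sample = sum(1 for visit in sample if predicate(visit))
    # Scale the sampled count back up to the whole population.
    return hits_in_sample * len(visits) / sample_size


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical data: 100,000 visits, roughly 10% of which converted.
visits = [{"converted": rng.random() < 0.10} for _ in range(100_000)]

true_total = sum(1 for visit in visits if visit["converted"])
estimated = estimate_total(visits, 0.3062, lambda v: v["converted"], rng)

print(f"true conversions: {true_total}")
print(f"estimated from a 30.62% sample: {estimated:.0f}")
```

The estimate usually lands close to the real number for a big, common metric like this one, but the fewer visits a row in your report has, the more the scaled-up guess can drift.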
We all know the horrible sampling notice in Google Analytics.
You want to analyze a bunch of data, but because of that notice you know it will be sampled (incomplete) data. So, what exactly is "sampling", and how can you prevent it?
What is sampling
After Google Analytics has gathered the raw, unaggregated data collected by the tracking script, it processes that data into understandable and useful visit data. From that visit data, all the available standard reports are pre-calculated and stored. That means, for example, that if you open the "Top Content" report, Google Analytics can show it to you in seconds because most of the calculations are already done.
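The idea of pre-calculating a report can be sketched in a few lines of Python. This is only an illustration of the principle, not Google's actual processing pipeline: the raw hits, the URLs and the names `precalculate_top_content` and `top_content` are hypothetical.

```python
from collections import Counter

# Hypothetical raw hits as they might arrive from a tracking script:
# one page URL per pageview.
raw_hits = ["/home", "/pricing", "/home", "/blog", "/pricing", "/home"]


def precalculate_top_content(hits):
    """Aggregate raw hits once into a stored pageviews-per-page table."""
    return Counter(hits)


# The expensive step runs once, at processing time.
stored_report = precalculate_top_content(raw_hits)


def top_content(report, limit):
    """Serve the report from the stored table, without touching raw hits."""
    return report.most_common(limit)


print(top_content(stored_report, 2))
```

Showing the report is now just a lookup in `stored_report`, which is why a standard report can appear in seconds no matter how many raw hits were processed to build it.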