Welcome back to part 2 of 3 in our analytics data validity series! For this post, we’re assuming that you’ve been looking at your data for some time now, and that you’re familiar with how your data typically trends (if you’re not, take a look at last month’s post).
Below are some questions that might help you get started identifying and investigating your data issues, followed by some example cases to demonstrate basic solutions to issues we’ve encountered in the past. By sharing these examples, we hope to provide a better understanding of the thought process needed to investigate and resolve these types of issues.
While data quality problems are often highly specific to a given site and implementation, you should typically begin by asking yourself the following questions:
- Have you detected a dramatic drop in your traffic volume?
- Are you seeing large swings in your Bounce Rate?
- Have you seen a sudden influx of unexpected visits?
- Are they coming from unknown sources or unfamiliar hostnames?
- Are there unusually high visit rates from countries in which you have no presence?
- Are you seeing traffic in your production reports from other environments, such as your development, staging, or QA environments?
- Are you missing traffic that you know should be there? (e.g., email campaigns, display ads)
This list is not intended to be exhaustive, but it can serve as a good starting point for recognizing some of the common issues we often see within analytics accounts. Once you suspect that you have a data quality issue, dig deeper into your available report data, looking for clues to determine the root cause.
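Several of the questions above come down to spotting a sudden shift in a daily metric series. As a rough illustration, a simple z-score check can flag days that deviate sharply from the norm. This is a minimal sketch with invented data and thresholds; real monitoring would also account for seasonality and trend:

```python
from statistics import mean, stdev

def flag_anomalies(daily_values, z_threshold=3.0):
    """Return indices of days whose value deviates sharply from the mean.

    A plain z-score check: flags any day more than z_threshold standard
    deviations from the series mean. Illustrative only -- it ignores
    weekly seasonality and long-term trend.
    """
    mu = mean(daily_values)
    sigma = stdev(daily_values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(daily_values)
            if abs(v - mu) / sigma > z_threshold]

# Hypothetical daily session counts with a dramatic drop on the last day.
sessions = [1000, 980, 1020, 990, 1010, 1005, 300]
print(flag_anomalies(sessions, z_threshold=2.0))  # flags day index 6
```

The same check works for bounce rate or any other metric you track daily; the threshold is a judgment call for your data's normal variability.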
Here are a few example cases and how one might approach solving them.
A new client reached out to the team after noticing that their bounce rates were unusually low compared to industry benchmarks. We immediately checked to see when this issue started, and what other metrics might have shifted at the same time. It was quickly discovered that the volume of page views had doubled on the same day that the bounce rate dropped from ~40% to ~7%. A quick look at the site code revealed that they had inadvertently left the old Google Analytics tag on the site when adding a new one. This caused a number of data collection issues, one of which was the artificially low reported bounce rate. We've also encountered similar "double counting" issues when iframes or other embedded items are used on site pages.
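One quick way to catch this kind of double tagging is to scan the page source for analytics tracking IDs and see whether more than one appears. The sketch below is a simplified, hypothetical check (the page HTML and IDs are invented) that matches the classic `UA-…` property IDs and newer `G-…` measurement IDs:

```python
import re

def find_ga_ids(html):
    """Return all Google Analytics tracking IDs found in page HTML.

    Matches classic UA-XXXXXXX-X property IDs and newer G-XXXXXXXXX
    measurement IDs. More than one ID on a page (or the same ID fired
    twice) is a common cause of doubled pageviews and a deflated
    bounce rate.
    """
    return re.findall(r"\b(?:UA-\d{4,10}-\d{1,4}|G-[A-Z0-9]{6,12})\b", html)

# Hypothetical page source: a leftover legacy tag alongside the new one.
page = """
<script>ga('create', 'UA-1234567-1', 'auto');</script>
<script src="https://www.googletagmanager.com/gtag/js?id=G-ABC123XYZ"></script>
"""
ids = find_ga_ids(page)
if len(ids) > 1:
    print("Possible double tagging:", ids)
```

In practice you would run a check like this across a crawl of the site, or use the browser's network panel to watch for duplicate collection hits on a single pageview.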
In another instance, we received a request to complete a performance analysis of a recently launched email campaign. Normally, this would be a fairly straightforward request; however, there was no evidence of the email campaign data anywhere in the account. In addition, there was no increase in the email channel, nor in the direct channel (the most likely culprit if the campaign tracking failed). Yet we did see an increase in the "Other" channel segment. After further investigation, we found that the client had used a new naming convention for their campaign codes which did not fit their existing channel definition rules. This was quickly fixed by adding the new parameters to the client's view filters. This example serves as an especially important reminder to ensure that filters are set up properly, because most analytics tools do not allow you to alter historical data.
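To see why an unrecognized campaign code lands in "Other," here is a deliberately simplified sketch of how channel definition rules work. The rule table and URLs are invented for illustration (real channel groupings, such as Google Analytics' default channels, also consider source, campaign, and other dimensions), but the failure mode is the same: a `utm_medium` value that no rule matches falls through to the catch-all bucket:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical channel rules keyed on utm_medium.
CHANNEL_RULES = {
    "email": "Email",
    "cpc": "Paid Search",
    "display": "Display",
    "social": "Social",
}

def classify_channel(landing_url):
    """Map a tagged landing URL to a channel.

    Any utm_medium value that no rule matches falls through to "Other" --
    the symptom described in the case above.
    """
    params = parse_qs(urlparse(landing_url).query)
    medium = params.get("utm_medium", [""])[0].lower()
    return CHANNEL_RULES.get(medium, "Other")

# A new naming convention ("e-mail") that the rules don't recognize:
print(classify_channel("https://example.com/?utm_medium=e-mail"))  # Other
print(classify_channel("https://example.com/?utm_medium=email"))   # Email
```

Auditing your campaign tagging conventions against your channel rules before launch is far cheaper than discovering the mismatch in a report afterward.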
These are just some simple examples of issues that we’ve encountered in the past. We know that every issue is unique, but hope that these case studies will help you get started.
Now that you’ve learned how to detect data quality issues, it is equally important to maintain that quality to avoid future problems. Stay tuned for part 3 of this analytics series, where we will discuss best practices and preventive measures for maintaining accurate and relevant data.