
Is Google indexing pages from Twitter and messing with your analytics?

I just Googled for “WordPress RC” to find the release notes for the 3.5 Release Candidate.  I clicked on the result for wordpress.org and was taken to the correct page, nothing out of the ordinary.  I then copied the URL to share in team chat and noticed that the URL was quite long; there were some query string parameters.  The complete URL was:

http://wordpress.org/news/2012/11/wordpress-3-5-release-candidate/?utm_source=twitterfeed&utm_medium=twitter

The utm_source and utm_medium parameters are used by Google Analytics to attribute traffic to a source (here, twitterfeed) and a medium (here, twitter).  Normally you would expect to land on this URL only if you had clicked through from the WordPress Twitter feed.
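To make the tagging concrete, here is a minimal sketch in Python that pulls the utm parameters out of the URL above using only the standard library. The helper name campaign_params is made up for illustration; it is not part of Google Analytics or any other library.

```python
from urllib.parse import urlsplit, parse_qs


def campaign_params(url: str) -> dict[str, str]:
    """Return the utm_* query string parameters found in a URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: values[0] for key, values in query.items()
            if key.startswith("utm_")}


url = ("http://wordpress.org/news/2012/11/wordpress-3-5-release-candidate/"
       "?utm_source=twitterfeed&utm_medium=twitter")
print(campaign_params(url))
```

Any page view that arrives with these parameters in the URL carries the same tags, no matter how the visitor actually got there.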

From that, we can infer that Google might be indexing URLs from Tweets.  It’s also possible that somebody clicked on the link from the Twitter feed, copied the tagged URL and linked to it from another website, and that Google found it there.  I can’t tell which from this one result, but somebody somewhere has probably run a test to find out.

However, the more interesting corollary is that this is probably messing with your web analytics.  If those query string parameters are present when the page is loaded, Google Analytics will use them to tag the source of the visit, so a visitor who actually arrived from a search result gets attributed to another source or marketing campaign instead.
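One rough way to spot this in your own logs is to compare the landing URL with the referrer. The sketch below is only an illustration under stated assumptions: the SEARCH_HOSTS list is a short, made-up sample, the function name looks_misattributed is hypothetical, and this is not how Google Analytics itself decides attribution. It also ignores legitimate cases such as tagged search ads.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: a short, illustrative list of search engine host prefixes.
SEARCH_HOSTS = ("google.", "bing.", "duckduckgo.")


def looks_misattributed(landing_url: str, referrer: str) -> bool:
    """Flag a hit whose referrer is a search engine but whose URL carries utm_source."""
    host = (urlsplit(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    from_search = host.startswith(SEARCH_HOSTS)

    query = parse_qs(urlsplit(landing_url).query)
    utm_source = query.get("utm_source", [""])[0]
    return from_search and bool(utm_source)


landing = ("http://wordpress.org/news/2012/11/wordpress-3-5-release-candidate/"
           "?utm_source=twitterfeed&utm_medium=twitter")
print(looks_misattributed(landing, "https://www.google.com/search?q=WordPress+RC"))
```

A hit flagged this way is only a candidate for closer inspection, since the referrer can be missing or stripped.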

If you’re using utm variables for campaign tracking, it’s probably worth testing to see exactly what’s going on here.  It’s also a good idea to use the canonical link element (rel="canonical", often called the canonical tag) in the head of each page to tell Google exactly which URL to index for that page.
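As a sketch of the canonical approach, the Python below strips utm parameters from a URL and builds the matching link element. It assumes that the canonical URL for a page is simply the requested URL minus its utm_* parameters, which may not hold for every site; the function names canonical_url and canonical_link_tag are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonical_url(url: str) -> str:
    """Return the URL with utm_* parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if not key.startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the link element that goes in the page's head."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


tagged = ("http://wordpress.org/news/2012/11/wordpress-3-5-release-candidate/"
          "?utm_source=twitterfeed&utm_medium=twitter")
print(canonical_link_tag(tagged))
```

Other query string parameters are kept, so a URL like ?p=1&utm_source=x would come out as ?p=1.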