Google’s Metrics are Meaningless

By Darian Shimy, October 29th, 2009

One question I often get asked is, “Why can’t I use Google Blog Search to track my coverage?”  There are a variety of reasons not to, but the most important is that Google’s metrics are meaningless.

To demonstrate the flaws in Google’s metrics, I decided to check out the blog coverage of Google Wave.  A quick search in Google Blog Search returned about 1,569,236 results.  Was this a lot of conversation?  Looking over on the left, I saw the time frame was set to “anytime.”  That is a little ambiguous, so I narrowed it down to the last week, and it returned about 15,419 results.  Using a separate browser, I ran the same query and it returned about 27,085 results.  That’s a difference of 11,666 results.  How could this be?  Both queries came from the same machine, just different browsers (one Safari, the other Firefox).  In fact, each time I hit refresh the numbers changed.

Aside from different browsers getting different results, Google has another problem: counting.  Running a query for “Apple TV” for the date range of 9/22-9/24 returned 1,526 results.  I wanted to know if there was a spike in conversation on any of these days, so I ran the query once for each day.  The queries returned 162, 160, and 142 results for 9/22, 9/23, and 9/24 respectively.  Adding those numbers gives 464 results.  The math didn’t make sense (464 does not equal 1,526).  As it turns out, there is an explanation.
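For anyone who wants to repeat this sanity check on their own queries, here is a minimal Python sketch of the arithmetic. The numbers are the ones quoted above; the variable names are just illustrative.

```python
# Per-day result counts reported for "Apple TV" (figures from this post)
daily_counts = {"9/22": 162, "9/23": 160, "9/24": 142}

# Count reported when querying the whole 9/22-9/24 range at once
range_count = 1526

daily_total = sum(daily_counts.values())   # what the range "should" add up to
gap = range_count - daily_total            # results unaccounted for
ratio = range_count / daily_total          # how inflated the range count is

print(f"sum of daily counts: {daily_total}")
print(f"reported range count: {range_count}")
print(f"gap: {gap} (range count is {ratio:.2f}x the daily sum)")
```

If the counts were exact, the gap would be zero. Here the range count is more than three times the sum of the individual days.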

The number Google provides is only an approximation based on the probability of the search terms’ occurrence in blogs.  Although I was not able to get an official word from Google (I’ll update the post if I hear back from them on the matter), there is a quote from an unnamed Google employee.  It’s old, but after testing the results, it seems they haven’t done much in this area.

There are small variations in the number of results due to the fact that index updates are done at different times in different data centers. But there are much larger variations due to the fact that these are all estimates, and we just haven’t tried that hard to make the estimates precise. To figure out the number of results in the query [a OR b], we need to intersect two posting lists. But we don’t want to pay the price of intersecting all the way to the end, so we do a prefix and then extrapolate. The extrapolation is done with the help of some parameters that were carefully tuned several years ago, but haven’t been reliably updated as the index has grown and the web has changed, so sometimes the results can be off.
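To make the “prefix and then extrapolate” idea concrete, here is a rough Python sketch. It is not Google’s code: the function name, the `prefix_fraction` parameter, and the simple linear extrapolation are all my own assumptions, and the toy posting lists are made up. It only illustrates why an estimate built this way can land close to the truth on some queries and far off on others.

```python
def estimate_intersection(a, b, prefix_fraction=0.1):
    """Estimate how many doc IDs two sorted posting lists share.

    Hypothetical sketch: only doc IDs in the first `prefix_fraction`
    of the ID range are intersected, then the hit count is scaled up
    linearly, assuming matches are spread evenly over the index.
    """
    if not a or not b:
        return 0
    cutoff = max(a[-1], b[-1]) * prefix_fraction
    i = j = hits = 0
    # Standard two-pointer merge, stopped early at the cutoff
    while i < len(a) and j < len(b) and a[i] <= cutoff and b[j] <= cutoff:
        if a[i] == b[j]:
            hits += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return round(hits / prefix_fraction)


# Evenly spread matches: the extrapolation lands close to the true count
evens = list(range(0, 10000, 2))
threes = list(range(0, 10000, 3))
exact = len(set(evens) & set(threes))
estimate = estimate_intersection(evens, threes)
print("even spread:", exact, "exact vs", estimate, "estimated")

# Clustered matches: everything shared sits past the prefix, so the estimate misses it
everything = list(range(10000))
late = list(range(9000, 10000))
exact_late = len(set(everything) & set(late))
estimate_late = estimate_intersection(everything, late)
print("clustered:", exact_late, "exact vs", estimate_late, "estimated")
```

The second case is the kind of skew a date filter can introduce: if the matching posts are not spread evenly through the index, scaling up from a prefix tells you very little.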

Bottom line, Google’s search results are not meant to be used as an analytics platform.




One Comment to “Google’s Metrics are Meaningless”

  1. Bugmenot says:

    Can the *ratio* between one set of Google hits and another not be relied upon (even though the actual numbers possibly can’t)?

    For example, an inventor wants to patent a smelly formulation to add to washing machine powder, to make your clothes smell nicer when they come out of the wash. What do they call this sort of thing in the industry?

    You Google
    “fragrance additive” and “washing machine” (to add context) and get 91 hits

    You then Google
    “perfume additive” and “washing machine” and you get 54 hits.

    Suggesting that “fragrance” is somewhat less than twice as common as “perfume” as the term used in this particular area of technology.

    Other conclusions you might draw are that “perfume” is nevertheless an acceptable term of the art, and that this is not a particularly active area of research.

    Is it not reasonable then to use Google as a rough and ready, not to mention quick way of judging such things, rather than to “Stop using this meaningless metric and make a proper argument based upon proper research instead” (as Jonathan de Boyne Pollard says)?
