Whose Numbers Are Right?
Billions of dollars are being spent right now on capturing Internet audiences. But we can't even measure them correctly. As Magid Abraham, CEO of Comscore - the leading Internet measurement firm (where I am an investor and board member) - points out in this article in Mediaweek, the panel-based methodology that Comscore uses is not necessarily less accurate than server logs. Most people think that server logs cannot lie, but there are many reasons they are no more accurate than third-party measurement methodologies. Magid's article explains those reasons, and I suggest you read it if you want the details.
But let's look at something else. Many people rely on Alexa rankings. They are free, easily obtainable, and so they are used frequently. But they are often wrong. Why? Well for one, they can be gamed.
But in my opinion, the bigger problem is that Alexa's panel of toolbar users skews heavily toward the techie crowd. So if your service appeals to geeks, it's going to do better in Alexa than Comscore.
Here is an example. Delicious (del.icio.us) is a service well known to the readers of this blog. I love delicious and use it every day. But it skews toward the techie crowd.
Answers.com is a more mainstream service. The primary way people reach answers.com is via the "definition" link on Google. It's a decidedly mainstream service.
I have been an investor in both services so I am partial to both of them.
Here is a comparison of delicious' traffic and answers.com's traffic on Comscore:
In Comscore's panel, Answers.com has something like 10x the number of users that Delicious has.
But take a look at Alexa's numbers for the same two services.
In Alexa's panel, Delicious has more users than Answers.com.
Who is right? Well I'd put money on Comscore being more right than Alexa because I think they have a more "balanced" panel. But I would not bet on Comscore being 100% right and Alexa being 100% wrong.
As Magid says in his article:
As always, the devil is in the details. When you scrutinize the details, the answer to why these two measurements don’t always match up is…“it depends.”
So what do you do? I think triangulation is the best bet. Start with Comscore, take a look at Alexa, take a look at the server logs, and maybe another measurement firm on top of that if you can swing it. Nobody's 100% right. I think Comscore's the closest, but nobody's perfect.
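One way to make triangulation concrete is to line up each source's estimate for the same site, then look at the middle value and at how far apart the sources are. Here is a minimal sketch in Python. The source names come from this post, but the function name and every figure are hypothetical placeholders, not real measurements:

```python
from statistics import median

def triangulate(estimates):
    """Summarize several audience estimates for one site.

    estimates: dict mapping source name -> monthly unique visitors.
    Returns the median estimate and the high/low ratio as a rough
    measure of how much the sources disagree.
    """
    values = sorted(estimates.values())
    mid = median(values)
    spread = values[-1] / values[0]
    return mid, spread

# Hypothetical figures for illustration only; not real measurements.
site = {
    "comscore": 1_200_000,
    "alexa_derived": 3_000_000,
    "server_logs": 1_800_000,
}

mid, spread = triangulate(site)
print(f"median estimate: {mid:,.0f}, high/low ratio: {spread:.1f}x")
```

A large high/low ratio is a signal to dig into why the sources disagree before trusting any single number.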
Comments (24) | Posted October 10, 2006 in Venture Capital and Technology
Comments
And, related to that, how are these companies dealing with Ajaxy Web 2.0 sites?
http://jeremy.zawodny.com/blog/archives/007665.html
No clear answers yet...
Posted by: Jeremy Zawodny | Oct 10, 2006 11:02:56 PM
I've been using Comscore for the past 3+ years, and really wish that there was a more accurate alternative.
From what I understand, Comscore's panel is surprisingly small / limited (something < 100,000 according to what I've been told). I'm sure Comscore's statisticians would argue that it's statistically accurate, but I'm very skeptical. While Alexa may skew towards the techie crowd, I think Comscore skews in the complete opposite direction. Therefore, I don't think that Comscore can be used with any degree of reliability or even as a directional indicator when analyzing startups, less mass market sites, etc. It might be good for big online retail sites like Walmart or Amazon, but I don't even think it accurately tracks a site like YouTube.
From what I understand, Hitwise employs a far better methodology. Apparently, they use significant amounts of ISP data and a panel that is far larger than Comscore's. Based on limited experience, I've found that the answers Hitwise produces make far more directionally intuitive sense. However, Hitwise can only present data in terms of market share, while Comscore provides user level metrics such as Unique Users, Frequency, etc.
I think there is a real need for a new solution. I realize it's not going to happen, but it would be great if all of the web analytics companies (Google Analytics, WebSideStory, Omniture, Coremetrics, etc.) could contribute usage data into some type of new 3rd party site...
Posted by: New York | Oct 11, 2006 1:24:36 AM
It is worth remembering that panel-based measurement takes a sample and projects from it the tastes, preferences, or web behavior of the whole population. The vendors will argue statistically that the sample is exactly representative of the population, but is it really? Consider:
1. It doesn't include usage from college dorms, government offices, the military, school or university offices, or mobile phones and PDAs.
2. It doesn't show all international visitors.
The PV metric Magid talks about is just a status-quo check; it can't give any actionable insight, which is what every website would die to find.
Besides, the cookie problem can be negated by looking at total visitors rather than UNIQUE visitors.
Most of the server-log problems Magid describes can be mitigated by the JavaScript tagging that any web analytics vendor offers (Google Analytics is now FREE).
Posted by: Ravi | Oct 11, 2006 4:32:26 AM
ravi
some good points in there. unfortunately that javascript tagging is the subject of a patent that netratings is litigating heavily against the analytic vendors.
fred
Posted by: fred | Oct 11, 2006 5:19:05 AM
you have del.icio.us in the second graphic and delicious.com (that redirects to del.icio.us) in the first graphic
Posted by: danny | Oct 11, 2006 5:49:41 AM
danny,
i typed del.icio.us into comscore and it returns Delicious. they have obviously mapped the two domains into a single entity that they measure together and called delicious.com
fred
Posted by: fred | Oct 11, 2006 6:09:15 AM
General traffic statistics, as opposed to conversions, should go the way of "hits".
Our need to rely upon general traffic data is predicated on non-internet and non-context dependent advertising. Traffic statistics are proxies for conversions - but as we measure more accurately what we need to know to increase conversions, we will be less interested in these proxies.
Traditional sampling methods and statistical inferences from our traffic data will likely become less important than measuring effective increases in conversions which directly increase the return on investment.
Posted by: michael webster | Oct 11, 2006 7:16:51 AM
Interestingly, Comscore's CEO Magid Abraham was a leader of a retail point of sale (POS) data company, IRI, and there's an interesting parallel to review. Having also worked in the retail POS arena, I can draw a correlation for web analytics and provide some added insight into the dilemma.
When Nielsen and IRI historically posted retail POS data to their clients using statistical sample-projection methods, they had an ace in the hole. They had state and federally required information on total retail sales, so invariably they could push and pull individual sales data by product and store around to ensure the totals matched. They would use panels to overlay user details on retail sales to create a user-sales analysis that was an interesting add-in of 'color commentary' to the sales trends.
Over time, as Coke and Pepsi or WalMart and Target would match their own data against these 3rd party vendors, the truth showed the sample-projection methods to be inaccurate even at a national level, let alone for an item or store. This led the POS data industry to go after store-level data in as many cases as possible to collect all information. As individual retailers' internal data became more available at the store level, and as stores aggregated from regional to national, folks like WalMart found working with outside data vendors less valuable and became disinterested in fully providing their data to benefit other retailers, resulting in greatly reduced effectiveness of 3rd party POS reporting (and the admission that national analytics for retail POS weren't as accurate as we were led to believe).
What does this mean for Comscore and web traffic sample-projection methods? Combining appropriate sample selection with the sample size needed to get proper per-site analytics, down to the week or day, is a massive undertaking. While they report an accurate portrait over longer periods (quarterly or annually), one should be wary of using them at a level as detailed as weekly or for smaller individual sites. As a parallel to Comscore's goals, even when combining user information with state/federal retail sales, statisticians in retail POS couldn't easily nail per-item sales per week, which is much the same frustration web analytics seekers face today.
Comscore appears to have a system in place that can provide very insightful information at a high level. Comscore's reporting [over longer time periods] combined with very strong in-house metrics are the making of a robust analytic approach that looks across the horizon and focuses on a solid return on marketing investment within a specific business.
Thanks for your insight and blog post.
Posted by: CoryS | Oct 11, 2006 9:14:31 AM
the audience measurement situation is even crazier when it comes to audio and video.
youtube says it is serving 100MM streams per day. ok, let's stipulate they are honest. but what is a "stream"? is it a server request?
no one would blame youtube for taking the most generous view of their own efficacy so i always assume that is what they mean by a "stream."
but an advertiser is extremely ill advised to accept that definition. the ratio of server requests to videos fully delivered -- never mind videos actually viewed! -- is a far far cry from 1:1. and i have never seen anyone attempt to quantify the delta between requested content, and consumed content.
at least in conventional TV, particularly with the advent of people-meters, the panel is reasonably assumed to be consuming the media. riddled as it is with latency issues, and browser and plug-in compatibility issues, etc., you simply can't make that assumption with the Internet or mobile devices, and the like.
Posted by: steve | Oct 11, 2006 10:03:00 AM
I represent Compete, Inc. a competitive metrics firm that often faces the same questions as comScore, Hitwise, etc.
I want to avoid getting into who's right and who's wrong, but would rather focus on how Compete is improving the situation.
On the idea of 'triangulation' I could not agree more. In fact, Compete's methodology uses a combination of ISP data (similar to Hitwise) and ASP data (a la the Compete Toolbar and other proprietary applications). By using both methods Compete is able to isolate biases within each data source and normalize accordingly. Compete's panel is currently over 2M domestic users, which is the largest domestic panel available.
To date, Compete has limited its information to clients, but on Nov. 1st we will be releasing a publicly available service called "SnapShot". The service will feature 13 month trend analyses on any and every site our member community has ever visited... Think Alexa on steroids.
In the interim, you can find single month Compete metrics through our toolbar which is still in Beta, but can be found at http://toolbar.compete.com
Posted by: TJ Mahony | Oct 11, 2006 12:41:30 PM
I find it very difficult to believe that Alexa skews towards a techie crowd.
Do you have any evidence to back that claim up?
I find it hard to believe that techies would choose to install the Alexa toolbar, given that they perceive it as spyware.
I'd find it more likely that Alexa would skew towards the non-techie crowd. Possibly a different non-techie crowd than Comscore. But still a non-techie crowd.
Given this, I do NOT believe that somehow averaging Alexa and Comscore data will give you truer data. (Which seemed to be an implication of the article.)
-- Charles
Posted by: Charles Iliya Krempeaux | Oct 11, 2006 2:28:57 PM
If you compare the panel-based sample sizes that traditional media (mediums) use, the web's tracking capabilities look like rocket science in comparison...a great/long article on this from NYT's Magazine is here:
http://www.nytimes.com/2005/04/10/magazine/10NIELSENS.html?ex=1270872000&en=e5ff850c2abb47b5&ei=5088
That aside, I would agree that the largest sample size of RAW data, warts and all, provides the most accurate measure. I too rely on Hitwise and will take their "directional" market share-based calculations over unique visitor counts that everyone discounts out of hand. The forthcoming offering from Compete, Inc. mentioned above indeed looks interesting as well.
And HEAR, HEAR to all site analytics packages having a "checkbox opt-in" to contribute site data into a central repository...peer pressure could both (i) encourage greater transparency and (ii) reduce everyone's reliance on the "overhead" of audience measurement services (with all due respect to Fred's board seat at comScore :)).
Posted by: Chris | Oct 11, 2006 2:32:36 PM
Chris,
Sample size does not determine sample validity. Sample size is an important component, but sample bias is equally important (usually more important).
To create valid samples, panel companies have to ensure that their panel is randomly selected.
I studied Nielsen Media Research several years ago when they were purchased by VNU. They went through near-Herculean efforts to ensure random samples. Once they had selected targets to install the PeopleMeter, Nielsen would go through multiple attempts to convince the family to install the PeopleMeter. This usually involved sending someone out to the house...multiple times. As a result, they had a very high acceptance rate on their randomly selected targets - ensuring a random, representative panel.
As an example, if you created the largest sample online, but it was comprised of adults 45+, myspace wouldn't likely register in the top 20 sites.
Sample size is important, but bias is equally, if not more critical. Read more on both here:
http://www.naffziger.net/blog/index.php/2006/10/11/sample-size-vs-sample-bias/
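The size-versus-bias point can be illustrated with a small simulation. This is a rough sketch under invented assumptions: the population mix (40% under 45) and the visit rates (30% versus 2%) are made up for illustration and do not describe any real site or panel.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 40% under 45, 60% aged 45+.
# Assumed visit rates for a youth-oriented site: 30% vs. 2%.
def make_person():
    under_45 = random.random() < 0.40
    visits = random.random() < (0.30 if under_45 else 0.02)
    return under_45, visits

population = [make_person() for _ in range(200_000)]
true_reach = sum(v for _, v in population) / len(population)

def reach(sample):
    """Share of people in the sample who visit the site."""
    return sum(v for _, v in sample) / len(sample)

# Small but random panel: 1,000 people drawn from everyone.
small_random = reach(random.sample(population, 1_000))

# Large but biased panel: 50,000 people, all aged 45+.
older = [p for p in population if not p[0]]
large_biased = reach(random.sample(older, 50_000))

print(f"true reach:        {true_reach:.3f}")
print(f"random, n=1,000:   {small_random:.3f}")
print(f"biased, n=50,000:  {large_biased:.3f}")
```

Under these assumptions the 1,000-person random panel lands much closer to the true reach than the 50,000-person panel of older users, because adding more people from the wrong group does nothing to remove the bias.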
Posted by: Dave Naffziger | Oct 11, 2006 5:41:47 PM
if you look at the alexa chart for delicious, you'll see a huge jump up -- basically a vertical line -- somewhere in late march/early april of 2006.
you can see the same jump up for _lots_ of "web 2.0" type services (boing boing, techcrunch, etc).
I've read (don't have the reference handy) that Digg users bombed Alexa around that time, downloading and installing the client in large numbers. They skewed Alexa's rankings towards the techie then, and they've been doing it ever since.
Anyone know about this from within Digg or Alexa?
Posted by: Nicholas | Oct 11, 2006 8:42:50 PM
Allow me to join this lively debate and represent comScore’s viewpoint. We clearly have a lot of good and passionate ideas here. I did, however, want to discuss a few of the issues raised in this dialogue:
1) ‘A sample size of 100,000 users is too small’
Think about opinion polls. They use 1,000 respondents and they are usually fairly accurate, certainly within 10 percentage points. Arbitron measures local market radio with 600-1000 users. Nielsen measures national TV audiences with 8,000 HHs. IRI measures consumer usage in CPG with 70,000 HH’s. By comparison, 100,000 users (or larger in the case of comScore) is ample particularly for measuring a web site’s audience with 1% or more reach. After some point, sample size does not matter. This is fortunate because most of us would rather give a small blood sample for a blood test rather than donate our entire blood mass for a more accurate measurement!
2) ‘Data from ISP logs is better’
Some services such as Hitwise and Compete.com collect data from logs provided by a few participating ISP’s. This method has the advantage of a higher sample size, but is it better quality? Consider that this data does not include subscribers from AOL, MSN, Verizon, AT&T, Comcast, RoadRunner, Bellsouth, Cox, or any of the large cable companies. Think about it. None of these large corporations would ever sell their subscriber data to a research supplier in return for a few thousand dollars, and risk getting into a class action lawsuit for privacy violation. Only a few weeks ago, AOL got into trouble for releasing anonymous search terms for research purposes. Their CTO had to resign as a result.
In addition, most serious companies do not buy their Internet access from retail providers such as Verizon or Netzero. This means ISP data would not include usage from work locations, with the possible exception of very small businesses. I don’t need to tell this crowd how important @work usage is.
Finally, ISP data does not have user demographics. Household demos could be approximated from zip code level census data. That is obviously far from ideal.
So we are left with data representing users from second tier ISP’s who skew to dial-up, mostly residential, missing most of the business segment, and having fuzzy demographics.
Is this still a ‘far better methodology’?
3) ‘Aggregating logs from web analytics systems is an ideal solution’
Here again, a census of clickstream, if you can get it, is definitely better than a sample. Even assuming that we can get all major websites to cooperate, the storage needed to house the clickstream data of all US users for a year would be at least 19,000 Terabytes in highly compressed form. Multiply that by 5 to get global usage. You would keep Dell busy for a while manufacturing enough servers to process this mass of data, assuming you can find a DBMS that can handle such a database. You could get by with less data volume by storing only aggregate data but you lose the individual surfing dynamics you need for acquisition, conversion, switching and audience overlap analyses. You will also have to settle for limited or no demographic data. Finally, the cost of doing all this would massively dwarf the ‘overhead’ of audience measurement services.
Let me assure you guys that comScore is always looking for ways to improve the accuracy of our estimates. Many alternatives look tempting but break down when you look deeper. All in all, what we provide is not perfect but is simply the best we know how to viably do. But I for one, will keep looking and would have no shame trying somebody’s better idea!
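The sample-size argument in point 1 can be checked with the standard margin-of-error formula for a proportion. This is a rough sketch that assumes a simple random sample, which real panels only approximate; the function name and the reach levels tried are illustrative choices, not comScore's method.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for a 1,000-person opinion poll.
print(f"poll, n=1,000: +/- {margin_of_error(0.5, 1_000):.1%}")

# A 100,000-person panel at different levels of site reach.
for reach in (0.01, 0.001, 0.0001):
    moe = margin_of_error(reach, 100_000)
    print(f"reach {reach:.2%}: +/- {moe:.4%} ({moe / reach:.0%} of the estimate)")
```

The absolute error shrinks as reach falls, but the error relative to the estimate grows, which is consistent with the 1% reach threshold mentioned above and with the concern raised earlier in the thread about smaller sites.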
Posted by: Magid Abraham | Oct 12, 2006 12:58:18 AM
I'm surprised by the comScore reps defending their model here when their tracking stats come from the spread of what many sites have labelled Malware. Alexa isn't all that flash, but comScore is even worse, they prey on the computer illiterate. I wrote about it here if you're interested.
http://www.thebloggingtimes.com/opinion/index.php/2006/09/12/blogs-are-a-statisticians-worst-nightmare/
Posted by: Duncan | Oct 12, 2006 3:07:09 AM
Alexa isn't skewed to the techie crowd.
Alexa runs only in IE.
Techie crowd doesn't use IE.
Digg and Alexa jump are, I believe, not related. The blog you read that on was wrong, a month off. Look at the comments of that post, I made that point back then.
Also, see this pie chart that breaks down Humans, Aggregators, and Bot traffic for Simpy.
Posted by: Otis Gospodnetic | Oct 12, 2006 3:20:00 AM
Oh, and I wrote something about this the other day:
http://blog.simpy.com/blojsom/blog/2006/10/04/TechCrunch-vs-Lifehacker-Fishy-Stats.html
It looks at TechCrunch and LifeHacker, and compares their SiteStats and Alexa numbers. The numbers do not match at all!
Whom to trust!?
Posted by: Otis Gospodnetic | Oct 12, 2006 3:24:11 AM
duncan
Comscore's tracking agent isn't malware.
first, users opt in to it
second, it does nothing to harm the user, no popups, no spam, no viruses.
comscore has been providing high quality internet measurement services for the past 6 years and everyone is better off because of the transparency they provide.
as magid said above, they aren't perfect, but neither is anyone else.
fred
Posted by: fred | Oct 12, 2006 7:29:14 AM
Allow me to share some comments from Nick Denton of Gawker Media who I discussed this with after further illustrating Fred's point here. For those who think Alexa doesn't skew to the tech crowd, you are dead wrong. (Digg example above is a great one.) To paraphrase Nick: techies use Alexa more than civilians, and Valley techies, most of whom are trying to inflate the numbers of their proxy web 2.0 startups, use it even more than common-garden techies.
He also noted that you can't even trust Alexa when making comparisons with sites that have similar oriented audiences. Here's an example.
I personally think the big monkeywrench in all of this is feed readership, especially among the tech blogs where I imagine in some cases as much as 80% of their audience uses a feed reader. Do stats packages like Sitemeter or any third-party measurement service come close to accounting for this? I really don't know and I am very curious.
Posted by: krucoff | Oct 12, 2006 11:58:49 AM
Your analysis is right on: Alexa skews towards the techie crowd; though clearly comScore, NNR are not 100% right at all.
All of these are good for relative analysis.
I think there is a major opportunity for - guess who - Google through their Urchin (Analytics) acquisition, to see why and how:
http://www.watchmojo.com/web/blog/?p=241
Posted by: ashkan karbasfrooshan | Oct 13, 2006 3:22:13 PM
Third party services are always going to face difficulties in measuring audience and usage. I'm not going to comment on the accuracy or lack thereof of comScore. Without running a side by side comparison it is hard to say, so they get the benefit of the doubt. Results can be considered reasonable if multiple third parties get statistically similar numbers using different methodologies.
The power of Google and Yahoo to measure audience is likely to quickly outstrip what third parties can do. For the simple reason that they have access to multiple methods of measurement that can be combined into fairly comprehensive metrics. Demographics are not hard either as both companies allow users to enter location and age etc into their identity systems.
I have been wondering how third party measurement agencies will survive over the next 18 months to 3 years as the GAMYE companies build up their audience measurement side of the business. My guess is it only becomes threatening if they release demographic-tied data to the wider community.
Finally, I would like to address Magid's final point. The fact that it would require 19,000 terabytes, or 19 petabytes, is not really a hurdle at all. One current calculation (http://glinden.blogspot.com/2006/06/four-petabytes-in-memory.html) shows Google has about 4 petabytes of RAM in its cluster. Given that this is RAM only, it is not hard to believe that Google (and this applies to the other GAMYE companies as well) has 10 times that in disk space, particularly as they build out the new server farms. Besides, I doubt a GAMYE company would store the raw data; more likely it would process it in blocks and store the processed information.
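The storage figures traded in this thread can be lined up with a few lines of arithmetic. This is only a sketch of the numbers quoted above; the 10x disk-to-RAM ratio is the guess made in this comment, not a measured figure, and the variable names are illustrative.

```python
TB_PER_PB = 1_000  # decimal units, as 19,000 TB = 19 PB implies

us_clickstream_tb = 19_000   # Magid's estimate for one year of US data
global_multiplier = 5        # his multiplier for global usage
google_ram_pb = 4            # the linked estimate of Google's RAM
disk_to_ram_ratio = 10       # a guess from this comment, not a measurement

us_pb = us_clickstream_tb / TB_PER_PB
global_pb = us_pb * global_multiplier
google_disk_pb = google_ram_pb * disk_to_ram_ratio

print(f"US clickstream:      {us_pb:.0f} PB")
print(f"global clickstream:  {global_pb:.0f} PB")
print(f"assumed Google disk: {google_disk_pb} PB")
```

On these guesses the assumed disk capacity would cover a year of US clickstream but not the global figure, so the conclusion depends heavily on the ratio chosen.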
Posted by: Simon Cast | Nov 1, 2006 8:00:15 AM
Terrific post. You're right that existing measurement panels are too small, biased, or too easily spoofed. The fact is, no panel will ever accurately measure small sites.
Quantcast provides free direct measurement of website audiences using a measurement pixel. We also provide a panel-based estimate for sites without pixels.
Check it out:
http://quantcast.com/techdirt.com
http://quantcast.com/savvy.com
http://quantcast.com/pianoworld.com
Posted by: Paul Sutter | Nov 9, 2006 6:39:51 PM
I plan to do a quick write-up about the FREE services provided by comScore, Nielsen, HitWise, Alexa, Compete, and Quantcast.
Cheers,
-Brian
Posted by: Brian Berliner | Dec 21, 2006 7:32:27 PM
A VC

