User:Stu/comScore data on Wikimedia
|This page is kept for historical interest. Any policies mentioned may be obsolete. If you want to revive the topic, you can use the talk page or start a discussion on the community forum.|
One of the major online audience measurement companies, comScore, Inc., has generously donated access to its Media Metrix and World Metrix data sets to the Foundation. comScore has an opt-in panel of two million internet users around the globe and uses a range of statistical techniques to create an internally consistent portrait of the global internet audience. As an example, below is a chart summarizing comScore's estimated audience over the past few years for wikipedia.org, both in the U.S. and worldwide.
January 2010 data
comScore estimates that, during the month of January 2010, 365 million unique visitors (UVs) viewed our projects from a personal computer, which it estimates was a "reach" of 29.5% of the 1.24 billion worldwide PC-based web browser audience:
|Property||Worldwide unique visitors|
|Google Sites (includes YouTube)||920 million|
|Microsoft Sites||734 million|
|Yahoo! Sites||599 million|
|Wikimedia Foundation Sites||365 million|
|AOL LLC||267 million|
|Amazon Sites||245 million|
|Ask Network||228 million|
|CBS Interactive (includes CNET)||212 million|
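The reach figure quoted above is unique visitors divided by the total measured audience. Below is a minimal sketch in Python that uses only the rounded figures published on this page. The function name `reach_percent` is illustrative and is not something comScore provides.

```python
# Rounded figures quoted on this page (millions of unique visitors, January 2010).
TOTAL_AUDIENCE_M = 1240  # 1.24 billion worldwide PC-based web audience

unique_visitors_m = {
    "Google Sites": 920,
    "Microsoft Sites": 734,
    "Yahoo! Sites": 599,
    "Wikimedia Foundation Sites": 365,
    "AOL LLC": 267,
    "Amazon Sites": 245,
    "Ask Network": 228,
    "CBS Interactive": 212,
}


def reach_percent(visitors_m: float, audience_m: float = TOTAL_AUDIENCE_M) -> float:
    """Share of the measured audience that visited a property, in percent."""
    return 100.0 * visitors_m / audience_m


reach = {name: reach_percent(uv) for name, uv in unique_visitors_m.items()}
for name, pct in reach.items():
    print(f"{name:30s} {pct:5.1f}%")
```

With these rounded inputs the Wikimedia figure comes out at about 29.4%, slightly under the published 29.5%, presumably because comScore works from unrounded numbers.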
comScore estimates our audience in different regions, and also estimates what percentages of the audience within each region visited one of our sites:
|Region||Unique visitors||Reach in region|
|Asia Pacific||89.9 million||17.4%|
|North America||82.5 million||41.1%|
|Latin America||32.8 million||33.5%|
|Middle East - Africa||23.4 million||27.2%|
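Because reach is visitors divided by the region's online audience, the two published columns imply a size for each region's audience. The sketch below backs that out. It assumes reach is defined the same way regionally as it is worldwide, and the results are only as precise as the rounded inputs.

```python
# (unique visitors in millions, reach within region in percent), as quoted on this page.
regions = {
    "Asia Pacific": (89.9, 17.4),
    "North America": (82.5, 41.1),
    "Latin America": (32.8, 33.5),
    "Middle East - Africa": (23.4, 27.2),
}


def implied_audience_m(visitors_m: float, reach_pct: float) -> float:
    """Regional online audience (millions) implied by visitors and reach."""
    return visitors_m / (reach_pct / 100.0)


implied = {name: implied_audience_m(uv, pct) for name, (uv, pct) in regions.items()}
for name, audience in implied.items():
    print(f"{name:22s} ~{audience:6.1f} million internet users")
```

The low reach in Asia Pacific goes with by far the largest implied audience, which is why that region contributes the most visitors in absolute terms.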
comScore estimates visitors to the different language versions of Wikipedia and estimates the unique visitors worldwide:
|Language version||Worldwide unique visitors|
|English Wikipedia||189.9 million|
|Japanese Wikipedia||35.1 million|
|Spanish Wikipedia||32.0 million|
|French Wikipedia||26.4 million|
|German Wikipedia||25.1 million|
|Russian Wikipedia||13.9 million|
|Portuguese Wikipedia||11.1 million|
|Italian Wikipedia||10.5 million|
|Arabic Wikipedia||8.6 million|
|Chinese language wikipedias||6.3 million|
|Vietnamese Wikipedia||4.7 million|
|Korean Wikipedia||2.5 million|
|Indian language wikipedias||0.6 million|
|Javanese Wikipedia||0.02 million|
comScore's panelists report age and sex so it can generate detailed demographic estimates, including raw data and also an index which measures the extent to which a set of visitors to our sites is over or under-represented compared to visitors to all sites on the internet. For January, comScore estimates our 365 million audience is made up of 199 million men (29.7% of men online) and 166 million women (29.2% of women online). We index slightly higher with men (101) than with women (99). Here's a breakdown of different age groups:
|Age group||Worldwide unique visitors||Reach in age group||Index|
|Ages 15-24||98.6 million||29.2%||99|
|Ages 25-34||85.3 million||26.4%||89|
|Ages 35-44||76.5 million||28.7%||97|
|Ages 45-54||58.4 million||33.4%||113|
|Ages 55+||46.0 million||34.0%||115|
We index highest for older users (ages 45-54 and 55+) and lowest for those 25-34 years old. I dug into this issue, at first thinking it was driven by twenty-something preference for YouTube and Facebook. As far as I can tell, though, our comparatively weak performance in the 25-34 year old demographic is the result of our weakness in China where comScore believes there is a huge audience in that age range. For example, the large China-based sites like Tencent, Baidu, SINA and Alibaba all index globally around 120 for this demographic group.
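The index values can be reproduced from the reach figures if one assumes the index is a group's reach divided by our overall reach, times 100. That definition is an assumption (comScore's exact method is not described here), but it matches every published value on this page after rounding:

```python
OVERALL_REACH = 29.5  # percent, worldwide reach quoted for January 2010

# Reach within each group (percent) and the index published on this page.
reach_by_group = {
    "Men": 29.7,
    "Women": 29.2,
    "Ages 15-24": 29.2,
    "Ages 25-34": 26.4,
    "Ages 35-44": 28.7,
    "Ages 45-54": 33.4,
    "Ages 55+": 34.0,
}
published_index = {
    "Men": 101,
    "Women": 99,
    "Ages 15-24": 99,
    "Ages 25-34": 89,
    "Ages 35-44": 97,
    "Ages 45-54": 113,
    "Ages 55+": 115,
}


def audience_index(group_reach: float, overall_reach: float = OVERALL_REACH) -> int:
    """Index of 100 means the group is represented exactly as on the internet overall."""
    return round(100.0 * group_reach / overall_reach)


computed_index = {group: audience_index(r) for group, r in reach_by_group.items()}
for group, value in computed_index.items():
    print(f"{group:12s} computed {value:3d}  published {published_index[group]:3d}")
```

An index above 100 means the group is over-represented among our visitors relative to the internet as a whole, and below 100 means under-represented.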
comScore estimates our audience by project:
|Project||Worldwide unique visitors|
|Wikimedia Commons||4.5 million|
China, India trends
comScore estimates the unique visitors to our sites from home and office users in China (excluding Taiwan and Hong Kong). In July of 2008, comScore estimated 232,000 UVs to our sites in China. In August, the month of the 2008 Beijing Olympics, comScore estimates we had 1.3 million visitors. By March of 2009, the audience estimate was 2.75 million. In January 2010, comScore estimates 3.4 million visitors, including 2.5 million UVs to one of the Chinese language wikipedias and 1.0 million to the English Wikipedia (the two groups can overlap, since one person may visit both). By contrast, comScore estimates the Baidu Encyclopedia had 47 million visitors from within China. Given that comScore does not track internet usage from public locations (e.g. internet cafes), these estimates certainly undercount overall activity from China.
In India, comScore estimates 10.1 million unique visitors came to our sites, or 27% of internet users in India. Of these, 9.9 million visited the English Wikipedia while 320,000 visited one of the different Indian language wikipedias.
Source of traffic
comScore also provides analysis of the site a user surfs just prior to visiting us. The percentage of these "entries" from Google and other search engines is often used as an indicator of reliance on the search engines for traffic. Other major sites like YouTube, eBay or Facebook typically see entries from Google at 10% to 15% of their traffic while we are typically over 50%. Here's a breakdown for us for December (this data is published later than other information so is a month behind):
|Source||Entries||% of total entries|
|Google Sites (includes YouTube)||1,686 million||60.1%|
|Yahoo! Sites||159 million||5.7%|
|Microsoft Sites||105 million||3.7%|
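As a consistency check, each row of the table implies a total number of entries (entries divided by share). The sketch below uses only the three rows shown. Note that these are comScore properties rather than pure search engines (Google Sites includes YouTube, for instance), so the combined share is only a rough proxy for search reliance.

```python
# (entries in millions, percent of total entries) for December, as quoted on this page.
entries = {
    "Google Sites": (1686, 60.1),
    "Yahoo! Sites": (159, 5.7),
    "Microsoft Sites": (105, 3.7),
}

# Total entries (millions) implied by each row; rows should roughly agree.
implied_total_m = {
    source: count / (share / 100.0) for source, (count, share) in entries.items()
}

# Share of all entries arriving from these three properties combined.
combined_share = sum(share for _, share in entries.values())

for source, total in implied_total_m.items():
    print(f"{source:16s} implies ~{total:,.0f} million total entries")
print(f"Combined share of the three sources: {combined_share:.1f}%")
```

All three rows point to roughly 2.8 billion entries for the month, with the spread explained by rounding in the published percentages.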
comScore gives data on unique visitors, and Erik Zachte calculates the number of logged-in users who have made more than five edits in a month, which provides a reasonably good metric for active participation. Because the two figures come from different data sources, comparing them is a bit apples-and-oranges, but the comparison is still useful.
The table below shows the calculations for the biggest few Wikipedias for December, the latest month with available editor counts. On the English Wikipedia only about 0.02% of the unique visitors actively edit. Put another way, that's one-fifth of one-tenth of one percent. If you include all logged-in users who made at least one edit, it's about fifteen times higher at one-third of one percent. Including anonymous editors would result in even higher participation rates, but to date Erik has not been able to analyze anonymous editing.
|Dec '09 UVs from comScore||Dec '09 editors with >5 edits||% of UVs with >5 edits|
|Source: Unique Visitor estimates from comScore, editor stats from http://stats.wikimedia.org/EN/TablesWikipediaZZ.htm|
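The participation percentage is the editor count divided by unique visitors. Since the per-wiki counts are not reproduced here, the sketch below works only from the rounded English Wikipedia percentages quoted in the text. The helper name `participation_percent` is illustrative.

```python
def participation_percent(editors: float, unique_visitors: float) -> float:
    """Editors as a percentage of unique visitors."""
    return 100.0 * editors / unique_visitors


# Rounded figures quoted in the text for the English Wikipedia.
ACTIVE_EDITOR_RATE_PCT = 0.02   # logged-in users with >5 edits, as % of UVs
ANY_EDIT_MULTIPLIER = 15        # "about fifteen times higher" for >= 1 edit

# 0.02% is the same as one active editor per 5,000 unique visitors.
visitors_per_active_editor = round(100.0 / ACTIVE_EDITOR_RATE_PCT)

# Fifteen times 0.02% is 0.3%, i.e. roughly one-third of one percent.
any_edit_rate_pct = ACTIVE_EDITOR_RATE_PCT * ANY_EDIT_MULTIPLIER

print(f"One active editor per {visitors_per_active_editor:,} unique visitors")
print(f"Logged-in users with at least one edit: ~{any_edit_rate_pct:.2f}% of UVs")
```

Passing real counts to `participation_percent` (editors from stats.wikimedia.org, UVs from comScore) would give the table's last column, subject to the apples-and-oranges caveat above.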
Analysis from earlier months
Discussion of comScore & Wikimedia
Jay Walsh on the Foundation's staff is managing the comScore relationship overall, Erik Zachte is helping drive the statistical analysis, and a volunteer named Josh Holman has a lot of experience with comScore data. Feel free to reach out to me or any of them with questions. If there's interest, we'll try to update this page every month or two as new data comes out.
Finally, a quick thank you to comScore. The data they donated typically sells for thousands and thousands of dollars, so we're lucky to be able to review it. Speaking on behalf of all of us in the community, I want to thank them for their support.
- comScore has a large and professional team dedicated to audience measurement. We are able to benefit from their insights with no coding, no servers, no hard drives stuffed full of log data, and almost no effort.
- comScore reports "unique visitors", an estimate of the number of actual people using the internet. This puts things into more human terms than page views or an ambiguous "traffic rank" metric. Also, comScore works hard to exclude bots, crawlers, mirrors, click farms, etc.
- comScore works to combine different domains and subdomains. This is particularly useful for international properties. For example, we are able to generate a single audience number for all five or so Chinese language Wikipedias and compare that, both worldwide and within China, to the audience using the English Wikipedia.
- Because comScore does its analysis consistently for all websites, with the same statistical techniques and methodology, we can compare among our projects and to others.
- With a panel of two million users globally, comScore has strong international coverage.
- comScore panelists provide demographic data so we can see estimates of factors like age and sex.
- Coverage of educational users -- comScore focuses on users 15 years old and older using the internet at home or work. Globally, it does not have coverage in schools (though it does have coverage in universities in a few countries). Given our strengths in education, this will inevitably lead to significant underreporting of school use and thus of our overall audience.
- Coverage of worldwide usage -- comScore recruits a panel of users across the world, but their coverage can't be perfect. Given our strong international presence, this will likely also lead to some misreporting of our audience. Also, the dynamics of their panel make analysis less and less valuable the deeper you drill down. Statistics for specific smaller countries (e.g. Egypt) are typically not available or, if they are, might be less useful depending on the size and make-up of comScore's panel there.
- Coverage outside home/work -- comScore does not measure people who go online from an internet café or other public/shared computers. Its estimates of total audience will therefore be significantly too low in countries with heavy public/shared internet usage. Whether this also has a big impact on percentage reach numbers depends on how home/work usage differs from public/shared usage (the difference might be meaningful in some countries where governments are believed to trace people's internet usage).
- Coverage of the mobile audience -- This comScore data set covers the PC-based internet audience, so it excludes access through mobile phones. A July 2008 Nielsen research report estimated there were about 40 million mobile web users in the U.S. alone, and most industry observers expect this number to grow rapidly. This is likely another source of underreporting of the total audience we reach.
comScore offers a sophisticated ability to combine domains and subdomains to better understand the audience for and performance of our projects. We've worked extensively with them to clean up their definition. We want to be inclusive and careful in defining the different Media titles ([M]), Channels ([C]), and Subchannels ([S]) so we can see what's happening with the different projects. Also, we don't need to be exhaustive and capture every single domain name. A domain which automatically redirects to one of our other sites would end up being counted after the redirect. If you see other changes we should request, start a thread on the Talk page.
Below is comScore's definition as of January 2010, which includes some still experimental efforts to identify edit pages:
[P] Wikimedia Foundation Sites
  [M]: WIKIBOOKS.ORG (p)
    [C]: Wikibooks Edit Pages
  [M]: Wikimedia Commons
    [C]: Wikimedia Commons Edit Pages
  [M]: Wikimedia Community Sites
    [C]: WIKIMEDIA.ORG* (p)
      [S]: Wikimedia Meta-Wiki Edit Pages
    [C]: WIKIMEDIAFOUNDATION.ORG (p)
    [C]: Wikinews Edit Pages
  [M]: Wikipedia International Portals
  [M]: WIKIPEDIA.ORG (p)
    [C]: Arabic Wikipedia
    [C]: Chinese Wikipedias
    [C]: English Wikipedia
    [C]: French Wikipedia
    [C]: German Wikipedia
    [C]: Indian Wikipedias
    [C]: Italian Wikipedia
    [C]: Japanese Wikipedia
    [C]: Javanese Wikipedia
    [C]: Korean Wikipedia
    [C]: Portuguese Wikipedia
    [C]: Russian Wikipedia
    [C]: Spanish Wikipedia
    [C]: Vietnamese Wikipedia
    [C]: Wikipedia Edit Pages
      [S]: English Wikipedia Edit Pages
    [C]: Wikipedia.org Homepage
  [M]: WIKIQUOTE.ORG (p)
    [C]: Wikiquote Edit Pages
  [M]: WIKISOURCE.ORG (p)
    [C]: Wikisource Edit Pages
    [C]: Wikispecies Edit Pages
  [M]: WIKIVERSITY.ORG (p)
    [C]: Wikiversity Edit Pages
  [M]: WIKTIONARY.ORG (p)
    [C]: Wiktionary Edit Pages
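For anyone who wants to work with this definition programmatically, here is a minimal Python sketch that turns the listing into a tree. It assumes the nesting order is property [P], media title [M], channel [C], subchannel [S], and that each entry belongs to the nearest preceding entry of a higher level. `Node` and `parse_definition` are names made up for this example.

```python
import re
from dataclasses import dataclass, field

# Assumed nesting order: property > media title > channel > subchannel.
LEVELS = {"P": 0, "M": 1, "C": 2, "S": 3}
LINE_RE = re.compile(r"^\s*\[([PMCS])\]:?\s*(.+?)\s*$")


@dataclass
class Node:
    kind: str
    name: str
    children: list = field(default_factory=list)


def parse_definition(text: str) -> Node:
    """Build a tree from lines such as '[M]: WIKIPEDIA.ORG (p)'."""
    root = None
    stack = []  # path from the root down to the most recent entry
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        kind, name = match.groups()
        node = Node(kind, name)
        # Climb back up until the top of the stack is a higher-level entry.
        while stack and LEVELS[stack[-1].kind] >= LEVELS[kind]:
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError(f"second top-level entry: {name}")
        stack.append(node)
    if root is None:
        raise ValueError("no entries found")
    return root


sample = """\
[P] Wikimedia Foundation Sites
[M]: WIKIPEDIA.ORG (p)
[C]: English Wikipedia
[C]: Wikipedia Edit Pages
[S]: English Wikipedia Edit Pages
[C]: Wikipedia.org Homepage
[M]: WIKTIONARY.ORG (p)
[C]: Wiktionary Edit Pages
"""
tree = parse_definition(sample)
```

The nearest-parent rule is only as good as the listing. Where an [M] line is absent (for example, "Wikinews Edit Pages" above has no Wikinews media title before it), the channel would be attached to whatever media title precedes it, so the result should be checked by hand.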