Talk:Edits by project and country of origin

From Meta, a Wikimedia project coordination wiki

General

Is this all edits, ever, or over a recent specified period?

Really cool data! Thanks for collecting it :) --pfctdayelise 11:24, 4 September 2006 (UTC)

Recent changes data only, so it covers the past 30 days or so. Kelly Martin 13:56, 4 September 2006 (UTC)
Very neat! By sheer coincidence I discussed this same idea with nl:RonaldB, who has a private database of IP-to-country data and did this for NL only. [1] I would like to add a similar chart to Wikistats later this year, possibly with history as well. Which IP-to-country converter would you recommend? Did you parse the archive dump? If so, how did you translate user names back to IPs? Erik Zachte 21:53, 4 September 2006 (UTC)
It came out of the Recent Changes data, which stores IPs for all edits, logged in or not. Alphax 03:10, 5 September 2006 (UTC)
Apologies for the delayed reaction; I am just back from holidays.
RC input might not be representative of what happens in the long run. It may be heavily influenced by occasional bot actions, a conclusion I have drawn from graphs of article growth on some major languages. On a logarithmic scale that curve is almost a straight line (so growth is exponential), but the first derivative (i.e. the number of new articles per day) shows strong variations. See e.g. Slide 10 (in Dutch, but it is a graph); a newer version will be uploaded soon.
The link given by Erik Zachte shows, for nl:, the distribution based on IPs that have ever done some editing. Admittedly that is also not 100% representative of where edits are coming from. But the effect of bots is eliminated, and imho the number of IPs taken into account is large enough to make the result more reliable.
The cross-check I have done is to look at the activity per capita for IPs in the Netherlands and Belgium (Flemish-speaking part). The order of magnitude is the same (the ratio for BE is 75% of the ratio for NL), which gives me quite some confidence in my analysis approach. - RonaldB 22:51, 22 September 2006 (UTC)

Springtime for Dutch and Belgium

"dawiki (0.5%): DK: 54.8%, NL: 26.7%, BE: 6.0%, all others: 12.5%" This one is suspect - it seems unlikely that 26% of edits to the Danish Wikipedia should come from NL. Thue 06:40, 6 September 2006 (UTC)

The number of edits from NL seems inflated for all wikis. NL is present on all Wikipedias, and even on Korean and Hebrew more than 10% of edits are Dutch. Something must be wrong. --::Slomox:: >< 11:12, 6 September 2006 (UTC)
I guess it may be an interwiki bot. I'd recommend filtering the bots out whenever possible... Dr Bug (Vladimir V. Medeyko) 11:39, 6 September 2006 (UTC)
Bots ... bots ...... Aph.
Another cause might be the granularity of the geolocation database. The one I'm using has more than 82,700 records.
I don't think interwiki bots are the cause. There are not that many Dutch people capable of intelligently running an interwiki bot on the Korean Wikipedia.
If you e-mail me an ASCII file of IPs for a certain language (e.g. nl:), I could query it against my geodatabase and compare the result with yours. - RonaldB 00:10, 23 September 2006 (UTC)
The toolserver is located in NL; this is the reason NL is present everywhere. TS IPs must be excluded from the stats. Mashiah Davidson 06:25, 4 January 2008 (UTC)
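For anyone wanting to try the exclusion suggested above, here is a minimal Python sketch. It assumes the edits are available as a plain list of IP strings and that the address ranges to exclude are known as CIDR blocks. The function name `exclude_networks` is made up for this example, and the ranges used are reserved documentation ranges, not the real toolserver addresses.

```python
import ipaddress

def exclude_networks(edit_ips, excluded_cidrs):
    """Return the IPs from edit_ips that fall outside every excluded CIDR block."""
    networks = [ipaddress.ip_network(cidr) for cidr in excluded_cidrs]
    kept = []
    for ip_text in edit_ips:
        ip = ipaddress.ip_address(ip_text)
        # Keep the edit only if its IP is in none of the excluded networks.
        if not any(ip in net for net in networks):
            kept.append(ip_text)
    return kept

# Hypothetical sample: 192.0.2.0/24 stands in for a range to be excluded.
sample_ips = ["192.0.2.10", "198.51.100.7", "192.0.2.200", "203.0.113.5"]
filtered = exclude_networks(sample_ips, ["192.0.2.0/24"])
print(filtered)
```

The filtered list could then be passed to whatever IP-to-country lookup is used for the statistics.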
Another cause could be the multilingualism in the Netherlands. At high school we are taught Dutch, German, English and French. Millions of Dutch people have also been on vacation many times to Spain, Portugal and Italy, and hundreds of thousands have learnt one of those languages. Over 100,000 Dutch people now also work in Poland. This, combined with over 1.2 million immigrants from all over the world and the highest Internet penetration of any country, makes the Netherlands a good participant country in the Wikipedia project. But yes, the bot helps too! Ihsan82.171.81.187 13:40, 12 June 2009 (UTC)
I guess(!) it may have something to do with Wikimedia's servers in Amsterdam? Maybe they count accesses from these servers to the US servers as edits coming from the Netherlands? -- 217.157.175.46 00:22, 8 November 2009 (UTC)
In any case the page was last updated 15 months ago, so I wouldn't bother too much. Wutsje 00:35, 8 November 2009 (UTC)

Charts

I'm going to generate and upload some charts of these to give people a visual representation of what's going on here. Unfortunately I can only generate PNG charts, so if someone knows how to do them as SVGs, please help! Alphax 13:24, 4 September 2006 (UTC)

Use EasyTimeline!!!! pfctdayelise 16:20, 4 September 2006 (UTC)
EasyTimeline already always generates PNG + SVG (hidden feature): just get the image URL and replace the extension png with svg. Since MediaWiki always translates back to PNG and clickability gets lost in the conversion, there is not much point in doing that now, except for publishing offline. Erik Zachte 21:43, 4 September 2006 (UTC)
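The extension swap described above can be sketched in Python as follows. This is only an illustration: the function name `svg_url_from_png` and the example URL are made up, and whether an SVG is actually served at the resulting address rests on the comment above, not on anything checked here.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def svg_url_from_png(png_url: str) -> str:
    """Swap a trailing .png extension in the URL path for .svg."""
    parts = urlsplit(png_url)
    root, ext = posixpath.splitext(parts.path)
    if ext.lower() != ".png":
        raise ValueError("expected a URL whose path ends in .png")
    # Only the path changes; scheme, host and query string are kept as they are.
    return urlunsplit(parts._replace(path=root + ".svg"))

# Hypothetical image URL, used only to show the transformation.
print(svg_url_from_png("https://example.org/timeline/abc123.png"))
```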

Hm OK it needs some work... like maybe they should all be on the same line :) You get the idea, though. pfctdayelise 16:38, 4 September 2006 (UTC)


Sorry, I meant "pie chart"... w:KChart can do them, but I'll need the data in a suitable format. Alphax 03:06, 5 September 2006 (UTC)
Ok, pie charts are apparently not good for this sort of thing. Here's one I did in Excel:
Not great, is it? Alphax 03:23, 5 September 2006 (UTC)
Actually pie charts are more suited to this than bar charts, because the total is fixed (100%). But I just hate the look of Excel charts so much, and love EasyTimeline so much. :) pfctdayelise 12:28, 5 September 2006 (UTC)
Funny, according to a link off w:Pie chart ([2]), pie charts are not good for presenting information, despite the fact that they look pretty. Alphax 15:34, 5 September 2006 (UTC)

Using EasyTimeline

Here's an example of what I want... Alphax 06:20, 5 September 2006 (UTC)

You could add percentages, see US, EN above. I also added an alternative position for the wiki name. It would help to keep charts concise when many of them are stacked on one huge page. This way there is also less surrounding empty space when a chart gets included on another page. Erik Zachte 10:08, 5 September 2006 (UTC)

Ooh, I like it. Alphax 15:34, 5 September 2006 (UTC)

Will all charts use the same horizontal scale? It would help to compare them visually. Erik Zachte 10:10, 5 September 2006 (UTC)

I imagine so, but then you get... well, they all have to go from 0 to 1000. Have a look at the chart above. Alphax 15:34, 5 September 2006 (UTC)
You could base the image width on the max value, so charts have different widths but the same width per 100 points on the scale, e.g. ImageWidth = 60 (left + right) + maxvalue:
maxvalue = 600 -> width = 660
maxvalue = 800 -> width = 860
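The width rule above can be written out as a small Python helper. This is a sketch only: `image_width` and the `px_per_point` parameter are names made up for this example, not EasyTimeline options. With the default of one pixel per scale point it gives the two widths listed above.

```python
def image_width(max_value: float, margins: int = 60, px_per_point: float = 1.0) -> int:
    """Image width in pixels: fixed left + right margins plus a plot area
    that grows with the largest value on the scale."""
    return margins + round(max_value * px_per_point)

# The two cases listed above.
for max_value in (600, 800):
    print(max_value, "->", image_width(max_value))
```

Because every chart gets the same number of pixels per scale point, bars of equal value have equal length across charts even though the images differ in width.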
Ok, I'll compare:

I generated some charty goodness: /charts It worked OK but some need a tiny bit of tweaking. pfctdayelise 07:03, 6 September 2006 (UTC)

That's nice for a start. Perhaps ImageWidth = 60+maxvalue*2/3 is better. You could use Imagesize = height:auto barincrement:25 and thus all bars will stay equal width and equally spaced, and image and plot size adapt to the number of bars. Erik Zachte 09:21, 6 September 2006 (UTC)
Um...but then won't ImageWidth likely be less than maxvalue? So the bar won't even fit on the image...? pfctdayelise 09:37, 6 September 2006 (UTC)
No, because the bar length is relative to the image size. In the next chart I only changed ImageSize width:400
Oh, and a larger right margin, PlotArea right:35, because the 76.2% text will fall off the chart when a bar fits too tightly (uses the whole scale).
Could you also please add the comment '#edit statistics', e.g. somewhere at the end? (see last example chart) I can use that as a filter criterion in wikistats; 200 similar charts in the timelines overview would be a bit crowded. I also filter other repetitive bar charts. The idea is that people browse the overview to understand what can be done with EasyTimeline.

Observations

  • Is there no data for mainland China, or did I miss it...?
  • Not sure if IE is Ireland, but if it is, interesting that there are more edits to NL than to GA (Irish) - they don't even make enough to warrant their own listing
  • Likewise if PH is Philippines, we have a TL (Tagalog/Filipino) wiki but it doesn't even rate

--pfctdayelise 07:35, 6 September 2006 (UTC)

I talked with Kelly and she said:
China wasn't listed in the first version, since fewer than 10K edits were submitted to the counted projects (around 8000 edits).
Supposedly "Mainland China Blocking" affected this result, Aphaia guesses ;(
IE is Ireland.
PH is Philippines. Supposedly there might be a similar phenomenon with Indian-language projects (Indian editors tend to prefer English Wikipedia), which is being discussed on foundation-l. --Aphaia 10:24, 6 September 2006 (UTC)

The sampling did not include gawiki or tlwiki, amongst countless other smaller projects. I'm hoping Greg will get me a full sampling of all projects soon, and we'll update with broader information at that time. I am very interested in seeing the participation rates on the smaller projects, although our reporting limits will reduce the amount of information that we can share publicly.

CN didn't make the 10,000 edit cut for display. However, if you look at zhwiki, CN does show up in the percentage breakout. Kelly Martin 05:57, 7 September 2006 (UTC)
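To illustrate how a country can miss the overall listing yet still appear in a per-wiki breakout, here is a rough Python sketch. It is not the script used for the page: the function names and the sample counts are made up (only the figure of roughly 8000 edits for CN comes from the discussion above), and it assumes the cut-off is applied to a country's total across the counted projects.

```python
from collections import Counter

def countries_listed(edit_counts, min_edits=10_000):
    """Countries whose total edits across all counted wikis reach the cut."""
    totals = Counter()
    for (_wiki, country), n in edit_counts.items():
        totals[country] += n
    return {country: n for country, n in totals.items() if n >= min_edits}

def percent_breakout(edit_counts, wiki):
    """Share of one wiki's edits per country, in percent."""
    per_country = {c: n for (w, c), n in edit_counts.items() if w == wiki}
    total = sum(per_country.values())
    if total == 0:
        return {}
    return {c: round(100 * n / total, 1) for c, n in per_country.items()}

# Hypothetical counts; only the ~8000 for CN is taken from the thread.
sample = {("zhwiki", "CN"): 8_000, ("zhwiki", "TW"): 12_000, ("enwiki", "US"): 500_000}
print("CN" in countries_listed(sample))
print(percent_breakout(sample, "zhwiki"))
```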

bn-wiki

Just a small but significant comment: Bengali (bn) is more correctly termed a Bangladeshi language rather than an "Indian" language. Look at the statistics ... 60%+ of the edits are from Bangladesh. So it is misleading to keep calling bn-wiki an "Indian language" wiki, considering that 75%+ of the speakers live in Bangladesh. Thanks. --128.174.253.123 20:23, 7 September 2006 (UTC)

Bengali is an Indian language also used in Bangladesh. --59.92.50.34 01:24, 20 November 2006 (UTC)

If 75%+ of the speakers live in Bangladesh, then I think it counts as a Bangladeshi language also spoken in India. 189.161.85.253 05:20, 6 November 2008 (UTC)

Please expand

I'd especially be interested in the 15.8% "others" of enwiki, and in statistics about trwiki, fawiki, arwiki and idwiki and about the countries TR, ID, EG, PK, IR and SA. Thank you very much. br --62.116.76.117 02:33, 15 August 2007 (UTC)

Country names

Could I just ask why the countries are only described in two letters? Many people will not know what country is meant by a certain two letter code, so could we change it? Or is there a good reason? Thanks. 90.195.182.168 17:21, 24 May 2008 (UTC)

Please cooperate for updating

en:Wikipedia_talk:Edits_by_project_and_country_of_origin

en:Wikipedia:Edits_by_project_and_country_of_origin

--FaktneviM 10:38, 27 June 2011 (UTC)

for china

Why, in the counterpart of this page on English Wikipedia, does the China row in the "by country" part not list Chinese? C933103 (talk) 09:11, 2 March 2013 (UTC)