Research:Wikinews Content Import Analysis

From Meta, a Wikimedia project coordination wiki
Contact
Laura Hale
This page documents a completed research project.


Project Summary

One of the current goals of members of The Wikinewsie Group is to increase participation on all projects. Three members of the group’s provisional board are active on English Wikinews. Behind the scenes, the English Wikinews community also has this as a goal. One of the perennial solutions suggested by outsiders to the project is to increase community participation and content by importing content from other similarly licensed news projects to English Wikinews. This essay looks at a case study from another language Wikinews project, examining the impact of external content import on overall content creation and community participation, to determine whether such import offers a path towards community content creation and new contributor recruitment.[1]


In 2009, the Serbian Wikinews project, led by Millosh, created a bot that imported content from similarly licensed Serbian-language news sources and published it on the project.[2] The import bot, and the subsequent surpassing of English Wikinews in content production, was a point of pride for the project, and the news was reported in Balkan Insight.[3] At the same time, English Wikinews was debating doing something similar, but the community ultimately rejected the idea because the articles would not meet the project’s guidelines.[4] Since then, the two projects have diverged greatly in terms of policies.


What has the impact been on both communities in terms of content creation rates before and after October 2009, when Serbian Wikinews started importing content from other news sources? For perspective, Serbian Wikinews was created in July 2005 while English Wikinews was created in November 2004.[5] The importing bot on Serbian Wikinews was active until April 3, 2013, when it was blocked by an administrator who cited its contributions as an undesirable way to write news.[6] All comparisons between the two projects will be based on the start date for Serbian Wikinews unless otherwise stated.[7] The purpose of this analysis is to understand what happened on Serbian Wikinews in the context of content import, and to see how this affected contributor participation, overall content creation and article traffic. Once understood, this can serve as a potential guide for implementing, or declining to implement, similar solutions for increasing content and participation.


One of the first ways to measure the raw impact of the content import on Serbian Wikinews is to examine the slope for article creation before and after the import. If the import was successful, the slope of the line for article creation as a function of time would be steeper. Using the average number of articles published per day in a month, the slope of the line for the period between May 2005 and June 2009 was 1.05 for Serbian Wikinews and -4.02 for English Wikinews. For the period between October 2009 and March 2013, the slope of the line was -0.29 for Serbian Wikinews and -4.57 for English Wikinews. Article production on Serbian Wikinews increased steadily from its founding until the period when the importing bot was added. Following that period, the project saw a decline. This contrasts with English Wikinews, which saw a decline over the whole period.

This is visualized in the graph above, which shows the average daily article production for both projects. English Wikinews has been incrementally declining in total daily articles produced. Serbian Wikinews, in comparison, was slowly increasing its daily production prior to the external content import. Since that time, community production has decreased, and rapidly. As one data point, this suggests that content import is potentially a net negative for community content creation.
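The slope comparison described here can be sketched in a few lines. This is a minimal illustration, assuming the slope is that of an ordinary least-squares line fitted to one value per month, with the month index as the x-axis; the source does not state which tool or fitting method was used. The function name `trend_slope` and the sample values are hypothetical and are not data from this study.

```python
def trend_slope(monthly_values):
    """Slope of the least-squares line through (month index, value) points.

    Assumption: one observation per month, evenly spaced, so the x-axis
    is simply 0, 1, 2, ... A positive slope means the measure is rising
    over the period; a negative slope means it is falling.
    """
    n = len(monthly_values)
    if n < 2:
        raise ValueError("need at least two monthly values to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(monthly_values) / n
    # Covariance-style numerator and variance-style denominator of the
    # ordinary least-squares slope formula.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical series of average articles per day, one value per month.
before_import = [2.0, 2.5, 3.0, 3.5, 4.0]   # steadily rising
after_import = [5.0, 4.0, 3.0, 2.0]         # steadily falling

print(trend_slope(before_import))
print(trend_slope(after_import))
```

Comparing the sign and size of the slope for the period before the import with the slope for the period after it is the comparison the analysis relies on; the same function applies to the contributor counts discussed later.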


Another potential way to determine the impact of the content importing is to look at the slope of the line for editors who have ever contributed to the project before and after the content import. This gives a perspective on the project’s ability to attract new contributors. Using the total number of registered users who have ever contributed by month[8], the slope of the line for the period between May 2005 and June 2009 was 3.1451 for Serbian Wikinews and 0.0460 for English Wikinews. For the period between October 2009 and March 2013, the slope of the line was -2.0084 for Serbian Wikinews and 0.1814 for English Wikinews. Even after the bot import activity was halved, the period between October 2010 and March 2013 does not show pre-content import levels of attracting new contributors[9], with a slope of 2.211. The rate of historical contributor growth can be seen on the graph below. Serbian Wikinews’s rate of attracting new contributors was worse after the content importing. For the three-month period when the bot was turned off, the slope was 1.5, which suggests that turning it off alone did not assist in attracting new contributors.



Another way of looking at contributors is to look at the total number of active editors contributing to the main space, where articles are published, on a monthly basis. The slope can be calculated to see the relative increase or decrease in that number. The slope of the line for the period between May 2005 and June 2009 was 1.2970 for Serbian Wikinews and -0.1246 for English Wikinews. For the period between October 2009 and March 2013, the slope of the line was -1.4319 for Serbian Wikinews and -0.1108 for English Wikinews. Prior to the introduction of the news importer, Serbian Wikinews gained participants as it gained articles: the project saw a small increase in the number of active contributors on a monthly basis. Following the introduction of imported content, it saw contributors decline at a similar rate.


Bot contributions could possibly be seen as facilitating the ease of human participation on a project, with the content import on Serbian Wikinews largely being done by a bot. The correlation between the total number of human edits and bot edits to the main space was calculated to test whether this might be happening.[10] This data is not available for all months, which is why the time periods used here differ from those used for other data. For the period between July 2005 and April 2009, the correlation between total bot edits and human edits to the main space was 0.249. The correlation for the period between October 2009 and April 2013, the period when content import was active, was 0.467, which suggests a small correlation between increased bot edits and increased human edits. The correlation between bot edits and human edits from October 2010 to April 2013, when bot imports were halved, was -0.149, which suggests an essentially random relationship between human edits and bot edits to the main space. Overall, this suggests there is no conclusive link between bot contributions and human contributions to a project.
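The correlation figures quoted in this analysis can be reproduced in outline as follows. This is a sketch under the assumption that "correlation" means the Pearson correlation coefficient computed over paired monthly totals; the source does not name the coefficient or the tool used. The function name `pearson_r` and the sample numbers are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired series.

    Values near +1 mean the series rise and fall together, values near -1
    mean one rises as the other falls, and values near 0 mean there is no
    linear relationship between them.
    """
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly totals of bot edits and human edits to main space.
bot_edits = [100, 200, 300, 400]
human_edits = [20, 40, 60, 80]

print(pearson_r(bot_edits, human_edits))
```

The same calculation applies to any pair of monthly series in this essay, for example articles created per day against monthly page views.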


One possible argument is that a community is not needed and human production of original content is not required. The goal of projects is to freely share knowledge: if this is knowledge that has been previously published on another website and has a compatible license with the Wikinews project and it can get traffic, a community is not necessarily a requirement for the project to function. Traffic and imported content can sustain the project.


With this thinking in mind, Serbian Wikinews’s model of content importing could be argued to be successful if it generated more traffic than in periods when the import bot was not active and production rates were lower. Using monthly traffic totals for Serbian Wikinews[11] and comparing them against daily production rates by month, the following graph is generated.


Serbian Wikinews had a surge in traffic in the initial period before the implementation of the content importer, following all-time traffic lows. Following this, the graph suggests that the content import led to a rise in traffic, but this was not sustained. In fact, a large spike in traffic occurred after the content import was halved.[12] Traffic in a number of periods actually appears higher than when the content import was most active.[13] This observational data is, to a degree, also supported by correlation. The correlation between article traffic and total articles created per day from June 2008 to June 2009, the period before content import, was 0.72. This number suggests a strong correlation between article production and traffic totals. In the period between October 2009 and March 2013, when the import bot was active, the correlation was 0.17. This suggests traffic relative to article production was close to random. This is supported by the correlation for the period between October 2009 and October 2010, when the import bot was most active. In that period, the correlation was -0.04, which suggests almost true randomness. In the period between October 2010 and March 2013, when the bot activity was halved, the correlation was -0.47. This suggests that the greater the number of articles, the less traffic Serbian Wikinews had. Serbian Wikinews did not benefit from increased traffic during periods of increased article creation as a result of content import.[14]


Serbian Wikinews has the largest archive of published news stories of any Wikinews project. As of May 2013, its 75,000 articles account for 37% of all content across all Wikinews projects. The next largest language projects in terms of content size are Polish Wikinews with about 25,000 articles and English Wikinews with nearly 20,000 articles. The archived material could be perceived as being useful by the wider community as a large archive of historical news material. To test this, the total monthly page views were divided by the total number of articles on the project to determine the relative access levels to these news stories as a historical archive. For Serbian Wikinews, prior to the introduction of the content importer, the project had an average of between 20 and 80 views per article.[15] Following the introduction of the content importing, the average monthly page views per article dropped to less than ten. As the graph below shows, this pattern of per-article drop-off is consistent across English, Spanish and Polish Wikinews, though for projects other than Serbian the percentage drop is smaller.
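The per-article figure described above is a simple ratio, and a short sketch shows why it falls when an archive grows much faster than its traffic. The function name `views_per_article` and the numbers are hypothetical illustrations, not data from this study.

```python
def views_per_article(monthly_page_views, total_articles):
    """Average monthly page views per article on a project.

    This is the total monthly page views divided by the total number of
    articles. It is a rough measure: page-view totals cover every page on
    the project, not only articles.
    """
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return monthly_page_views / total_articles


# Hypothetical project before a bulk import: small archive, modest traffic.
print(views_per_article(200_000, 2_500))

# Hypothetical project after a bulk import: traffic is higher, but the
# archive has grown far faster, so the per-article average is much lower.
print(views_per_article(300_000, 60_000))
```

Because the denominator grows with every imported story, the ratio only holds steady if traffic grows in proportion to the archive.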


The total average article views per month dropped for Serbian Wikinews and it appears the project is not being used as a resource by Serbian speakers to view historical news stories.[16]

On the whole, the data suggests that Serbian Wikinews did not benefit from an increase in contributor-written news stories, a larger editing community, or an increase in traffic as a result of importing news story content from other news reporting sites. This appears to be something that the Serbian Wikinews community has recognised as problematic, as the importer was blocked in April 2013. If other Wikinews projects[17] are considering content import, they should consider the lessons from Serbian Wikinews to see if the outcomes achieved by the import match the project’s own goals.


Notes
  1. While editor retention would be ideal to study, editor retention is much harder to address without looking at the individual history of contributors. The number of Serbian Wikinews contributors is small enough to make this feasible. Some information can be gleaned by looking at http://stats.wikimedia.org/wikinews/EN/TablesWikipediaSR.htm and this is an area where further analysis may be useful.
  2. Details of this are documented on English Wikinews at Wikinews:Water_cooler/miscellaneous/archives/2009/October.
  3. http://www.balkaninsight.com/en/article/serbian-wikinews-first-in-number-of-articles
  4. Supporters at the time included Juliancolton, ShakataGaNai, and Tempodivalse. Contributors opposing included Blood Red Sandman, Pi zero, BrianMcNeil, and Bawolff. A record of some of this conversation can be found at Wikinews:Water_cooler/miscellaneous/archives/2009/October#Articles copied from VOA. The Serbian Wikinews bot was imported and beta tested on the project starting in October 2009, with the announcement made on Wikinews:Reports/October 2009. The request to test the bot is found here.
  5. Stats used for the dates and all the data for this analysis can be found on or linked from http://stats.wikimedia.org/wikinews/EN/TablesWikipediansEditsGt5.htm
  6. See Посебно:Доприноси/Millbot-Beta for a history of the bot’s editing.
  7. Most of this analysis assumes there are no other major changes to either project that would lead to “unnatural” changes in community output and participation. This is not true for English Wikinews, which underwent major changes in reviewing. This led to the creation of a fork in September 2011, with the date mentioned at Wikizine/EN2011-128. The fork was closed and deleted in August 2012 according to English Wikipedia’s Signpost at Wikipedia:Wikipedia Signpost/2012-08-20/News and notes. There are also other independent variables present on English Wikinews that may possibly account for downward editing trends. Some of these mirror patterns on English Wikipedia, including moves towards making information more neutral and verifiable, and greater enforcement of copyright policy.
  8. This number will always increase because it is not an average of who has edited in a given month but historically how many people have ever edited.
  9. As a point of contrast, following the Open Globe fork from English Wikinews, the slope for new editor growth on English Wikinews was higher than both of the periods mentioned. It was 0.232.
  10. This was done by adding the total number of editors to main space in these groups.
  11. Data found at http://stats.wikimedia.org/wikinews/EN/TablesPageViewsMonthly.htm , which provides stats from June 2008 to May 2013.
  12. The traffic averages in those periods suggest higher traffic, but this is offset by the medians, which suggest the opposite. The following table provides greater insight into the traffic median and mean for these periods. English, Spanish and Polish Wikinews traffic information is provided as a basis for comparison.
    | Period | Dates | Median - Serbian | Mean - Serbian | Median - English | Mean - English | Median - Spanish | Mean - Spanish | Median - Polish | Mean - Polish |
    |---|---|---|---|---|---|---|---|---|---|
    | Prior to content import | June 2008 - June 2009 | 186,000.00 | 183,384.62 | 5,700,000.00 | 5,838,461.54 | 703,000.00 | 724,846.15 | 712,000.00 | 714,615.38 |
    | Content import active | October 2009 - March 2013 | 274,500.00 | 322,571.43 | 5,550,000.00 | 5,695,238.10 | 665,500.00 | 746,690.48 | 689,000.00 | 719,571.43 |
    | Post Open Globe fork | September 2012 - June 2013 | 296,500.00 | 292,200.00 | 6,400,000.00 | 6,260,000.00 | 710,000.00 | 743,200.00 | 671,500.00 | 696,000.00 |
    | Content import halved | October 2010 - March 2013 | 274,500.00 | 280,033.33 | 5,500,000.00 | 5,490,000.00 | 649,500.00 | 683,400.00 | 667,500.00 | 698,833.33 |
    | Content turned off | March 2013 - June 2013 | 271,000.00 | 266,666.67 | 6,500,000.00 | 6,666,666.67 | 908,000.00 | 913,000.00 | 664,000.00 | 681,000.00 |
    | Content import most active | October 2009 - October 2010 | 256,000.00 | 405,461.54 | 6,100,000.00 | 6,153,846.15 | 876,000.00 | 878,384.62 | 794,000.00 | 746,461.54 |
  13. As a point of reference, English Wikinews has a completely different pattern than Serbian Wikinews.
    English Wikinews articles created per day versus article views.
    English Wikinews correlations generally suggest that, to a small degree, the greater the content production, the more views, though the correlation for the period between March 2013 and May 2013 suggests the opposite is true: the less content produced, the greater the page views. Similar patterns also hold relatively true for Spanish Wikinews. The period between September 2012 and May 2013, which is the period after the closure of the English Wikinews fork, has a correlation of 0.853. For Spanish Wikinews, when compared directly to Serbian Wikinews, the pre-content import period of June 2008 to June 2009 has the most randomness in the relationship between daily content production and page views, with a correlation of 0.23.
    The relationship between per day article production and views on Spanish Wikinews.
  14. Given the nature of SEO and the amount of traffic derived from Google, it is possible that Google’s algorithm gave less value to Serbian Wikinews articles that were copies from other sites. Serbian Wikinews also lacks a visible Twitter and Facebook account. For English and Spanish Wikinews, where Google may prefer the content because it is original and is more likely to rank it higher in searches, Google-related traffic may be more consistent overall. It is also possible that English and Spanish Wikinews traffic depends on other variables such as type of content, social media efforts, incoming links from sister projects, etc.
  15. This number is based on dividing the total number of monthly page views by the total number of articles. This number is likely not a true reflection of actual views because page views include all pages on a project and, according to the stats page, contain bot-generated traffic totals which account for roughly 15% of all page views counted.
  16. Some of this can possibly be explained by total language speakers. Serbian is spoken by approximately 9.2 million people compared to 40 million Polish speakers and 500 million Spanish speakers. This cannot completely account for all the differences. The June 2008 starting point is 80 views for Serbian Wikinews compared to 119 for Polish Wikinews and 314 for Spanish Wikinews. If traffic were based on the relative population of speakers, Serbian should have started at a lower average or Polish should have started at a higher point: the two are too close, despite one language having about 5 times as many speakers.
  17. This research is not applicable to other Wikimedia projects, because news is news. Once published, news stories are generally not refactored. Instead, new news stories are published with updated information. This inherently differs from Wikipedia, Wiktionary and Wikivoyage, where imported content could easily be refactored, changed and updated by the community. Other research would need to be done to determine the success of content import on community and traffic on other sister projects.

Analysis conducted by LauraHale. Raw data used for analysis available upon request.


External links