User:Arjunaraoc/Small impact of the large Google Translation Project on Telugu Wikipedia

From Meta, a Wikimedia project coordination wiki

As part of an initiative to extend Google Translate to the languages of India, Google launched the Google Translation Project (GTP) to translate popular English Wikipedia articles into various Indian languages using paid translators. Google presented the project at Wikimania 2010, where Tamil Wikipedians also shared a critical review of it. Google tried to address the shortcomings by engaging with the Tamil Wikipedia community but could not make much progress, and finally announced the completion of the project in June 2011. This review uses Wikipedia's metrics on contributions and page requests to analyse the impact of the project on Telugu Wikipedia. The findings show that the project increased the article count by 4.6% and the database size by 200%, yet the translated pages accounted for only about 4% to 6% of page requests. More than 99% of the translated articles remain unimproved. Volunteer-developed featured articles had about 3.5 times the page requests of translated pages.

Approach

The questions that are addressed in this article are the following.

  • What is the scope and extent of Google Translation Project's contributions?
  • How much did key parameters such as article count, number of Wikipedians and database size increase due to GTP?
  • What is the popularity of the GTP pages as observable from page requests on Wikipedia?
  • How does the popularity compare to volunteer developed featured articles?
  • What is the status of the GTP pages?
  • Why is it difficult to clean up GTP pages?
  • How good is the Google Translate quality?
  • What are the lessons from this project?

The approach consists of acquiring metrics on Wikipedia contributions and page requests and analysing them over a four-year period, with the Google Translation Project occupying the middle two years. Descriptive statistics are used to depict the impact. Observations and experiences of volunteers involved with GTP are used to capture the qualitative findings.

The following sections briefly describe the Google Translation Project, the criticism from Tamil Wikipedia midway through the project, and Telugu Wikipedia, before getting into the specific details of the analysis.

Google Translation Project in brief

Google Translation Project (GTP) started as a stealth project. Many Wikipedians came to know about it only when they saw new articles in the "Recent changes" timeline of Wikipedia with a comment stating that they had been translated using the Google Translator Toolkit. Apart from a blog post[1], Google did not publish the material used in its Wikimania presentation, so I share the details from the notes I took at that time.

Google Translate can translate 2500 language pairs in a fully automated way and about 100000 pairs with the help of human translators. Initially, Google Translate used the huge corpus of translated documents in Western languages available from the United Nations and achieved successful automated translation results. The Translator Toolkit was made public in June 2009. As no such corpus was available for Hindi and other Indian languages, Google experimented with Hindi by translating 100 articles from English in August 2008. Google subsequently expanded the scope of the project to Arabic, Tamil, Telugu, Bengali, Kannada and Swahili.

In Arabic, volunteers as well as paid translators used the tool. About 1000 articles amounting to 5M words were translated, and 20% of the articles were rejected by the language community. In Swahili, Google used a contest to promote the project; 800 people registered for the contest and about 100 were active. In Indian languages, 1000 articles amounting to 10M words were added. Out of the 6.5M words of Tamil Wikipedia, 3M words were contributed by GTP. About 1.55M words were contributed to Telugu Wikipedia. The Bengali community rejected the project after about 0.7M words had been added. The project concluded in June 2011.

The table below summarises the contributions made to various Indian-language Wikipedias with the Translator Toolkit, predominantly by GTP editors and in a few cases by volunteers, based on data from the Wikipedias as of May 2015.

Wikipedia | GTP articles
Hindi | 2321[2]
Telugu | 1989[3]
Tamil | 1290[4]
Kannada | 2204[5]
Bengali | 48[6]


Google Translate, as a technology for translating phrases and simple sentences, has improved dramatically for most languages in recent years. It has added several languages, including some, such as Bengali, whose Wikipedia communities rejected its initial experiment on Wikipedia. Many people have helped improve the tool by joining the Google Translate Community[7]. It is being promoted across the world as a personal interpreter[8], with mobile apps that support handwriting and text-to-speech interfaces, apart from free translation of web content through the Chrome browser.

Initial criticism from Tamil Wikipedia

In a critique of GTP shared during Wikimania, Ravishankar identified several issues with the project.[9]

The strategy of selecting English articles by their popularity in English-language search results from India resulted in the translation of articles on Hindi movies and American pop stars, which were not of interest to Tamil users.

The quality of the translated articles suffered from the following:

  1. Many red links for articles, templates and images.
  2. A mechanical feel to the translation, as the tool encouraged word-for-word and sentence-for-sentence translation even when 95% of the translation was done manually.
  3. As the source articles from English Wikipedia were not selected for quality, their errors and bias were carried through the translation.
  4. Excessive use of transliteration, which violates the local wiki's manual of style.

Operational problems faced by the community included the following:

  1. Stub articles being overwritten by GTP contributors.
  2. Use of one wiki account by more than one GTP contributor, which hampered efforts to communicate with the GTP contributors.
  3. One-to-one interaction among Wikipedians became hierarchical, as feedback had to be routed through the Google coordinator.
  4. The coordination load on regular Wikipedia contributors increased a lot.

Philosophical issues raised by the project included the following:

  1. Downgrading of the quality of Wikipedia, because of the large scale contributions with poor quality by GTP
  2. Impact on volunteerism of Wikipedia contributors, as there is opportunity to get paid.

The recommendations of the Tamil community included changes to the process to improve the quality.

In response, Google attempted to improve coordination with the Tamil Wikipedia community, but without much success. Google announced the completion of the project in June 2011.

Telugu Wikipedia

Telugu Wikipedia was started on Dec 10, 2003 by Venna Nagarjuna, known for 'Padma', a system for transforming Indic text between various encoding formats. As of June 1, 2015, its article count stood at 60,154, the third highest among the Indian-language Wikipedias after Hindi and Tamil. The bulk of the articles are about villages and movies and were initially created through bot scripts. It was the first Indic Wikipedia to use a transliteration input method, for typing in Telugu. It recorded its highest monthly page requests, 4.5M, in Feb 2010; the page requests for March 2015 stood at 2.68M[10]. The bulk of new article creation happened during 2007 and 2008. The Google Translation Project (2009-2011) and the WMF-funded CIS-A2K initiative on Telugu Wikipedia (2013 to date) are the major organisational initiatives to grow the Wikipedia.

The community has about 64 active editors, who make 5 or more edits in a month, and 17 very active editors, with 100 or more edits per month (Mar 2015 statistics). The community started weekly featured articles and pictures in June 2007 and has maintained the updates regularly. It celebrated the tenth anniversary of English Wikipedia (Wiki10) in 2011 and the tenth anniversary of Telugu Wikipedia in 2014. As part of the latter celebrations, it honored 10 contributors after an in-depth analysis of their contributions[11]. A couple of volunteers have received Individual Engagement Grants from the Wikimedia Foundation to enhance the availability of digital library resources for volunteers[12].

GTP - Telugu Wikipedia

Belgium English Wikipedia article extract
Belgium-Google Translation Project Article extract from Telugu Wikipedia

Following Wikimania, I participated in one coordination meeting with Google along with Tamil Wikipedians. Google shared a spreadsheet listing the articles targeted for translation and their status, and assured us that it would work with Telugu Wikipedia after gaining experience on Tamil Wikipedia. While we waited for that to happen, we were surprised to hear the announcement of the completion of GTP in June 2011. So, about four years later, I have pieced together the GTP details related to Telugu Wikipedia by analysing Wikipedia statistics to assess its impact.

Determining GTP contributions

The Google team shared a list of 5941 target articles for translation into Indic languages, but this list was found to be incomplete: only 873 of the articles actually uploaded appear in it. The list may have been changed midway. In the history of a GTP article, Google Translator Toolkit uploads are marked by a 'rev_comment' like the following.

"(ప్రస్తు • గత) 2009-12-17T07:53:15‎ Charminarh (చర్చ • రచనలు • నిరోధించు)‎ . . (1,84,304 బైట్లు) (+1,79,969)‎ . . (Translated from http://en.Wikipedia.org/wiki/Belgium (revision: 327582701) using http://translate.google.com/toolkit.) (దిద్దుబాటు రద్దుచెయ్యి • కృతజ్ఞత తెలుపు)".

Sometimes the updates were done anonymously, and sometimes the "% translation" figure is missing from the comment. Some translations may have been done by volunteer Wikimedians experimenting with Google Translator Toolkit. Sometimes Google contributors overwrote existing pages without checking.

So the following procedure is used to ascertain the number of articles contributed by GTP. Counts for Telugu Wikipedia are shown after each step.

  1. Obtain the list of articles with a revision comment containing 'toolkit': 1991 (as on 3 May 2015)[13]
  2. Look at the rev_timestamp and exclude articles outside the project duration (2009-07-01 to 2011-06-30); also eliminate any articles known to have been created by volunteer Wikipedians: 1989
  3. Volunteer Wikimedians have flagged about 978 GTP articles by including a specific template at the top of the article page, which adds them to the GTP articles category. Add the template to the remaining pages so that all GTP articles are easily accessible on-wiki. The category can be used for queries as well.
  4. Determine the page_ids from the list and use them to identify the articles in further analysis[14].
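
The filtering in steps 1 and 2 can be sketched in Python roughly as follows. This is only an illustration, not the query actually used for this analysis: the function name `select_gtp_pages`, the dictionary keys and the `volunteer_page_ids` argument are hypothetical, the regular expression is modelled on the single revision comment quoted above, and timestamps are assumed to be in the 14-digit form in which MediaWiki stores `rev_timestamp`.

```python
import re
from datetime import datetime, timezone

# Project window taken from step 2 of the procedure.
PROJECT_START = datetime(2009, 7, 1, tzinfo=timezone.utc)
PROJECT_END = datetime(2011, 6, 30, 23, 59, 59, tzinfo=timezone.utc)

# Pattern modelled on the one revision comment quoted in this article;
# other comments may deviate, so a failed match is not treated as an error.
SOURCE_RE = re.compile(
    r"Translated from (?P<url>\S+/wiki/(?P<title>\S+)) "
    r"\(revision: (?P<rev>\d+)\)"
)


def parse_timestamp(ts):
    """Parse a 14-digit MediaWiki timestamp such as '20091217075315'."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def select_gtp_pages(revisions, volunteer_page_ids=frozenset()):
    """Return {page_id: English source title or None} for likely GTP uploads.

    `revisions` is an iterable of dicts with the hypothetical keys
    'page_id', 'rev_timestamp' and 'rev_comment'.
    """
    selected = {}
    for rev in revisions:
        comment = rev["rev_comment"]
        if "toolkit" not in comment:                     # step 1
            continue
        when = parse_timestamp(rev["rev_timestamp"])
        if not (PROJECT_START <= when <= PROJECT_END):   # step 2: date window
            continue
        if rev["page_id"] in volunteer_page_ids:         # step 2: known volunteers
            continue
        match = SOURCE_RE.search(comment)
        title = match.group("title") if match else None
        selected.setdefault(rev["page_id"], title)       # one entry per page
    return selected


sample = [
    {"page_id": 1, "rev_timestamp": "20091217075315",
     "rev_comment": "Translated from http://en.Wikipedia.org/wiki/Belgium "
                    "(revision: 327582701) using http://translate.google.com/toolkit."},
    {"page_id": 2, "rev_timestamp": "20120101000000",
     "rev_comment": "uploaded with toolkit after the project ended"},
    {"page_id": 3, "rev_timestamp": "20100101000000", "rev_comment": "typo fix"},
    {"page_id": 4, "rev_timestamp": "20100101000000",
     "rev_comment": "volunteer experiment with toolkit"},
]
print(select_gtp_pages(sample, volunteer_page_ids={4}))
```

Keeping one entry per page_id mirrors step 4, where the page_ids rather than individual revisions drive the later analysis.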

Extent of contributions

Count of Google Translation Project articles vs month of their upload

Major parameters of GTP contributions are captured in the table below.

Parameter (unit) | Value
Duration (months) | 24
Number of articles | 1989
Number of editors | 65
Median article size (bytes) | 59824
Total contribution size (M bytes) | 154
Estimate of total contribution of words (M words)[15] | 7.5
Estimate of cost of project (USD)[16] | 750000

GTP increased the article count by 4.6% and the database size by 200%.
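
As a rough check, these two percentages can be reproduced from the figures reported in this article. The sketch below assumes that the baseline is the June 2009 state of the wiki given in the key editing statistics (43.00 thousand articles and a 77 MB database); that choice of baseline is my reading, not something the text states. The per-word and per-article costs are simply what the table's estimates imply.

```python
# Figures copied from the table of GTP contributions.
gtp_articles = 1989
gtp_size_mb = 154
gtp_words_m = 7.5
gtp_cost_usd = 750_000

# Assumed baseline: the 2009/06 row of the key editing statistics,
# i.e. the state of the wiki just before the GTP uploads began.
baseline_articles = 43_000
baseline_db_mb = 77


def percent_of(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


article_increase = percent_of(gtp_articles, baseline_articles)
db_increase = percent_of(gtp_size_mb, baseline_db_mb)

# Unit costs implied by the table's own estimates.
cost_per_word = gtp_cost_usd / (gtp_words_m * 1_000_000)
cost_per_article = gtp_cost_usd / gtp_articles

print(f"article count increase: {article_increase:.1f}%")
print(f"database size increase: {db_increase:.0f}%")
print(f"implied cost per word: {cost_per_word:.2f} USD")
print(f"implied cost per article: {cost_per_article:.0f} USD")
```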

Quality of contributions

Google Translation Project quality indicator: human translation percentage vs time for Telugu Wikipedia

Articles translated using Google Translator Toolkit report the percentage of human translation. This information was not available for the first nine months of the project. The available information is shown in the violin plot, where the width of the violin shows the density of the measurements; most fall between 90 and 100. For some small articles the value may have been lower. The plot shows that the percentage of human translation remained steady at quite a high level for most articles. Google might have called off the project when it had gathered a sufficient corpus rather than when the quality actually improved.

Impact on Telugu Wikipedia

Key editing statistics

The table below shows several Wikipedia statistics for the period of GTP, the year before it and the year after it. It is clear that GTP increased the database size and the average article size.

month Wikipedians_total art_count_official_k art_mean_bytes db_size_MB db_size_words_M links_int_k
2012/06 506 51.00 2117 270.00 13.80 525.00
2011/06 428 48.00 2062 250.00 12.80 494.00
2010/06 348 45.00 1180 137.00 6.70 394.00
2009/06 281 43.00 686 77.00 3.60 275.00
2008/06 224 40.00 521 55.00 2.50 218.00
2007/06 103 32.00 334 28.00 1.30 134.00
Telugu Wikipedia key parameters percentage change during 2008-2012

To understand the impact of GTP, it is useful to compare the yearly percentage increases of the above variables over the same period. All the parameters show growth below 50% at the start of GTP. Although the yearly percentage change in database size and average article size increased during the GTP period, all the parameters fell below 30% in the year after GTP ended. So GTP did not have a major impact on editors and their contributions.
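
The yearly percentage changes discussed here can be recomputed from the statistics table. The following is a minimal sketch in which the table rows are typed in by hand; the function name `yearly_percent_change` is only illustrative.

```python
COLUMNS = [
    "Wikipedians_total", "art_count_official_k", "art_mean_bytes",
    "db_size_MB", "db_size_words_M", "links_int_k",
]

# Rows copied from the key editing statistics table.
ROWS = {
    "2007/06": [103, 32.0, 334, 28.0, 1.3, 134.0],
    "2008/06": [224, 40.0, 521, 55.0, 2.5, 218.0],
    "2009/06": [281, 43.0, 686, 77.0, 3.6, 275.0],
    "2010/06": [348, 45.0, 1180, 137.0, 6.7, 394.0],
    "2011/06": [428, 48.0, 2062, 250.0, 12.8, 494.0],
    "2012/06": [506, 51.0, 2117, 270.0, 13.8, 525.0],
}


def yearly_percent_change(rows):
    """Return {month: [percentage change vs the previous year, per column]}."""
    months = sorted(rows)
    changes = {}
    for prev, cur in zip(months, months[1:]):
        changes[cur] = [
            round(100.0 * (c - p) / p, 1)
            for p, c in zip(rows[prev], rows[cur])
        ]
    return changes


changes = yearly_percent_change(ROWS)
for month, values in changes.items():
    print(month, dict(zip(COLUMNS, values)))
```

In this recomputation every column grows by less than 50% in the year ending June 2009 and by less than 30% in the year ending June 2012, while database size and mean article size grow by more than 70% in each of the two GTP years.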

Page requests

Telugu Wikipedia page requests during 2008-2012

Let us view the page requests for the same period. Page requests reached an all-time peak of 4.5M in Feb 2010 but declined rapidly afterwards. The smoothed page requests show that the increasing trend existed before GTP started and turned negative shortly thereafter. The trend became positive again in 2011, though at a lower level. The other Wikipedias that were part of the Google Translation Project experienced their peak page requests in other periods. Hence it can be concluded that, in general, there is no correlation between GTP and the increase in page requests.

Popularity

Across Wikipedia
2011-07
  • Total page requests of entire wiki pages: 2.4 M
  • Total page requests of GTP project pages: 0.093407 M (3.9% of the total)
  • Percentage of page requests per GTP article: 0.0019
2014-03
  • Total non mobile page requests of entire wiki pages: 1.9 M
  • Total non mobile page requests of GTP project pages: 0.107424 M (5.7% of the total)
  • Percentage of non mobile page requests per GTP article: 0.0028
Among Top 1000 pages
  • 2014-03 non mobile
    • Top 1000 pages page requests: 437754
    • GTP pages in Top 1000 pages: 25
    • GTP page requests: 8581
    • Percentage of GTP page requests in Top 1000 for non mobile: 1.96
    • Percentage of GTP pages in Top 1000 pages: 2.5
  • Comparison with Featured article page requests in Top 1000
    • Number of featured articles till 2013: 334
    • Number of non GTP featured articles till 2013: 328
    • Number of page requests for non GTP featured articles: 66805 (3.51% of the total non mobile page requests)
    • Percentage of non mobile page requests per non GTP featured article: 0.01 (3.57 times GTP)
    • Percentage of non GTP featured articles in Top 1000 pages: 32.8

From the above data, the popularity of GTP pages in terms of page requests is low, ranging from 3.9% to 5.7% of the page requests for the entire wiki. Among the pages with the top 1000 page requests, volunteer-contributed featured articles had 3.57 times the popularity of GTP pages, three years and nine months after the end of GTP.
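
The shares above can be recomputed from the listed counts. The sketch below is a hedged reconstruction: which denominator lies behind each published percentage is my assumption, and the wiki-wide totals (2.4 M and 1.9 M) are the rounded values given in the list, so the last digits may differ slightly from the published ones.

```python
# Page-request figures copied from the Popularity list.
TOTAL_2011_07 = 2_400_000      # entire wiki, 2011-07 (rounded in the source)
GTP_2011_07 = 93_407
TOTAL_2014_03 = 1_900_000      # entire wiki, non-mobile, 2014-03 (rounded)
GTP_2014_03 = 107_424
TOP1000_REQUESTS = 437_754     # top 1000 pages, non-mobile, 2014-03
GTP_TOP1000_REQUESTS = 8_581
FEATURED_REQUESTS = 66_805     # non-GTP featured articles
GTP_ARTICLES = 1989
FEATURED_ARTICLES = 328


def pct(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


gtp_share_2011 = pct(GTP_2011_07, TOTAL_2011_07)
gtp_share_2014 = pct(GTP_2014_03, TOTAL_2014_03)
gtp_share_top1000 = pct(GTP_TOP1000_REQUESTS, TOP1000_REQUESTS)

# The featured-article requests against two candidate denominators.
featured_share_total = pct(FEATURED_REQUESTS, TOTAL_2014_03)
featured_share_top1000 = pct(FEATURED_REQUESTS, TOP1000_REQUESTS)

# Per-article shares of the wiki-wide total, and their ratio.
per_gtp_article = gtp_share_2014 / GTP_ARTICLES
per_featured_article = featured_share_total / FEATURED_ARTICLES
ratio_exact = per_featured_article / per_gtp_article
ratio_rounded = 0.01 / 0.0028  # ratio of the rounded values shown in the list

print(f"GTP share 2011-07: {gtp_share_2011:.1f}%")
print(f"GTP share 2014-03: {gtp_share_2014:.1f}%")
print(f"GTP share within top 1000: {gtp_share_top1000:.2f}%")
print(f"featured share of wiki total: {featured_share_total:.2f}%")
print(f"featured share of top 1000: {featured_share_top1000:.2f}%")
print(f"featured/GTP per-article ratio: {ratio_exact:.2f} "
      f"(from rounded values: {ratio_rounded:.2f})")
```

Under these assumptions the 3.57 multiple corresponds to the ratio of the rounded per-article values (0.01 and 0.0028), and the unrounded inputs give a somewhat higher multiple, so the conclusion that featured articles draw several times the requests of GTP pages does not depend on the rounding.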

Present status of contributed articles

The problems identified by Tamil Wikipedians about GTP also apply to Telugu Wikipedia. Vyzasatya, a long-time contributor to Wikipedia, shared the following observations about the project.

  • GTP contributors had no clue of the whole environment surrounding the piece of text they were translating. None of them seemed to have any idea of Wikipedia style or article layout or things like Templates and Template variables.
  • I assume they also never checked their wikipedian accounts because when I tried to communicate with them by leaving a message on their talk page, they never responded back. They don't have any reason to visit Wikipedia as the tool handles the posting of article to Wikipedia.
  • Sentence structure in Telugu is completely opposite of that of English, so any word to word translation without changing the sentence structure, even if it is intelligible in some cases, sounds unnatural.

We tried to assess the work involved in cleaning up the articles by experimentally cleaning up a few of them, and developed a guideline and shared it with the community. A small number of articles (9 out of 1989, i.e. 0.45%, as of May 28, 2015) were cleaned up, and some were featured on the front page as well, but the community never took the clean-up work further.

The major reasons for the community's failure to clean up the articles are:

  • Many articles may not be of interest to the small active community.
  • Clean-up requires 6-8 hours of dedicated time per article, a good understanding of the quality aspects, the ability to locate and read the English source article, and an understanding of how Google Translate works and of the translation style and capability of the paid translator.
  • Technical limitations include the inability to compare the source and translated articles side by side and the lack of large screens on the computers generally used by volunteers.
  • Volunteers take more pride in originating and developing new articles in their area of interest than in cleaning up someone else's paid work.
  • Nobody ever analysed the extent of the contributions, categorised them by subject or provided easy-to-use tools for the clean-up.

So the articles continue to languish in the state of their initial upload, except for minor changes by spell-check bots, interwiki bots and the like.

Manual assessment of automated translation of Telugu -> English

There have been very few studies assessing the quality of Google Translate on simple sentences[17] and small paragraphs[18]. Though we concluded that GTP was not of much use for Wikipedia, we recently used Google Translate to convey the discussion on the Annual Plan Grant proposal on Telugu Wikipedia to the larger wiki community, by asking them to use Google Translate with the Chrome browser. We also provided a manual translation of the final comments, so that the wiki community could understand the gist of the discussion without any loss of quality caused by automated translation. This provided an opportunity to assess the translation quality. The following table shows the assessment. Please note that the organisation name referred to in the text has been changed to XYZ-ABC to keep the discussion focused on translation quality.

Source Telugu | Google translation (Telugu -> English) | Human translation (Telugu -> English) | Reader's understanding of the message in the Google translation (1: same as intended, 0: confusion, -1: opposite of intended)
వికీపీడియా లో గత ఐదు సంవత్సరాల పైగా కృషి చేస్తున్న వ్యక్తిగా, XYZ కృషికి తోడ్పడమే కాకుండా వారి కృషిని దగ్గరగా పరిశీలించడం, లోతుగా విశ్లేషించడం మరియు XYZ చేయబోయే కృషిని చర్చించిన వ్యక్తిగా నా ముగింపు వ్యాఖ్యలు క్రింద ఇస్తున్నాను. | Wikipedia, the person doing the hard work over the past five years, close scrutiny of their work, rather than contributing to the work of the XYZ, the XYZ will be deeply analyzed and discussed the work the person is granted under my closing comments. | As a contributor to Wikipedia for more than 5 years and as a person who supported, observed and analysed XYZ-ABC work and their proposal closely, I give my final comments below. | -1
# రెండున్నర సంవత్సరాల క్రిందట గుర్తించిన అవసరాలు మరియు వాటి ప్రస్తుత స్థితి చూస్తే , XYZ-ABC ప్రధానంగా భౌతిక మరియు ఎలెక్ట్రానిక్ ప్రచార కార్యక్రమాలు,శిక్షణశిబిరాలు, సంస్థాగత భాగస్వామ్యాల పై కృషి చేసింది.వీటివలన అంతకుముందు కంటే తెవికీ ప్రచారమెక్కువవటం ఒక మంచి పరిణామం. అయితే వీటిని విజయవంతంగా చేశామని చెప్పుకుంటున్నా, సర్వేలు, లోతైన గణాంకాలు, లోటుపాట్ల విశ్లేషణల, సుస్థిరం చేయకలిగిన అంశాల ఆధారాలు లేవు.సహసభ్యుల రచ్చబండ మరియు ప్రణాళికా చర్చలలో లోటుపాట్లు బహిర్గతమయ్యాయి.ముఖ్యంగా ఆఫ్లైన్ చర్చలు ప్రముఖ మై ఆన్లైన్ చర్చలు వెనుకబడ్డాయి.భౌతిక చర్చలను సరిగా మరియు సత్వరంగా నివేదించకపోవటంతో సముదాయం బలహీనపడింది. విధానాల రూపకల్పన, ఓపెన్ సోర్స్ సంస్కృతి,సాంకేతికసహాయం అంశాలు పెద్దగా మెరుగవలేదు. | In two and a half years ago identified needs and their current status, XYZ-ABC of the physical and electronic promotional activities, siksanasibiralu, working on corporate partnerships cesindivitivalana teviki pracaramekkuvavatam a better evolution than previously. However, they have been told to be successful, surveys, in-depth statistics, analysis of deficits, affirming Kriya levusahasabhyula evidence of the shortcomings in the proceedings are planning to pump and offline discussions bahirgatamayyayimukhyanga discussions leading my online discussions venukabaddayibhautika properly and promptly nivedincakapovatanto impaired community. The formulation of policies, the open-source culture, much better sanketikasahayam items. | If one observes the needs identified two and a half years back and their present status, XYZ-ABC has worked primarily on physical meetings, Wiki Academies and partnerships. Due to this, Telugu Wikipedia awareness has improved, which is a positive development. Though these are claimed as successful, this is not based on surveys, statistical analysis, detailed reports of shortcomings and how they are addressed, and plans for sustaining the same. The shortcomings have surfaced on the Village pump and proposal discussion pages. Offline meetings have become dominant over online discussions. As offline meetings are not well documented and timely communicated, the community has weakened. There has not been much work on policy definition and deployment support, open source culture and technical help. | 0
# సభ్యులలో అవగాహన కొంతవరకు పెరిగి, భౌతిక సమావేశాలు,TTT శిక్షణసమావేశాలు లో తెవికీ సభ్యులు పాల్గొనటం , ఇద్దరు సభ్యులు IEG గ్రాంటులు పొందటం ఒక మంచి పరిణామం. దీనికి XYZ-ABC చేసిన కృషి మరియు అందించిన తోడ్పాటు అభినందించదగినిది. | Among some increased awareness, physical meetings, TTT siksanasamavesalu participation in the teviki members, two members of the IEG to gain access to a good evolution of grants. The efforts and contribution made by the XYZ-ABC abhinandincadaginidi. | Due to better awareness among Wikipedians, Telugu Wikipedians have attended physical meetings and TTT programs. Two Wikipedians have received IEG grants. I congratulate XYZ-ABC for their work and the support extended. | 0
#సంస్థాగత భాగస్వామ్యాలు సమర్ధవంతంగా నిర్వహించకపోవటంతో ఇప్పటికీ సుస్థిరతపొందిన ఒక భాగస్వామ్యంకూడా లేదు.అంతే కాక సంస్థల ఎంపిక, చేయవలసిన పనిపై అవగాహన పత్రం లోని లోపాలు మరియు వాటి అమలు కొంతవరకు వికీపీడియాలో మరియు వికీసోర్స్ లో నాణ్యత ని దెబ్బతీశాయి.ఇంకా జరగపోయే పనిలో వనరులను వికీపీడియేతర లక్ష్యాలకు వాడే అవకాశాలున్నాయి. | Institutional shares of the companies selected to efficiently nirvahincakapovatanto still susthiratapondina leduante a bhagasvamyankuda, to document the work of the understanding of the extent of the defects and run the Wikipedia and Wiktionary jaragapoye debbatisayiinka in the quality of the work is expected to use the resources vikipidiyetara goals. | As institutional partnerships were not handled well, there is not even one sustainable partnership. The selection of institutions, the gaps in the work identified in the MOU and their implementation have impacted Wikipedia and Wikisource quality. There is a possibility of resources being used in future for work not connected with Wikipedia. | -1
# సాంవత్సరీక సమావేశాల నిర్వహణ , మూడుసార్లు జరిగినా బలం పుంజుకోలేదు. | Sanvatsarika meetings, held three times to gain support. | The management of annual Wikipedia meets has not gained strength despite a third attempt. | -1
#మొత్తం ప్రాజెక్టు వ్యాసాలు, ఎడిటర్ల సంఖ్య లాంటి గణాంకాలనే లక్ష్యంగా తీసుకోవటం వలన జరిగిన కృషి తెవికీ నాణ్యతను దెబ్బతీసింది. ఇది పరిమాణంలో చిన్నదైన సముదాయం పై నాణ్యతా లోపాలను సవరించే భారాన్ని ఎక్కువ చేసింది. | Throughout the project, articles, statistics such as the number of editors taking aim hampered by the quality of the work teviki. It is smaller in size than the burden of the community has on the quality of editing errors. | As the focus is on total articles and total editors of Wikipedia, the work done has impacted Telugu Wikipedia adversely from a quality perspective. This has increased the workload on the small community to fix the quality issues. | 0
#XYZ-ABC వికీపీడియాలో జరిగిన కృషి ని లోతుగా విశ్లేషించక, మొత్తం కృషిని తన ప్రణాళికలో చూపటంతో, స్వచ్ఛంద సభ్యుల కృషికి సరియైన గుర్తింపు దొరకుటలేదు.అందువలన సముదాయంలో అటువంటి కొందరు సభ్యులు కృషిని తగ్గించుకున్నారు. దీనివలన సముదాయం బలహీనపడింది. | Thanks to the efforts of the XYZ-ABC vislesincaka deeper, the work by pointing to his plan, the contribution of the voluntary efforts of some members, such as reduced fleet dorakutaleduanduvalana proper identification.This group is weakened. | Volunteer Wikipedians are not being appropriately credited for their work, as XYZ-ABC is using the overall project-level metrics and is not providing a detailed analysis of the project. Therefore, some Wikipedians are reducing their contributions, thereby weakening the community. | 0
#ప్రాజెక్టుల ను రూపొందించడం మరియు అమలు చేయడం సమర్ధవంతంగా చేయలేదు. ప్రాజెక్టులు నిర్వహించడానికిఅంతకు ముందు స్వచ్ఛంద సభ్యులు నిర్వహించిన ప్రాజెక్టు మంచి సంప్రదాయాలు కూడా తమ ప్రాజెక్టులలో వాడలేదు. సముదాయంతో సంప్రదింపులు, గణాంకాలతో కూడిన నివేదికలు, లోటుపాట్లు విశ్లేషణకు ,సముదాయంతో సహకారమునకు తగిన ప్రాధాన్యత ఇవ్వలేదు. | The formulation and implementation of projects has not been effective. Nirvahincadanikiantaku projects carried out by the volunteer members of the project are also good practices used in their projects. Contact fleet, statistics and reports, analysis of shortcomings, with a fleet of cooperation given the appropriate priority. | Project management is inefficient. Even the best practices followed by volunteer Wikipedians have not been used by XYZ-ABC for its projects. Discussions, reports with detailed analysis of statistics and shortcomings, and cooperation with the community were not given the necessary importance. | 1
#ప్రణాళికలో ఇతరులనుండి పొందే సహాయాన్ని వేరుగా చూపించినా, నివేదికలలో ప్రాజెక్టు స్థాయిలో నిజంగా అందిన అటువంటి సమాచారం ఇవ్వకపోవడం ప్రణాళిక బలంగా లేకపోవటాన్ని సూచిస్తుంది. బడ్జెట్ తగ్గితే ప్రాధాన్యతలని ఏకపక్షంగా నిర్ణయించడం జరిగింది. | Apart from many other artists who portrayed the aid plan, the reports received from the project level is really strong, the lack of such information does not indicate the plan. Unilaterally determine the budget priorities were relieved. | XYZ-ABC provides a break-up of funds from WMF/FDC and others in its detailed plans, but its unwillingness to share the same level of detail for actuals may indicate the poor quality of the proposal. If there is a cut in budget, the priorities are decided without consultation. | 0
#కొత్త ప్రణాళికలో చాలా వరకు వ్యక్తిగతంగా సభ్యులు చేయగలిగిన ప్రాజెక్టులనే చూపించారు. సాహసోపేతమైన, సృజనాత్మకమైన పనులు లేవు. | The new plan projects that could have a lot of members showed up to the individual. Bold, creative works there. | The new proposal contains projects which can be done individually by Wikipedians, without much need for institutional help. There are no risky and creative initiatives. | -1
#మొత్తంగా, XYZ-ABC కృషి వలన తెవికీ ప్రచారం మరియు సభ్యులలో అవగాహన కొంతవరకు మెరుగైనా వికీపీడియా , వికీసోర్స్ నాణ్యతపై మరియు మొత్తము సముదాయంపై ప్రతికూల ప్రభావాన్ని చూపింది. లోటుపాట్లు గుర్తించలేకపోవటం,కొత్త ప్రణాళికకూడా క్రిందటి ప్రణాళికల మూసలోనే వున్నందున ఇది ఇకపై తెలుగు వికీపీడియాకి పెద్దగా ఉపయోగపడదు. కావున ఈ ప్రతిపాదనని వ్యతిరేకిస్తున్నాను. | Overall, due to the efforts of the XYZ-ABC teviki somewhat better awareness campaign and a member of Wikipedia, Wikimedia Commons has had a negative impact on the quality and the entire community. Locate shortcomings, since it is no longer Telugu vikipidiyaki new pranalikakuda In the plans, the mold can be larger.So against this proposal . | Overall, though there has been an increase in Telugu wiki awareness, an adverse impact on the quality of Wikipedia and Wikisource and also on the community has been felt. As the new proposal does not address the shortcomings and is similar to the past proposals, it will no longer be beneficial to Wikipedia. I oppose the proposal. | 1


Quality of Google Translate (Telugu -> English) as on 2015-05-29 | # of messages | Percentage
Google Translate conveys same as intended message | 2 | 18.18%
Google Translate conveys confusion | 5 | 45.45%
Google Translate conveys opposite of intended message | 4 | 36.36%

The quality assessment tables show that Google Translate has a lot to improve in translating paragraphs of 2 to 3 sentences: in 9 of the 11 messages (about 82%), it either fails to convey the message or conveys the opposite message.
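
The summary percentages follow directly from the per-message scores in the assessment table. A small sketch of the tally, with the eleven scores typed in by hand in table order:

```python
from collections import Counter

# Reader-understanding scores of the 11 messages, in table order
# (1: same as intended, 0: confusion, -1: opposite of intended).
scores = [-1, 0, 0, -1, -1, 0, 0, 1, 0, -1, 1]

LABELS = {
    1: "same as intended",
    0: "confusion",
    -1: "opposite of intended",
}

counts = Counter(scores)
total = len(scores)

# Share of messages in each category, in percent.
shares = {
    label: round(100.0 * counts[score] / total, 2)
    for score, label in LABELS.items()
}

# Messages where the intended meaning did not come through at all.
not_conveyed = round(100.0 * (counts[0] + counts[-1]) / total, 1)

for label, share in shares.items():
    print(f"{label}: {share}%")
print(f"confusion or opposite: {not_conveyed}%")
```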

Limitations of the study

Despite close scrutiny of the data and filtering to make it specific to GTP, a few articles may have been attributed to GTP although they were written by volunteer Wikipedians using Google Translator Toolkit. Conversely, some articles contributed by GTP may have been omitted if the upload did not include the tool-specific rev_comment or was done anonymously. Future projects should tag their contributions with a specific project name to allow better analysis.

Lessons learned

  • The impact per dollar spent seems to be low in terms of page requests: the articles of this project accounted for only 3.9% to 5.7% of page requests. This is to be expected, because the strategy for selecting articles for translation was not specific to the target Wikipedia.
  • The impact on the quality of Wikipedia was never assessed through surveys, either by Google or by the Wikipedia community. Quality assessment before and during the project should be mandatory for future projects.
  • The project commenced in stealth mode. This led to some volunteers uploading articles without human translation. The initial assessment based on such articles gave the project a bad name, though the actual contribution was better than purely automatic translation. Though Wikipedia is an open content and open contribution project, corporates that launch content contribution projects should engage with the communities proactively, so that the communities' concerns are addressed.
  • Though Wikipedia did not get much benefit, GTP itself helped the public at large to get a sense of internet content in Telugu or other supported languages in the language of their choice.
  • Before launching any major content contribution initiative, a thorough assessment of its impact on the ecosystem is necessary. Now that WMF also provides the Content Translation tool[19], and many more technology initiatives, such as OCR, are likely to be rolled out by Google and other tech majors, this step would help make such projects useful for both the Wikipedia community and the sponsor.
  • The intellectual property generated by the project consists of translation memories, and it now belongs solely to Google. Though Google provides free translation services for everyone, it would be useful to have free access to this intellectual property for Wikimedia use. (Such an approach was devised when Microsoft launched its own Translate tool[20].)

Conclusion

This article presented the impact of the Google Translation Project on Telugu Wikipedia, based on an analysis of project information and page requests from Wikipedia statistics and tools. The findings show that the project increased the article count by 4.6% and the database size by 200%, yet the translated pages accounted for only about 4% to 6% of page requests. In the sample of the top 1000 pages, volunteer-developed featured articles had on average about 3.5 times the page requests of translated pages. More than 99% of the translated articles remain unimproved. The lessons from this project include the major responsibility of corporates and the Wikimedia Foundation to do a thorough assessment of the impact of any proposed project. Though Google Translate as a tool has improved significantly in handling phrases and simple sentences for a variety of languages, it is not effective for translating small paragraphs in languages such as Telugu.

Acknowledgements

The author acknowledges the Wikipedia tool makers Domas Mituzas, Henrik, Erik Zachte and Yuvi Panda for their excellent statistics and query support tools. He also acknowledges Ravishankar for the critical review of the Google Translation Project. He thanks Vyzasatya, a Telugu Wikipedian, for his help in reviewing this article. He also thanks the R project team for the excellent open source R language, and the Coursera R programming course faculty and community for helping the author learn R and use it for this analysis.

References

  1. "Google Translate Blog: Translating Wikipedia". googletranslate.blogspot.in. Last updated 2015. Retrieved 28 May 2015.
  2. "Pages translated using Google Translate-hiwiki - Quarry". quarry.wmflabs.org. Last updated 2015. Retrieved 2 June 2015.
  3. "Pages translated using Google Translate-tewiki - Quarry". quarry.wmflabs.org. Last updated 2015. Retrieved 2 June 2015.
  4. "Pages translated using Google Translate-tawiki - Quarry". quarry.wmflabs.org. Last updated 2015. Retrieved 2 June 2015.
  5. "Pages translated using Google Translate-knwiki - Quarry". quarry.wmflabs.org. Last updated 2015. Retrieved 2 June 2015.
  6. "Pages translated using Google Translate-bnwiki - Quarry". quarry.wmflabs.org. Last updated 2015. Retrieved 2 June 2015.
  7. "Google Translate Blog: Translate Community: Google I/O Challenge". googletranslate.blogspot.in. Last updated 2015. Retrieved 29 May 2015.
  8. "Google Translate: The Restaurant". youtube.com. Last updated 2015. Retrieved 29 May 2015.
  9. "A Review on Google Translation project in Tamil Wikipedia" (PDF). Last updated 2010. Retrieved 28 May 2015.
  10. Latest summary statistics of Telugu Wikipedia
  11. "A comprehensive evaluation of Wikimedia contributors for recognition". Wikimedia blog. blog.wikimedia.org. Last updated 2015. Retrieved 1 June 2015.
  12. "Grants:IEG/Making telugu content accessible - Meta". meta.wikimedia.org. Last updated 2015. Retrieved 1 June 2015.
  13. List of articles with the revision comment containing 'toolkit' on Telugu Wikipedia
  14. Query for GTP articles using the page_ids
  15. Assuming 1M words are contributed by volunteer contributors
  16. Assuming 0.1 US$ per word of translation as per translation industry tariffs
  17. "An Analysis of Google Translate Accuracy". translationjournal.net. Last updated 2014. Retrieved 29 May 2015.
  18. "A Survey of Translation Quality of English to Hindi Online Translation Systems (Google and Bing)" (PDF). Last updated 2014. Retrieved 29 May 2015. (Warning: the quality of the English used in the paper is below expectations for a journal.)
  19. "The new Content Translation tool is now used on 22 Wikipedias". Wikimedia blog. blog.wikimedia.org. Last updated 2015. Retrieved 1 June 2015.
  20. "Enhancing Multilingual Content in Wikipedia - Microsoft Research". research.microsoft.com. Last updated 2015. Retrieved 29 May 2015.