
Grants talk:Project/Wikitongues Poly Feature Set 1

From Meta, a Wikimedia project coordination wiki

Differences from Wiktionary


How does this project differ from Wiktionary and how will it interact with it? Will it interact with Wikidata? ChristianKl (talk) 22:11, 4 October 2016 (UTC)

Thank you for asking! The primary distinction between Poly and Wiktionary is that Poly is a subjective, rather than objective, compendium of language. Simply put, Poly won't contain one single comprehensive dictionary per language. Instead, it will contain multiple dictionaries and phrasebooks per language, each authored by individual users or groups. Allowing this openness is crucial for two reasons: 1) since language itself is subjective, we can expect to gather more dialectal, idiolectal and sociolectal data by avoiding assessments of what constitutes a "correct" translation, and 2) minority language communities, especially indigenous communities in North America and Australia, sometimes shy away from projects that don't afford them full control over the linguistic documentation process. Ultimately, Poly will interact with Wiktionary by contributing all aggregated language content that users agree to release. In the immediate future, we don't anticipate a direct relationship between Poly and Wikidata, although we do see potential to improve the structure and metadata for language objects. To that end, we're very open to any feedback or ideas. dbudelll (talk) 11:38, 11 October 2016 (UTC-4)
ViewPoint is nothing like a wiki. Nothing of what you describe is incompatible with Wiktionary. How did you reach the conclusion that Wiktionary can't host this effort? Have you had a discussion with them? Did you try to incorporate such a dictionary in Wiktionary, in any language?
  • You can upload multiple "personal" recordings on Commons and let Wiktionary editors switch the entries to a different one (or none) if there is no agreement on their correctness.
  • Translations are rarely a problem, you can just list all alternatives in any entry. Phrasebooks are especially easy: they can just be an appendix and everybody can create their own if need be.
Besides, I have a problem with how you twist words to the opposite of their accepted meaning in Wikimedia: just like "open source" is the opposite of "proprietary", it is not "openness" to give someone ownership of a dictionary. In Wikimedia (and in any wiki, really), openness means being ready for collaborative work.

Nemo 07:14, 14 October 2016 (UTC)

Buon giorno Federico, and thank you for sharing your concerns. While many Wikitongues volunteers are Wikimedians themselves, I'm relatively new to this community, so I thank you for your patience as I clarify our practical and conceptual relationship to Wikimedia as it pertains to our proposal.
We are an independent non-profit with a unique volunteer community, many of whom are also Wikimedians who, through their efforts at Wikitongues, contribute back to Wikimedia Commons and Wikipedia.
As the Project Grants guidelines state:

"Project and Event Grants (PEG) support organizations, groups, and individuals to undertake high-impact, mission-aligned projects that benefit the Wikimedia movement. The PEG program process is public and led by the voluntary Grant Advisory Committee. We accept grant submissions at any time and in any language. [...] The Project Grants program funds Wikimedia community members - individuals, groups, or organizations contributing to Wikimedia projects such as Wikipedia or Commons - to organize projects that benefit the Wikimedia movement.

Grants are generally awarded in four broad categories, but we welcome all ideas!

1. Software

2. Research

3. Online organizing

4. Offline outreach"

With that in mind, we're seeking support for the development of open software for documenting and sharing language that will empower our mutual volunteers to collect more and better data, much of which would contribute to the growth of the Wikimedia corpus. You're correct in pointing out that Poly is not a wiki in its own right, nor is our website; as an organization and volunteer community, we don't rely on a single platform. Rather, our name comes from an alignment with the Wikimedia Foundation’s philosophy: in particular, a belief in 1) the power of volunteer communities to crowdsource human knowledge, and 2) the value of open source technology.
Poly and Wiktionary have much in common, as well as some key distinctions:
Poly won’t have language editors.
Since language is a subjective experience, relying on editors as the gatekeepers of “correct” content presents certain ontological problems. In fact, there is no academic or scientific consensus as to how many languages exist. For instance, Ethnologue and Glottolog, two primary sources frequently cited by Wikipedia, list approximately 7,500 and 8,000 languages, respectively. The renowned anthropologist Wade Davis, on the other hand, commonly cites an estimate of 6,000 languages worldwide. Moreover, linguistic borders are often porous and ill-defined, complicated by the existence of transitional dialects, as well as non-standard sociolects and idiolects.
With that in mind, avoiding the concept of "correct" language content opens the door to more data. Even if there were an editor for each of the world’s 6,000 to 8,000 languages, they might not be equipped to effectively assess different dialects or other forms of non-standard speech. For instance, a Scots-language editor of the Doric dialect couldn’t necessarily judge the authenticity of a Shetlandic word. Conversely, a Doric-speaking editor may disagree with a Shetlandic-speaking editor over the accuracy or meaning of a non-standard word from a transitional Scots sociolect. Leaving such a translation out of the dictionary corpus because of this disagreement would be an unfortunate omission of data.
Poly will allow multiple dictionaries and phrasebooks per language or dialect.
While Wiktionary does host alternate definitions for individual word entries, as well as appendices for extra data, we're pursuing a data model and user experience geared more towards phrasebooks, which would facilitate a culturally oriented interaction with that data. For instance, attaching the culinary concepts "pa amb oli" and "pa amb tomàquet" as an appendix to the Catalan word "pa" ("bread") isn't the same experience as consuming a Catalan - Italian dictionary exclusively about colloquial Mediterranean cuisine. Another distinct aspect of Poly, described in our feature set section, is the ability for individual authors to embed video descriptions in the dictionaries they create. For example, the author of the aforementioned Catalan - Italian dictionary could post a video explaining grammatical concepts, interwoven with their own cultural narrative as a Catalan speaker from Mallorca, whose use of the language is known to be distinct. This kind of content, while certainly appropriate for Wikimedia Commons or Wikipedia, wouldn't have a place in the appendix of a Wiktionary entry for the Catalan word for bread, "pa".
Poly will allow users to release content under different licensing tiers.
On licensing: Poly is an open source platform from a technology perspective. When it comes to content, we will allow users to choose between releasing their content under a CC-by-SA license, releasing it under a CC-NC license, or protecting their copyright altogether. As you pointed out, this is different from Wikimedia's approach, and we adhere to it for a variety of reasons, chief among them that it is a necessary condition for working with every language community in the world. There is a long history of marginalization of indigenous peoples, especially in the Americas, as well as in Australia and the Torres Strait Islands. Sadly, this has extended to the realm of linguistics, creating countless scenarios in which academics have worked with indigenous communities to document their languages, only to lock away that documentation in university archives. As a consequence of this exploitation, indigenous groups are often suspicious of outside communities and organizations striving to collect linguistic data; therefore, mandating that our users participate in a system of absolute open licensing would prevent the participation of dozens, if not hundreds, of language communities. It must be stressed that this multi-tiered system is designed to protect members of marginalized cultures and encourage them to contribute to the sum of linguistic knowledge, a fundamental point of alignment between the mission statements of Wikitongues and Wikimedia.
Poly is designed for field use.
Functionally, Poly will be optimized for mobile and offline use, which is a necessary feature for our mutual volunteers, who frequently travel for the purposes of linguistic documentation. Moreover, as we describe in our proposal’s sustainability section, our long-term plan includes cross-platform compatibility up to full SMS integration, which will empower low-bandwidth and offline language communities to participate in documentation. -dbudelll (email) 17:48, 17 October 2016 (UTC-4)

October 11 Proposal Deadline: Reminder to change status to 'proposed'


The deadline for Project Grant submissions this round is October 11th, 2016. To submit your proposal, you must (1) complete the proposal entirely, filling in all empty fields, and (2) change the status from "draft" to "proposed." As soon as you’re ready, you should begin to invite any communities affected by your project to provide feedback on your proposal talkpage. If you have any questions about finishing up or would like to brainstorm with us about your proposal, there are still two proposal help sessions in Google Hangouts before the deadline:

Warm regards,
Alex Wang (WMF) (talk) 03:16, 6 October 2016 (UTC)
Mohamed Udhuman (WMF) (talk) 09:38, 10 October 2016 (UTC)


The website says "copyright 2015". No comment on it being under an open license? Some of the YouTube videos are under the "Standard YouTube License"; a few are under an open license. Doc James (talk · contribs · email) 03:36, 14 October 2016 (UTC)

Here it says NC [1], so it is not compatible with the Wikimedia movement's efforts. Doc James (talk · contribs · email) 03:39, 14 October 2016 (UTC)
Thanks for asking! To clarify, our oral histories project featured on YouTube is separate from Poly and not the subject of this grant. The same goes for our website, which contains only expository information about Wikitongues, as well as video submission forms and a volunteer application. The footer copyright notice refers only to the website. YouTube videos adhere to a multitier licensing system. Interviewees can choose to allow us to release them under a CC-by-SA license, enabling use for Wikipedia. (This is a new effort on our part, but we have already begun contributing oral histories to the Wikimedia Commons as a result, and we would continue to do so with data aggregated by Poly.) Interviewees who want more control over how their videos get used can choose a Creative Commons NC license instead. As mentioned above, this is especially important to indigenous communities concerned with how their content gets released. Since YouTube doesn't offer CC-NC, we publish those videos under a Standard YouTube license and release them for CC-NC use upon request. We are working to streamline this process, as well as redesign our website, which was built both before this licensing system was put in place and before the Poly project was launched. You can read more about our licensing policy on Wikipedia. -dbudelll (email) 17:51, 17 October 2016 (UTC-4)

The description contains a contradiction: «Poly, a proprietary open source platform». The software is indeed not open source because there is no license at all in the repository. Wikimedia cannot fund proprietary software. --Nemo 07:03, 14 October 2016 (UTC)

Thank you for raising this concern. Poly is to Wikitongues as Wikipedia is to Wikimedia. To avoid further confusion, we will ask the grant team for permission to remove the word “proprietary” from the summary. Poly is and always will be open source software, open to contributions by volunteers from around the world. Given that the platform itself is not yet open for use, we are still determining the best open source license to apply to the codebase. Any suggestions would be warmly welcomed. -dbudelll (email) 17:51, 17 October 2016 (UTC-4)
A classic GPL would be ok. Nemo 09:41, 6 November 2016 (UTC)

Contribute to Wiktionary


This project doesn't seem to fit in the Wikimedia mission in any way. The grant proposal should be rewritten to highlight what benefits the project would bring to Wiktionary (or to Wikidata's dictionary if you're an optimist) and to define a way to assess said benefits. I don't see any section detailing the deliverables of this project, other than a brief reference to «video and search infrastructures are required, through which users would record words and phrases (both spoken and signed) in video and browse the vast body of content».

Wiktionary welcomes recordings of words and phrases in any language (hosted on Wikimedia Commons); if this is what you intend to do, you should adjust your project to be about producing said recordings, uploading them to Wikimedia Commons with some meaningful metadata, and inserting them in existing or newly created Wiktionary entries. This would also eliminate the need for Amazon S3 hosting. mw:Extension:PronunciationRecording is our previous effort in this area and could provide a starting point.

As for search, please detail what kind of search capabilities you plan to build, based on which technologies. Internet Archive has a very advanced video search infrastructure, maybe you want something like that? If their technology is FLOSS you can improve it and work on integrating it in MediaWiki; if it's proprietary, maybe you should host your materials there. Nemo 07:03, 14 October 2016 (UTC)

I partly disagree with the initial premise: this project's stated goals do fit within the Wikimedia mission at the confluence of Wikisource and Wiktionary. However, the project does not appear interested in working with/within Wikimedia; I interpret the proposal as "we want you to underwrite our proprietary software development for a year." The measurables seem irrelevant to Wikimedia goals, the deliverables non-existent, and there does not seem to have been any research into either the resources or assets of existing Foundation projects. It seems likely to me such a project would have difficulty learning to cooperate with other teams on shared goals; they have a similar NMH approach to WMF's own development team. - Amgine/meta wikt wnews blog wmf-blog goog news 14:17, 14 October 2016 (UTC)
First and foremost, thank you both for all your thoughtful feedback so far! As Amgine observes, we consider our mission well aligned with Wikimedia. Over the past six months, we have been working to facilitate direct collaboration between Wikitongues and the various projects, chapters, and communities of the Wikimedia movement. In fact, the current licensing program for our oral histories project, which will be applied to Poly dictionaries and is described above, was devised by Wikimedians who are eager to build on our mutual alignment.
Taking into consideration: 1) our clear mission alignment; 2) our recent move towards contributing directly to the Wikimedia Commons and Wikipedia; 3) the growing number of Wikimedians who contribute to Wikitongues; and 4) the eligibility of independent open source software to receive project grants funding, we consider the feature set described in our proposal to be the primary deliverable for this project. Based on the platform’s core features, we are confident that this is a necessary tool for language documentation that can be leveraged by private individuals, our volunteer community, and Wikimedians alike. Most importantly, it will further our intent to contribute expansively to Wikimedia.
In particular, the data aggregated by Poly will be added to Wiktionary by 1) embedding video recordings of individual words and phrases via the Wikimedia Commons; 2) expanding alternate definitions through our focus on dialects, sociolects, and other non-standard forms of speech; and 3) expanding translations through the results of our focus on accumulating direct translations between traditionally unassociated languages.
We also intend to expand the quantity and quality of Wikivoyage's phrasebooks through the aforementioned means. Beyond that, we’re very excited about Wikidata’s plans to develop a data model to better support Wiktionary, and have initiated a conversation with individuals from the Wikidata community to learn more about how we can contribute through our efforts. With the permission of the WMF grant committee, we will update our proposal to further elaborate these ideas.
Poly is open source software. It will always be freely available and built on an open codebase. We are still determining the best open source license to apply to that codebase, and we would greatly appreciate any suggestions.
The content Poly is built to aggregate will always be free to access, although individuals will have the ability to license their own dictionaries for re-use according to three tiers: 1) CC by SA; 2) CC-NC; and 3) to protect their copyright. As described in my response to Federico above, this is indeed a different approach from Wikimedia’s. It is in place to create a safer space for individuals concerned with the cultural appropriation of their content, in order to encourage members of linguistic minority communities, especially indigenous North Americans and Aboriginal and Torres Strait Islanders, to participate. Members of our volunteer community, as opposed to public users not necessarily affiliated with Wikitongues, will be encouraged to release under at least a CC-NC license; our mutual volunteers will release content under CC by SA. -dbudelll (email) 17:56, 17 October 2016 (UTC-4)
It is likely best if I express myself bluntly, to avoid possible misinterpretation.
  1. You appear to me to be duplicating efforts of other projects, the existence of which you may be unaware.
  2. A core characteristic of the project, individual dictionaries, is likely an absolute violation of the Wikimedia model.
WMF continues to develop tools for video, audio, image, and other multimedia creation and collection. Your specific software endeavour likely reinvents multiple elements of existing WMF applications. Furthermore, it almost certainly recreates storage and retrieval systems which would, in theory, compete with Commons.
The recording tools which WMF develops and maintains can already be used to support the projects such as Wiktionary, Wikipedia, and Wikivoyage which already engage in every part of the descriptions you have given of your project except the personal dictionaries. Put in another way, the arguments you present for your project are sufficient but not necessary. That is, they are good arguments for your project. They are also good arguments for the existing WMF projects.
Finally, to address the personal dictionary, which I view as an issue: it is important that language preservation be undertaken, but it is also necessary that this be done with discernment. For example, George MacDonald said he used as many as 21 distinct dialects of Scots English/Gaelic in en:Alec Forbes of Howglen[1], yet no one is currently able to identify so many, as the differences between one village and the next are not considered great enough by today's standards. The franglais of Montréal is identifiably different between neighbourhoods, but those differences are not nearly as great as the differences over time: the bilingual slang of 10 or 20 years ago varies much more from current use. More subjectively, my family's usage is certainly identifiably unique from any other family's, as is every other family's dialect. It is not academically useful to record such small gradations; it is better to work with researchers who are familiar with the state of the science regarding the specific languages and dialects to create a gestalt from multiple individuals. Creating an unfiltered mass of vanity field recordings/dictionaries is likely unhelpful: to use an idiom, you will be creating haystacks to hide needles, but without incentive to search them. It will always be easier for a researcher to go into the field themselves.
I do not mean the suggested project is without merit. I do mean that, interpreted simply, it suggests a very large amount of bad or useless recording will occur. As an example, Irena speaking Northern Sami is a video of a native Russian speaker, rather than a native Sami speaker, speaking Northern Sami, and there is no metadata as to when/where she learned the language or whether it is used in her home, place of employment, or region of residence.
These are merely my opinions. - Amgine/meta wikt wnews blog wmf-blog goog news 04:10, 18 October 2016 (UTC)
I share Nemo's and Amgine's concerns. It's clear that this is a project that has already determined to write (and has been writing) standalone software with a given approach, and is seeking funding for that project and that approach. A comprehensive re-design or rethinking of the product and the approach are not on the table, it seems. But the project, as has been noted, at least partially overlaps existing efforts, including existing Wikimedia efforts, and is not likely to integrate with them well. Further, it would produce much data that would be unusable on Wikimedia due to the NC limitation, and, as Amgine noted, "haystacks hiding needles". Finally, with a major revamping of Wiktionary on the horizon thanks to the ongoing work by the Wikidata team, it seems to me that the Wikimedia movement should not allocate significant (~$100K) donor funds to an effort far less aligned with wiki norms and with the mainstream work we're already doing in the field of language documentation.
I understand it is unhelpful advice so long as you have hope of getting this grant funded, but if and when it does not get funded, I encourage you to spend some time learning about Wiktionary (including its many limitations, but also its approaches to some of the thorny questions of language documentation) and Wikidata, including Wikidata's planned support for Wiktionary, and to see whether you'd want to adjust your product's roadmap to be more wiki-based and free-license-compliant. The answer may very well be "no", but in that case, you'd be better off seeking funding from other sources. Ijon (talk) 13:48, 18 October 2016 (UTC)

Impact on WMF projects?


What would be the impact on WMF projects? I don't quite see how this relates to the Wikimedia Foundation. Makes me wonder how this landed here. --Jura1 (talk) 08:52, 28 October 2016 (UTC)

Not eligible for Project Grant, Round 2 2016


Thank you for submitting this proposal for a Project Grant. Unfortunately, this project is not eligible for funding through our program because our General Guidelines only allow for "projects aimed at improving one or more of Wikimedia's existing websites," and exclude "projects aimed at improving third party applications." As noted by others on this talkpage, you may consider revising your proposal to focus on one of our existing projects, such as Wiktionary or Wikidata, and resubmit in a future round. Alternately, you can find information here about proposing a new project.

Questions? Contact us.

--Marti (WMF) (talk) 18:50, 1 November 2016 (UTC)