Grants talk:Project/Dataviz 4 Wiki

From Meta, a Wikimedia project coordination wiki

Extension:Graphs

In theory I'm certainly in favour of improving graphing in MediaWiki.

However, this proposal doesn't mention that we already have an extension that does this (Extension:Graph), which uses a library that is fairly mature as these things go (Vega - https://vega.github.io/vega/). Nor does it compare the proposed system with Extension:Graph or explain the advantages of implementing an entirely new system over improving the existing one.

I would hate to see this get built but then never get deployed because we already have something that does this.

In this proposal I would expect to see a section on how this differs from Vega and Extension:Graph and why it would make sense to have both. Mvolz (talk) 09:29, 18 February 2020 (UTC)

Hi Mvolz, our proposal is to build upon Extension:Graph, as we describe in the section Grants:Project/Dataviz_4_Wiki#Specific_activities. The idea is to create a visual interface for creating the Extension:Graph wikicode, which can then be inserted into the page. The extension, being based on Vega, is indeed quite powerful; however, it's not easy to understand how to create visualisations with it.
More concretely, the goal is to make a tool similar to RAWGraphs that, instead of exporting an SVG, exports the wikicode. --Mimauri (talk) 20:22, 18 February 2020 (UTC)
A number of us are working on an effort to bring Our World in Data content into Wikipedia. So far we have imported nearly all of their .svg files and a number of their .csv files.
I have written a six-step process for manually converting .csv to .tab here and have been converting a number of files by hand.
A tool that allows us to do this at scale would be nice. User:Fæ tried to upload at scale by bot, but many of those listed here did not work.[1]
I agree that the big question is whether we work on improving the data visualization tools we have or add a different one. The Our World in Data visualizer is under an open license but is based on React instead of Vue, so from what I understand it is not compatible.
We have a long list of improvements we want for the current data visualizer. Would this team be interested in working on that? Doc James (talk · contribs · email) 15:37, 18 February 2020 (UTC)
@Doc James: Of course we are willing to work on that! As Mimauri said in his answer to Mvolz we do not plan to build an entirely new system starting from scratch, but to improve what is already existing. --Niccolò "Jaqen" Caranti (OBC) (talk) 14:48, 19 February 2020 (UTC)
It would be useful to go through the improvements you plan to make to the Graph extension. I would love to see it work similarly to the graphing tool for Our World In Data. Doc James (talk · contribs · email) 19:48, 19 February 2020 (UTC)
We will take into account Our World In Data and other models, together with community input. --Niccolò "Jaqen" Caranti (OBC) (talk) 16:23, 21 February 2020 (UTC)
@Doc James: "for manually converting .csv to .tab here": did you know there is a gadget for that in the preferences? —TheDJ (talk · contribs) 15:04, 19 February 2020 (UTC)
User:TheDJ, thanks, no, I did not know about it. I just turned it on and will test it out.
I see this.[2] Is there a description of how to use it? Doc James (talk · contribs · email) 19:42, 19 February 2020 (UTC)
That's a nice tool (it is one of the things we plan to build upon): it adds two export buttons at the bottom of every data page on Commons. Also, if you go to an empty data page (e.g.) it will allow you to upload a .csv or .xlsx file and convert it to .tab. The fact that the tool is hidden, not well known and (afaik) not described anywhere is one of the reasons why, imho, the work we are planning with this project is needed. --Niccolò "Jaqen" Caranti (OBC) (talk) 20:56, 19 February 2020 (UTC)
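For readers unfamiliar with the .tab format, here is a minimal sketch of what a .csv-to-.tab conversion involves. It assumes the Commons tabular-data JSON layout (`license`, `description`, `sources`, `schema.fields` with `name`/`type`/`title`, and `data` as a list of rows). The function name `csv_to_tab`, the `CC0-1.0` default license and the identifier-style cleaning of field names are illustrative assumptions, not the gadget's actual code. The `delimiter` parameter shows how tab-separated text pasted from a spreadsheet could be handled the same way.

```python
import csv
import io
import json
import re


def _to_number(cell):
    """Return the cell as an int or float if it parses as a number, else None."""
    try:
        number = float(cell)
    except ValueError:
        return None
    return int(number) if number.is_integer() else number


def _field_name(header):
    """Assumption: .tab field names should be identifier-like, so replace other characters."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", header.strip())
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def csv_to_tab(text, description, sources, license_id="CC0-1.0", delimiter=","):
    """Convert CSV text (or pasted TSV with delimiter="\\t") into a .tab-style dict.

    Assumes every row has as many cells as the header row.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]

    # A column is typed "number" only if every non-empty cell parses as a number.
    numeric = []
    fields = []
    for i, title in enumerate(header):
        cells = [row[i] for row in body if row[i].strip()]
        is_number = bool(cells) and all(_to_number(c) is not None for c in cells)
        numeric.append(is_number)
        fields.append({
            "name": _field_name(title),
            "type": "number" if is_number else "string",
            "title": {"en": title.strip()},
        })

    # Empty cells become null; numeric columns are converted, others kept as text.
    data = [
        [None if not cell.strip() else (_to_number(cell) if numeric[i] else cell)
         for i, cell in enumerate(row)]
        for row in body
    ]

    return {
        "license": license_id,
        "description": {"en": description},
        "sources": sources,
        "schema": {"fields": fields},
        "data": data,
    }


if __name__ == "__main__":
    sample = "Year,Population\n2000,100\n2010,120.5\n"
    print(json.dumps(csv_to_tab(sample, "Example dataset", "Example source"), indent=2))
```

The type inference is deliberately conservative: a column is numeric only if all its non-empty cells are, so a stray label in a number column falls back to "string" rather than silently losing data.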

Dependencies

Does the library have any dependencies (e.g. jQuery, React, Vue) or is it a standalone library? Mvolz (talk) 09:29, 18 February 2020 (UTC)

It won't be a library as such, but rather a visual interface for simplifying the creation of graphs using Extension:Graph and the .tab data format on Commons. We haven't yet chosen the framework (it could be React, or something simpler), but we will certainly use the Vega library. --Mimauri (talk) 20:22, 18 February 2020 (UTC)

Provenance of datasets

What will guarantee the provenance of datasets under this scheme? (When I download data from GBIF, I am given a DOI to guarantee that the data I have graphed are a correct representation of the downloaded data.) Provenance is vital. I do not see it dealt with in this proposal. MargaretRDonald (talk) 23:04, 18 February 2020 (UTC) Moved from grant page. --Niccolò "Jaqen" Caranti (OBC) (talk) 07:26, 19 February 2020 (UTC)

@MargaretRDonald: I agree with you that provenance of the data is vital. When you upload datasets in the Data namespace of Commons you have to indicate source (and license). See e.g. commons:Data:Ncei.noaa.gov/weather/New York City.tab at the bottom of the page. The source will also be indicated when a graph is created in a Wikipedia page using those data, thus allowing readers to check that the data do correspond. --Niccolò "Jaqen" Caranti (OBC) (talk) 14:59, 19 February 2020 (UTC)

"Writing manual about data usage and copyright"

I'm not sure why we need another manual about "copyright". Doc James (talk · contribs · email) 19:45, 19 February 2020 (UTC)

Not a general manual about copyright, but something specific about database rights (I've changed the text), to clarify which datasets can be uploaded to Commons and which cannot. It does not seem to me that there is much about that on Commons, but of course I may be missing something. --Niccolò "Jaqen" Caranti (OBC) (talk) 20:37, 19 February 2020 (UTC)

More details on tool to ease creating visualizations

I'd like to see more information about "To address the second need, we will design a new tool, based on the state of the art, to simplify the creation of data visualization using the built-in extensions of Mediawiki (e.g. Graph extension)." This is super vague. For starters, what type of tool are we talking about? (Is it external? Integrated into VisualEditor? A Lua module? A gadget? A MediaWiki extension? A totally separate web page?) Do we even know what types of problems users are facing? I'm aware of two other attempts to do something like this: the Polestar tab in https://query.wikidata.org and the (rather simple) graph button in VisualEditor. I suppose there is also the external tool for making Vega graphs.[3] None of these seems to be used that much, AFAIK. What will be different about your tool to ensure that it actually meets the needs of Wikipedia editors? What types of visualizations will your tool support? For reference, the status quo is that in the English Wikipedia main namespace, 97% of all uses of the Graph extension (3746 out of 3869) come from one of three templates: w:template:Graph:Street_map_with_marks, w:template:Graph:Chart and w:template:Graph:Map. Bawolff (talk) 07:06, 2 March 2020 (UTC)

Good questions, I'll try to address them here.
What kind of tool?
An external tool, probably hosted on Toolforge, based on an approach similar to the one we used in RAWGraphs: through a visual interface, users connect to the data source in .tab format via a link, map data fields to visual variables, make some visual choices, and export the wikicode needed to render that chart with Extension:Graph.
Do we even know what types of problems users are facing? What will be different about your tool to ensure that it actually meets the needs of wikipedia editors?
We based the proposal on the issues we encountered while trying to make charts with the current solutions.
  • Polestar/Lyra: both have a visual interface, but they are quite complex to learn and focused on Vega. The code they generate is not directly compatible with Extension:Graph, since it must be re-encapsulated, and AFAIK not all Vega functions are supported by the extension.
  • Vega editor: it requires knowledge of the Vega grammar, and creating even simple charts is not easy. It doesn't provide any graphical interface and has no support for .tab datasets.
  • Templates (e.g. w:template:Graph:Chart): they don't support .tab datasets and require the data to be encoded in (yet) another format.
What type of visualizations will your tool support developing?
We want to provide a simple-to-use tool that allows the creation of charts meant specifically for Wikipedia, that is extensible, and that relies on the existing extension. For now the goal is to support the most common charts (bar charts, line charts) and perhaps to test some more advanced ones.
--Mimauri (talk) 17:30, 5 March 2020 (UTC)
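The workflow described above (link a .tab dataset, map its fields to visual variables, export wikicode) can be sketched as a function that maps two field names onto the x and y channels of a bar chart and wraps the result in a `<graph>` tag. This is only an illustration under assumptions: the function name `bar_chart_wikicode` and the dataset and field names are hypothetical, the spec is written in Vega 2 syntax, and the `tabular:///` URL scheme and `format.property: "data"` reflect our understanding of how Extension:Graph loads Commons tabular data, so they should be checked against the extension's documentation.

```python
import json


def bar_chart_wikicode(tab_page, x_field, y_field, width=400, height=200):
    """Build <graph> wikicode for a simple Vega 2 bar chart fed by a Commons .tab page.

    Assumptions: the tabular:/// URL scheme and format.property == "data"
    follow Extension:Graph's convention for Commons tabular data.
    """
    spec = {
        "version": 2,
        "width": width,
        "height": height,
        # The dataset is referenced by its Commons page name, not copied inline.
        "data": [{
            "name": "table",
            "url": "tabular:///" + tab_page,
            "format": {"type": "json", "property": "data"},
        }],
        # The user's field choices become the domains of the x and y scales.
        "scales": [
            {"name": "x", "type": "ordinal", "range": "width",
             "domain": {"data": "table", "field": x_field}},
            {"name": "y", "type": "linear", "range": "height", "nice": True,
             "domain": {"data": "table", "field": y_field}},
        ],
        "axes": [{"type": "x", "scale": "x"}, {"type": "y", "scale": "y"}],
        # One rectangle per row: x position from x_field, height from y_field.
        "marks": [{
            "type": "rect",
            "from": {"data": "table"},
            "properties": {"enter": {
                "x": {"scale": "x", "field": x_field},
                "width": {"scale": "x", "band": True, "offset": -1},
                "y": {"scale": "y", "field": y_field},
                "y2": {"scale": "y", "value": 0},
            }},
        }],
    }
    return "<graph>\n" + json.dumps(spec, indent=2, ensure_ascii=False) + "\n</graph>"


if __name__ == "__main__":
    print(bar_chart_wikicode("Example dataset.tab", "year", "value"))
```

In a real tool the field and chart-type choices would come from the visual interface; the point of the sketch is that the exported artifact is a Vega spec that references the .tab page, so the data and its source stay on Commons.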

Eligibility confirmed, Round 1 2020

This Project Grants proposal is under review!

We've confirmed your proposal is eligible for Round 1 2020 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through March 16, 2020.

The Project Grant committee's formal review for round 1 2020 will occur March 17 - April 8, 2020. We ask that you refrain from making changes to your proposal during the committee review period, so we can be sure that all committee members are seeing the same version of the proposal. Grantees will be announced Friday, May 15, 2020. Any changes to the review calendar will be posted on the Round 1 2020 schedule.

Questions? Contact us at projectgrants (_AT_) wikimedia  · org.

I JethroBT (WMF) (talk) 18:43, 6 March 2020 (UTC)

Questions from User:T Cells

Hello Niccolò Caranti (OBC), and thanks for submitting this grant request. The idea of creating a new system that will easily allow contributors to use existing MediaWiki technologies is great and something I consider valuable. However, I have a few questions about the project timeline and the role of the Wikipedia liaison.

  1. What is your project's timeline?
  2. What is the duration of the developmental stage?
  3. In addition to liaising with the Wikimedia community, does the Wikipedia liaison have other functions?
  4. Could you please explain why the Wikipedia liaison should be a paid position, since the project manager and technical coordinator positions are already paid?
  5. Rather than having a single person as the Wikipedia liaison, have you considered decentralizing the role, for example by recruiting Wikipedia liaisons for the major language Wikipedias (English, German, French, Spanish, etc.)?
  6. What do the administrative fees cover?

Thank you. I look forward to reading your response. T CellsTalk 12:44, 20 March 2020 (UTC)

Hi T Cells, thank you for your questions!
  1. You can find the project timeline in the table in the activities section. We were planning to start in September, but this may change because of the disruption caused by the COVID-19 pandemic.
  2. 7 months. We know that this is a rather long period of time: this is a conscious choice, because we want to have time to discuss in depth with the community during all the different phases of development.
  3. My function as the Wikipedia liaison will of course be to act as a bridge between the development team and the community (and not only the Wikipedia one). Some other functions are detailed in the budget narrative section. For example, I will help write manuals about data and tool usage, and I will write a manual about database copyright (I have a law degree and wrote a master's thesis on copyright).
  4. Sure. The liaison is a big part of this project, so it would be too much work for a volunteer. I do not have the necessary experience to act as Technical Coordinator/Manager, and Giuseppe, my co-grantee, does not have the necessary experience to act as Wikipedia liaison, so our roles have to be separate.
  5. As explained in the proposal, I will personally engage the English and Italian communities. This could be enough for a start, but I also plan to involve translators in order to reach other language communities. Of course, we would love to recruit volunteer liaisons for other languages.
  6. The fiscal sponsor (OBCT) will handle the agreements with the WMF, Politecnico di Milano, Giuseppe and me, as well as all payments and travel (of course, only insofar as travel is compatible with the ongoing pandemic). It will also help organise the presentation of the tools to the journalists of the EDJNet project (which OBCT coordinates), as well as the other presentations. The guidelines permit fiscal sponsor administrative fees of no more than 20% of the requested grant amount. We chose to ask for just half of that (10%).
Please let me know if you have any more questions! --Niccolò "Jaqen" Caranti (OBC) (talk) 15:14, 23 March 2020 (UTC)
Thanks for your response. Could you please update your project timeline with the proposed dates and duration of tasks? Regards. T CellsTalk 17:40, 23 March 2020 (UTC)
T Cells, I've updated the project timeline with the details I gave above. Sorry for the delay! --Niccolò "Jaqen" Caranti (OBC) (talk) 14:45, 8 April 2020 (UTC)
There's no need to apologize. Thank you. T CellsTalk 22:05, 9 April 2020 (UTC)

COVID-19

I would like to point out that, since we live in Italy, we are fully aware of the ongoing pandemic and of the measures taken by the WMF. Our offline activities should hopefully take place after the emergency is over, but we can move them online or drop them if necessary, and the project would still be perfectly doable. --Niccolò "Jaqen" Caranti (OBC) (talk) 16:02, 23 March 2020 (UTC)

Uploading tabular data

Hi, maybe I did not read closely enough, so pardon me for this question for clarification: Commons supports uploading tabular data (in the "Data" namespace with a ".tab" suffix), though perhaps not in the formats mentioned in this proposal (csv and xlsx), since the proposal says "a tool allowing for the upload of datasets to Commons from csv or xlsx files". Is this about creating another "upload tool", or about expanding an existing tool to support more file formats? --AKlapper (WMF) (talk) 15:06, 7 May 2020 (UTC)

Hi AKlapper, thanks for your question! Yes, we know Commons already supports .tab files, and in fact, as discussed above, there is already a gadget that converts .csv or .xlsx files to .tab. With regard to uploading datasets, we have two main purposes:
  1. to also allow data to be inserted by copying and pasting
  2. to make all these possibilities easier to use and better known
We plan to do that with gadgets and tools hosted on Toolforge, without involving WMF developers and without "messing" with core code.
Please let us know if you have other questions. Thank you! --Niccolò "Jaqen" Caranti (OBC) (talk) 12:09, 8 May 2020 (UTC)

Aggregated feedback from the committee for Dataviz 4 Wiki

Scoring rubric and scores
(A) Impact potential: 6.8
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement: 6.0
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute: 7.3
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success: 5.5
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • The benefit to Wikimedia is not clear, because some existing tools already do the actual work (albeit with quality issues).
  • I have read the discussion page, and I still have concerns about whether this tool is new or replicates what already exists.
  • The project fits with Wikimedia's strategic priorities. The developed tool can continue to be used after the grant ends, although it will require some maintenance.
  • I see significant risks regarding community adoption outside the Italian community, because veteran users (those active for many years) in Wikimedia projects don't like new tools, and I see a lot of complexity in the participation model for uploading a new graph.
  • The project is innovative as creation of such a tool has not been attempted before. The risks are relatively low as the tool will utilize two capabilities, which already exist: data namespace in Commons and Graphs extension. The measures of success are clear.
  • The project is unlikely to sustain impact over time. It's likely to suffer from lack of maintenance as it's unlikely for Commons volunteers to be involved with the maintenance.
  • The budget seems well defined and the skills and main idea are clear. There's not enough extra time in case something goes wrong.
  • The critical point here is that one strong profile belongs to a volunteer, while the grantees are a community manager and a developer. This is quite strange for a software development project, where planning and software engineering are probably more critical than community management.
  • The project can be accomplished in 12 months and the budget seems to be realistic and efficient. The participants appear to have the necessary qualifications and skills.
  • I do think the team has the capacity to execute the project.
  • The planned community engagement is quite extensive for a development project. There is community support.
  • There is little or no support or comments from software volunteers or the software team at WMF.
  • This may become the next tool for creating graphs, but without a prototype and the support of Wikimedia Tech, I see a duplication of tools, and we don't need more tool fragmentation in the Wikimedia ecosystem.
  • I remain neutral until the overlap with existing software is clarified. There are some weaknesses in the structure of the team.
  • The goals of project are clearly articulated and look valuable for the Wikimedia movement. If successful, this project will empower Wikipedia editors to significantly improve graphical representation of the data in articles by making their work easier.
  • The budget appears too high. 880 hours is a lot for a community liaison. Can this be reduced by at least half? The same goes for the technical coordinator. I also feel that the role of the community liaison should be decentralized, with community liaisons recruited for the major languages.
  • As some aspects of the project involve Commons, there is a need to seek feedback from the Commons community on whether they would like to be involved in maintaining the data and visualisations. I would not recommend funding this proposal until the Commons community agrees to help with the maintenance.

Opportunity to respond to committee comments in the next 6 days

The Project Grants Committee has conducted a preliminary assessment of your proposal. Based on their initial review, a majority of committee reviewers have not recommended your proposal for funding. However, before the committee makes an official decision, they would like to provide you with an opportunity to respond to their comments.

Next steps:

  1. Aggregated comments from the committee are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal. We recommend that you review all the feedback carefully and post any responses, clarifications or questions on this talk page by 5pm UTC on Friday, May 15, 2020. If you make any revisions to your proposal based on committee feedback, we recommend that you also summarize the changes on your talk page.
  2. The committee will review any additional feedback you post on your talk page before making a final funding decision. A decision will be announced no later than May 29, 2020.


Questions? Contact us.


--Marti (WMF) (talk) 01:06, 10 May 2020 (UTC)

Thanks for your comments! We would like to better explain some of our choices.
Some components of the software we will integrate into our tools already exist and will be maintained by others. The new software we develop will be simply written and fully documented, following open source best practices. Furthermore, it will build on existing and actively developed platforms (vega.js). Therefore it will be easy for other developers, too, to maintain it. Part of my time as community liaison will be dedicated to involving volunteer users so that they are ready to help if necessary.
Michele Mauri is a volunteer and not a grantee because, as a civil servant, it would be complicated for him (and for his university) to act as a grantee. This does not reduce his commitment to the project in any way. He will coordinate the development of the code, which will be carried out by the DensityDesign lab, of which Mauri is scientific director, as detailed in the budget and in other parts of the proposal.
As some comments point out, there are some risks in this project (as in similar projects): low adoption, especially outside the Italian community, lack of maintenance after the end of the project, etc. Developers can write great software, but they need input from the community in order to write not just great software but the software the community wants, and the software needs to be actively promoted, otherwise it will remain great but unknown: this is why the role of the community liaison is so important. The role of Giuseppe, the Technical Coordinator, as liaison between software development and testing among users is also important.
We believe it makes sense to concentrate our energies on extensive testing with two language communities. Still, we would be enthusiastic to cooperate with further communities who want to be involved in the development and testing phases.
Of course we are available to review the budget, particularly regarding the internal distribution of the resources. --Niccolò "Jaqen" Caranti (OBC) (talk) 16:43, 14 May 2020 (UTC)


Round 1 2020 decision


This project has not been selected for a Project Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding, but we hope you'll continue to engage in the program. Please drop by the IdeaLab to share and refine future ideas!

Comments regarding this decision:
We will not be funding your project this round. The Project Grants committee appreciates the value of better data visualization in the Wikimedia context, but committee reviewers were reluctant to support this project without greater consultation of the Commons community, both about the need for this project and about how it can be sustainably maintained after it is built.

Next steps:

  1. Visit the IdeaLab to continue developing this idea and share any new ideas you may have.
  2. Applicants whose proposals are declined are welcome to consider resubmitting their application in a future round. We ask that you first email projectgrants(_AT_)wikimedia · org to indicate your interest in resubmission so staff can review any concerns with your proposal that contributed to a decline decision, and help you determine whether resubmission makes sense for your proposal.
  3. Check back at the schedule for information about the next open call to submit proposals.

Questions? Contact us.


-- On behalf of the Project Grants Committee, Morgan Jue (WMF) (talk) 19:32, 29 May 2020 (UTC)