Grants talk:Project/mySociety/EveryPolitician

From Meta, a Wikimedia project coordination wiki

User pages

I advise each of the MySociety reps listed as potential grantees to create their Wikidata user page (see tips here), for their own convenience; and to add to it a note about their work for MySociety, as required by the WMF policy on paid editing. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:30, 14 March 2017 (UTC)

In accordance with Paid editing and Conflict of interest editing principles, I should disclose that I am a trustee of the registered charity that owns mySociety and a non-executive director of mySociety Limited. As well as mentioning it on my user page, I have listed myself as a connected contributor on en:Talk:mySociety (and similarly on en:Talk:Open Rights Group, as I am also a non-executive director there). In accordance with UK charities law, these are unpaid positions; in accordance with en:WP:COI, I avoid making any edits that could be construed as contentious on these or related articles (on any wiki, including Wikidata). For these reasons, while I hope our grant application is successful, I have avoided endorsing it directly. If anyone has any comments or concerns, I welcome discussion on my enwp Talk: page, or users may email me. OwenBlacker (Talk) 11:14, 14 March 2017 (UTC)

Licensing

  • What's the license of the EveryPolitician code base? ChristianKl (talk) 10:13, 29 March 2017 (UTC)
@ChristianKl: MIT throughout, other than the code for the everypolitician.org website itself, which is AGPL. --Oravrattas (talk) 09:37, 3 April 2017 (UTC)

What is the outcome for the Wikimedia projects

Scanning through this, it seems that much of it is about activities outside WMF projects, and then there is an "oh yes we will import data to Wikidata". Hard-coded screen scraping is very fragile, and I suspect this will create a lot of continued maintenance. That makes me believe the solution will not scale well, if it scales at all. -- Jeblad 13:44, 1 April 2017 (UTC)

Thanks for the question, Jeblad. The reality is it's actually quite the reverse. We've spent the last two years working on the EveryPolitician project externally and we've come to the conclusion that the best way to scale and increase the usage of the data and experience that we have gathered is to transfer this over to Wikidata. So the intent is to take our deep understanding of the structures and interrelationships of parliaments and politicians around the world and ensure that that is reflected to a much greater extent on Wikidata. We'll then support the community in populating this data into Wikidata and the new structures we've created there. As we already run around 1,000 scrapers we've got a very good idea as to the level of maintenance involved. The intent of this project is to shift the emphasis away from relying primarily on screen scraping to keep all of that data up to date, and instead enable the Wikidata community in each country to keep the data up to date and accurate. Once we get beyond a critical mass in many countries, we expect that the community will keep the data up to date based on their own (usually local) knowledge, and so the scrapers become a support mechanism to highlight changes that people might have missed as opposed to the primary mechanism. The end result is that Wikidata will become the primary source of well-structured political data and mySociety becomes a reuser of that data in our other democracy and parliamentary projects along with many other groups and individuals. --Markcridge (talk) 09:33, 3 April 2017 (UTC)

Eligibility confirmed, round 1 2017

This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 1 2017 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through the end of 4 April 2017.

The committee's formal review for round 1 2017 begins on 5 April 2017, and grants will be announced 19 May. See the schedule for more details.

Questions? Contact us.

--Marti (WMF) (talk) 19:53, 27 March 2017 (UTC)

Dear Oravrattas, Markcridge, Chrismytton, ODenman, and Beholderstories,

Thank you for submitting this proposal. In my initial discussions about this proposal, I have the impression that there is both support for this project (as indicated in your strong endorsements so far, including from the Wikidata team) and confidence in your ability to execute it. I have one main point of feedback for you at this point: I encourage you to take the time to respond carefully to the comments you receive here on your talkpage. Among other steps in our review of grant proposals, we look at community engagement patterns, checking to see if applicants have reflected back the concerns of the people who have commented on their talkpage, demonstrating that they both (a) seek to understand what matters to the people who have provided feedback and (b) are seeking to address it in some way (even if it is by taking care to explain why you think differently). Essentially, we want to see that applicants are prepared to engage responsively in our highly open and collaborative volunteer communities. Any questions or concerns you receive are likely to be shared by others, so it is an opportunity for you to further clarify and improve your proposal. In this specific case, the issues that have been addressed so far--about COI, a prioritization of the volunteer ethos, open source licensing, scaling and relevance to the Wikimedia projects--are all very common concerns in our communities, so I encourage you to respond with as much clarity and transparency as possible.

As noted above, committee review will begin on April 5!

Kind regards,

--Marti (WMF) (talk) 02:06, 3 April 2017 (UTC)

Discussion on Wikidata project page

There was some discussion of this at Wikidata:Wikidata:Project_chat/Archive/2017/03#Pay_website_to_make_use_of_Wikidata_.3F.

While the organization is likely to have merits in terms of curating data in the field they apply for, from a Wikidata point of view, it's not clear why we should pay an organization to make use of Wikidata in this particular field. It's already fairly well covered by Wikipedia and specific Wikidata WikiProjects. It's not entirely clear if their approach to sourcing is in line with Wikidata's. Besides, data in this field can obviously gain from local insight; a recent initiative on Finland illustrates this.

There is an obvious need at Wikidata for improved import and editing tools, but this isn't specific to this field and it's not clear why WMF should fund proprietary tools of other organizations to edit Wikidata in their field. If such tools should be developed, they should be hosted at Wikidata or on the toolserver and not limited to specific fields or third party organizations for their editing. --Jura1 (talk) 03:19, 6 April 2017 (UTC)

Hi @Jura1: - thanks for raising this again. We had linked to that discussion on our project grant, but it looks like it's been archived.
I guess where we need help on this is how best to express what we expect the outcome should be and what it's appropriate for Wikimedia to fund or not.
Our absolute intent for this proposal is that we should be able to transfer the expertise and knowledge we have built up over the past two years in assembling EveryPolitician over to Wikidata where we think it will be both more useful and have a greater chance of being used more widely by more people and other projects.
Categorically this is not about us seeking funding for the ongoing development of EveryPolitician.org. Where the proposal doesn't fully reflect that fact, we'll go through and make some amendments to bring it in line with that intent.
The simple fact as we've outlined in the proposal is that whilst there are pockets of good political data on Wikidata it is far from consistent or widespread. Over and above the actual names and details on politicians the real value of what we can help with is the relationships between legislatures, elected positions, politicians and so on that are so important for making use of the data.
The reason we're looking for funding is to help us transition all of the effort we've put in to date and fully make this part of Wikidata, and also to support community efforts in key countries so that the local knowledge that we both agree is so valuable will have more chance of being sourced and made use of. I'll do a pass through the proposal today, but if you think there are more areas that would benefit from being edited to reflect this intent please do let me know. --Markcridge (talk) 07:00, 6 April 2017 (UTC)

Comments from Ruslik0

I appreciate your efforts, but I have some comments and questions:

  1. I do not completely understand how the data will be imported into Wikidata. The proposal says it will be done by volunteers. However I am not sure how viable this endeavor is. Volunteers may not be available or may not want to use EveryPolitician as a source.
  2. I am not sure that in the long run Wikidata can serve as a reliable back-end for EveryPolitician. Anybody can edit Wikidata and use different sources.
  3. I do not understand how your scrapers will maintain the data in the future. Will they function like bots and add the data directly?
  4. Since this project requires significant Wikidata community involvement, you would do better to begin with a local discussion. I have some doubts that the Wikidata community is able and willing to provide the extensive support that you need.
  5. You should spell out the community events that you want to hold: how many, where, when, their nature, etc.

Ruslik (talk) 16:57, 8 April 2017 (UTC)

Hello @Ruslik0:, thanks for the questions, you highlight lots of things that we are thinking deeply about as well. I'll take each of the questions in order although some of the issues are linked together. So apologies for lengthy answers:
1. There are broadly two parts to this.
  • Firstly the model that describes the relationships within each country between politicians, parliaments, the government, cabinet posts, parties / factions etc.
  • Secondly the actual details of each of these things and the politicians themselves.
In the first instance mySociety staff will be able to create these models and in many cases actually populate data on Wikidata itself. The second part is where local knowledge is most useful, and where volunteers in each country would keep the data up to date.
The current EveryPolitician project at its simplest is scraping together data from multiple official sources and representing this in a structured and consistent way, so it is a consolidated view of 'official data'. As such, we can provide reports that highlight when and where official data has changed from a single source, which should be of use to volunteers in a country in ensuring Wikidata remains consistent, but it doesn't preclude them from adding all sorts of additional detail and data themselves, especially as EveryPolitician data is currently almost entirely restricted to national-level politicians.
One of the most important things is that we'll use this project to bring more people from the Civic Technology and Parliamentary Monitoring worlds into the Wikidata community – so it is not just about expecting current Wikidata contributors to take on an additional burden, rather we'll use this process to bring more people into the community itself. mySociety embracing Wikidata to a greater extent is hopefully testament to that.
2. Our main priority is to help establish better structures in a more consistent way for recording political data on Wikidata as it will be used more widely + will be able to record much greater levels of detail, especially beyond the national level. That does come 'at a price' in as much as it relies on strong volunteer efforts in each country. There is a balance to be struck here but we believe the best way to make this work sustainable in the long run is to embrace this approach.
In the first part of this transition we're intentionally only going to focus on 30 to 40 countries once we've properly identified local volunteers and support that give it the best chance of success. As to re-importing data back in to EveryPolitician, we can regulate the changes as they come back in to EveryPolitician and assess them for obvious issues before being updated – this is something that we're clearly going to need to evolve and iterate as we make progress.
3. As above, the scrapers' primary job will be to highlight when official data has changed in a consistent way. Data won't be added directly to Wikidata; it will need to be manually updated. There may be exceptions to this, but that's our understanding – if there are useful precedents that we should be looking at where this is not the case, we'd be interested in hearing about them.
4. Our starting point will be a combination of modelling work that we can do as mySociety staff from our knowledge of EveryPolitician and in parallel we're already assessing which countries already have decent data and which have active volunteers available in this area.
This grant process is part of identifying those people as well as expanding upon the contacts that we already have.
I suspect it will be quite lumpy to begin with, we'll aim for a handful of countries to begin with and use these as pilots and demos and expand as we go. So this is a constant ongoing part of the project not just a one-off.
5. The events we run will partly be determined by what's needed in each country. mySociety staff are based in the UK, and we've run or participated in many events, both in-person 'edit-a-thons' and their online equivalents. Our collaborators in Democracy Club have been tackling exactly these types of tasks in crowdsourcing large amounts of election candidate data ahead of each election.
Broadly the events in the UK will focus on the models and supporting other countries to get started with that first bulk update of data and facts. The specific country events will target that first wave of 'most likely to succeed' countries to get the first bulk data update complete - from there we rely on the community and volunteers to keep the data up to date.
We're also working on the assumption that coordinating events around an election period is by far the best way to harness enthusiasm and existing activity – we'll amend the proposal to better reflect this through this week.
With all of this it is a work in progress and we'll be adapting, learning and changing the type of support and help we aim to give for each country as we get a better idea about what works and what's required.
We have no doubt as to how complicated or challenging this is, but for us we see no better alternative for using what we've learned through EveryPolitician to aid the creation of more accurate and consistent political data on Wikidata.
--Markcridge (talk) 16:38, 9 April 2017 (UTC)
@Ruslik0: a few extra comments on top of those from Markcridge:
It is our understanding that the majority of data we currently have in EveryPolitician cannot simply be added in bulk to Wikidata, particularly for EU countries, or other jurisdictions with database rights. In places where political data is explicitly in the public domain, but has not yet been imported into Wikidata, we will be happy to write bots or use existing tools to add that information, and in many cases we will have already reconciled the people, parties, constituencies etc in question to their Wikidata items, making that task substantially easier.
However, we assume that in the majority of countries the more useful task is to produce reports that compare the data that is already in Wikidata with the information available from other sources — e.g. highlighting the six current members of the Parliament of Singapore who don't currently have Wikidata items. As we already run scrapers for most national legislatures (or work with local parliamentary monitoring groups who do), we can also help point out changes that may have been overlooked. In about 5-10 countries currently, there seem to be enough Wikidata users who actively keep this information up-to-date. We believe we can help increase that number to around 30-40, by drawing attention to errors and omissions that people will be likely to correct if they know about them, but could easily be unaware of otherwise.
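As a rough illustration of the comparison reports described above, here is a minimal Python sketch. The `ScrapedMember` shape, the function name, and the idea that the Wikidata side arrives as a plain list of Q-ids (for example from a SPARQL result) are all assumptions made for the example; this is not a description of EveryPolitician's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ScrapedMember:
    """One legislator as seen on an official source (hypothetical shape)."""
    name: str
    wikidata_id: Optional[str]  # None if not yet reconciled to an item


def missing_from_wikidata(
    scraped: Iterable[ScrapedMember],
    wikidata_holders: Iterable[str],
) -> dict[str, list[str]]:
    """Split a scraped roster into gaps a local editor could fix.

    wikidata_holders is assumed to be the Q-ids that already carry the
    relevant 'position held' (P39) statement.
    """
    holders = set(wikidata_holders)
    roster = list(scraped)
    report: dict[str, list[str]] = {"no_item": [], "no_position": []}
    for member in roster:
        if member.wikidata_id is None:
            # The person has no Wikidata item at all (or none was matched).
            report["no_item"].append(member.name)
        elif member.wikidata_id not in holders:
            # The item exists but lacks the position statement.
            report["no_position"].append(member.name)
    # Items holding the position on Wikidata but absent from the source:
    # possibly former members whose statements need an end date.
    scraped_ids = {m.wikidata_id for m in roster if m.wikidata_id}
    report["not_in_source"] = sorted(holders - scraped_ids)
    return report
```

A report like this only flags candidates for review; a human with local knowledge would still decide whether each entry is a genuine gap or a reconciliation miss.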
We also see huge value in making all this data much more consistent, by helping to document (and participate in, from our experience of many of these issues from outside Wikidata) the evolving community norms and best practices for modelling and entry of all sorts of political data. At the moment the data is of wildly variable quality and consistency, making it essentially impossible to query it in a uniform way. Much of the P39 position information entered on Cabinet Members worldwide, for example, was added by a bot running against English Wikipedia, where often an infobox will say that someone is, say, Minister of Education, but link to the Ministry of Education (as there is no specific page for the position).
Many of these sorts of errors currently show up on the daily Database reports as Constraint Violations, but these are largely walls of impenetrable codes that require extreme levels of dedication to work through. Producing per-country reports that list all the known cabinet positions and holders within each government period, for example, makes it much more obvious where things are missing, overlap oddly, are assigned to the wrong concepts, etc., in a way that makes it much more likely that people with local knowledge will fix the problems.
Similarly, data on national elections is very confusing in many countries, as the Wikipedia pages in different languages are often about subtly different concepts: e.g. Spanish Wikipedia might have separate pages for the Presidential Election and the Legislative Election that happened simultaneously, whereas the English Wikipedia might only have a single article about the combined General Election. Often these all get merged into a single Wikidata item that not only has a weird mismatch of concepts when you're viewing or using it directly, but is also very difficult to query in the aggregate. Likewise, there are at least 4 or 5 different ways of expressing which political parties and factions politicians belong to, which makes many types of queries against legislative information needlessly difficult.
The nature of Wikidata means that most of these things will always be issues, and it's usually better for someone with knowledge to simply enter the data they have even if it's not perfectly modelled etc. But a lot of this could be improved greatly through a mix of
  1. producing or extending the documentation of how to enter certain types of information (primarily in a use-case driven manner, such as how to model the Parliamentary Committees and their Memberships in your country, rather than a Property-driven guide to what the constraints on P39 or P463 etc are).
  2. providing many ways of viewing data in aggregate form, rather than on an item by item basis, so that problems stand out more clearly. (For fairly simplistic examples, see https://www.wikidata.org/wiki/Wikidata:EveryPolitician/Report:P1313 or https://www.wikidata.org/wiki/Wikidata:EveryPolitician/Finland/Report:P768 — most reports would need to be more complex and robust than these, but even these show how easy it is for data to be mis-entered or omitted)
  3. helping migrate existing data and add new data in more consistent ways, so that people who enter data by mainly copying how it's entered on another item have better examples to work from.
Having spent the last two years doing exactly this sort of thing on EveryPolitician, we think we can add lots of value by doing lots of similar work within Wikidata directly. Wikidata has now reached the point where the amount of political data being entered and maintained (by humans, rather than simply by mass-copying from Wikipedia infoboxes) is sufficient (but currently happening in a very ad-hoc and inconsistent manner) for us to believe that the timing is right to help drive this forward significantly through a burst of concentrated activity over a subset of countries, in the hope that those achieve enough critical mass to become self-sustaining, whilst also acting as an exemplar for other countries to start to copy.
--Oravrattas (talk) 11:24, 10 April 2017 (UTC)
Can you also clearly specify duration of the project? 4 or 6 months? Ruslik (talk) 11:10, 12 April 2017 (UTC)
@Ruslik0: sorry for the confusion there. The Grant would be for a 4 month period, but we would also be doing additional self-funded work around that. I've simplified the proposal by making it four months throughout. --Oravrattas (talk) 10:11, 13 April 2017 (UTC)

Comments of Glrx

Last night I poked around and was surprised that freshman Senator Kamala Harris (Q10853588) did not have her new position. I added it, which bumped the number of senators from 1963 to 1964. Many senators do not have qualifiers such as replaces, replaced by, electoral district, start time, and stop time. Only 33 senators have their electoral district specified.

Here's a SPARQL query that does some double counting and miscounting (I'm doing the ?stmt backwards) but shows absence of dates and electoral district.

SELECT ?item ?itemLabel ?sexLabel ?partyLabel ?districtLabel ?startdate ?enddate ?successorLabel WHERE {
  ?item wdt:P39 wd:Q13217683 . # position held US Senator (1964)
  
  # get statement to look at qualifiers (don't worry about double counts right now)
  ?item p:P39 ?stmt . #get the (a) statement
  ?stmt ps:P39 wd:Q13217683 . # position held senator - if senator twice, double counts
  
  # look at different qualifiers for coverage
  # FILTER NOT EXISTS {?stmt pq:P582 ?enddate } . # no end date -> 1647
  # FILTER NOT EXISTS {?stmt pq:P1366 ?successor } . # not yet replaced by -> 1702
  # FILTER EXISTS {?item wdt:P106 wd:Q82955} . # occupation politician
  
  # check for fields
  OPTIONAL { ?item wdt:P21 ?sex } . # total? coverage
  OPTIONAL { ?item wdt:P102 ?party } . # good coverage / picks up multiple parties (eg Aaron Burr) (2182)
  OPTIONAL { ?stmt pq:P768 ?district } . # only 33 electoral districts specified
  OPTIONAL { ?stmt pq:P580  ?startdate } .
  OPTIONAL { ?stmt pq:P582 ?enddate } .
  OPTIONAL { ?stmt pq:P1366 ?successor } . # only 264 successors
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}

It was also surprising that en:List of governors of Alabama (a list that has other inconsistent data) had been updated due to recent resignation, but Robert J. Bentley (Q552353) and Kay Ivey (Q6380211) had not.

Want some nice trash? Check out the fictional human Arnold Vinick (Q4795290) who looks like a real member of the Senate. Glrx (talk) 20:13, 11 April 2017 (UTC)

Glrx (talk) 16:42, 11 April 2017 (UTC)

@Glrx: — yes, and this is for the US, where we would have expected the quality of data to be amongst the highest. In all but a handful of places the situation is currently a lot worse than this — not even just in terms of missing qualifiers and fields, but in terms of entire concepts simply being missing. I estimate that probably only 50% of current state- or national-level politicians worldwide even have Wikidata items at all (I've been surprised at the number of even cabinet-level officials who don't in many countries), and significant numbers of those don't have any P39 information tying them to their position, to even be able to reach the level of the US Senate data here.
Beyond that it tails off even faster — it's not just that the electoral district isn't set on the membership, but there isn't even an item for it yet, etc. And unfortunately that sort of thing is a significant barrier to entry to many Wikidata users: creating a new item for a district and making sure you set all the correct fields on that seems like a much harder task than simply filling in one that already exists on the person's page. In many cases editors seem much more likely to pick something that seems sufficiently similar (which can look right on the person page, but mess up queries as it's not the right type of thing), or else give up altogether. So there's a lot of basic work that needs to be put in place first in most countries to simply ensure that all the electoral districts, parties / groups / factions, elections, legislative terms, etc, even exist, and are all consistently enough modelled so they can be queried.
Our hope is that a lot of that can be helped by having lots of queries and reports, just like this one, that expose obvious gaps for people to fill in (and good documentation of how best to do that), as the problems usually aren't immediately apparent until you try to write these sorts of queries, or do something with the data in aggregate. Where possible we can also look at finding ways of filling in lots of this information through bots, but my suspicion is that that would be less valuable. Do you have other suggestions for how to improve this situation? There is a huge need for this sort of information to exist in a well-structured way, so that people can build tools for holding politicians accountable worldwide. We've spent over a decade working in this field, and time and time again we've seen groups start out to build useful tools, but spend so much of their time and energy simply gathering the data they need, that they either give up or end up creating something much much less than they originally wanted. We've spent a couple of years starting to close that gap, but there's still a long way to go.
(NB: I also learned very quickly the importance of always having an 'instance of: human' on all my political queries!— e.g. there are almost as many fictional Presidents of the United States in Wikidata as real ones: query.wikidata.org)
--Oravrattas (talk) 09:46, 12 April 2017 (UTC)
These are just notes for me.
The query above is a bit daunting for any editor. Who wants to fill in a huge empty table? Working on a few empty holes is one thing, but the whole table is another.
I'm also curious about what sort of queries are anticipated or desired. Yes, many unexpected ones will arise, but I like efforts with specific goals.
I'm a bit lost about what's going on and what can be done. The proposal emphasizes structure rather than data. I've read it several times but don't have a good handle on it.
The initial goal of my query was a list of current senators. It failed miserably with thousands of "current" (i.e., not yet replaced) officeholders. The Senate only has 100 members.
Looking at the officeholder wikidata showed a query that could look at P102 party ratios over time and get confused. Hillary Clinton was a Republican before 1968; she's now a Democrat; Hillary Clinton (Q6294) shows start and end times on those affiliations, but many other Qitems such as John Conness (Q1351096) don't have start and end dates for party affiliation.
I followed some Senator successions and filled them out, but there were many individuals who held the office in disjoint times. Inserting that data is not a trivial task; I'm not sure the average editor would figure it out. I created two position held entries for a few such disjoint holders. Willie Person Mangum (Q1720552) is an example with two positions held. David Stone (Q889824) is an example I haven't fixed yet. It is also odd because the infobox data at en:David Stone (politician) calls out the two intervals differently from other infoboxes that I encountered. Dual entries were used (by some other editor) for Jerry Brown (Q152451) disjoint gubernatorial terms.
The Senator data is buried in text at en.WP but mostly missing in WD. Sadly, the information does not appear that easy to extract (scraper is an apt description). The term information is buried in the {{Infobox Senator}}; there's also succession. The template {{U.S. Senator box}} also has succession information. A bot could easily handle the simple cases, but the variations make it messy. en:Robert F. Kennedy doesn't follow the norm.
The data should not be duplicated at en.WP and WD. The infoboxes should pull the information from WD, but the information isn't there yet.
Use API to get template transclusions of data-laden infobox.
For each WP article
Extract term or successor information such as Smith preceded Jones
An en.WL can be converted to a Q-item with wbgetentities; that gives both the Q-item and its statements.
Smith = Q101, Jones = Q102
Then verify/augment that relationship in wikibase.
The ugliness is that MediaWiki markup isn't structured.
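The steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the infobox field names `predecessor` and `successor`, the regular expression, and all function names are hypothetical choices for the example, and real infoboxes vary enough that a regex like this would only catch the simple cases. The two helpers only build request parameters for the MediaWiki Action API (`list=embeddedin` on Wikipedia, `wbgetentities` on Wikidata); no network call is made here.

```python
import re

# Assumed parameter names; real infoboxes use many variants and aliases.
_FIELD = re.compile(
    r"\|\s*(predecessor|successor)\d*\s*=\s*\[\[([^\]|#]+)",
    re.IGNORECASE,
)


def extract_succession(wikitext: str) -> list[tuple[str, str]]:
    """Return (field, linked article title) pairs found in infobox wikitext."""
    return [
        (match.group(1).lower(), match.group(2).strip())
        for match in _FIELD.finditer(wikitext)
    ]


def embeddedin_params(template: str) -> dict[str, str]:
    """Parameters listing the pages that transclude a template."""
    return {
        "action": "query",
        "list": "embeddedin",
        "eititle": f"Template:{template}",
        "eilimit": "max",
        "format": "json",
    }


def wbgetentities_params(titles: list[str]) -> dict[str, str]:
    """Parameters resolving English Wikipedia titles to items and claims."""
    return {
        "action": "wbgetentities",
        "sites": "enwiki",
        "titles": "|".join(titles),
        "props": "claims",
        "format": "json",
    }
```

The extracted titles would then be resolved to Q-items and compared against the existing statements before anything is added, which is the verify/augment step in the notes.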
Perhaps an enddate could be set for US offices. For example, Trump's current term will expire 20 January 2021. That would allow "empty" offices to be found automatically. It does not work in non-fixed term posts (such as Supreme Court Justices).
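A minimal Python sketch of that idea, assuming each "position held" statement has already been reduced to a holder and an optional end date (the `Term` type and the function name are hypothetical). With scheduled end dates recorded for fixed-term offices, a missing end date can be reported as a data gap instead of being counted as "current".

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional


class Term(NamedTuple):
    """One 'position held' statement, reduced to the fields used here."""
    holder: str
    end: Optional[date]  # recorded or scheduled end of term; None if absent


def classify_terms(terms: Iterable[Term], today: date) -> dict[str, list[str]]:
    """Sort holders into current, ended, and unknown (no end date recorded)."""
    result: dict[str, list[str]] = {"current": [], "ended": [], "unknown": []}
    for term in terms:
        if term.end is None:
            # Without an end date we cannot tell a sitting holder from a
            # long-departed one, so this is reported as a data gap.
            result["unknown"].append(term.holder)
        elif term.end <= today:
            result["ended"].append(term.holder)
        else:
            result["current"].append(term.holder)
    return result
```

As noted, this only helps for fixed-term posts; open-ended offices would still need an explicit end date when the holder leaves.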
Fast list material: {{Governors of North Carolina}}.
Glrx (talk) 01:13, 13 April 2017 (UTC)
@Glrx: thanks for the thoughts. There are a lot of things to consider there; apologies if I've missed anything:
  • I agree that a huge empty table can be off-putting — that's one of the reasons why we want to start with countries that already have a lot of data, rather than the ones where there is very little information. And, even within those, we would focus first on the most achievable tasks. That said, it's not always necessary for a single editor to take on such daunting tasks alone: we can encourage people to fill in the data relating to their own state's senators, for example. And one of our plans is to run edit-a-thons specifically around tasks like this. (Though the US is probably a bit of a red-herring as most of the data is explicitly in the public domain and could be added by a bot.)
  • In terms of what sorts of queries/reports should exist, I'm initially imagining these to fall into a few types.
  1. The first would be largely about structure and modelling, rather than specifically about people: for example https://www.wikidata.org/wiki/Wikidata:EveryPolitician/Report:P194 or https://www.wikidata.org/wiki/Wikidata:EveryPolitician/Report:P1313 where you can see at a glance whether these fields are filled in (and correct) for each country. (These are very early versions of these — a fuller version would also query that sort of information in reverse, so to speak, finding items with the correct 'instance of' and 'jurisdiction' set, and showing those side-by-side, to make sure that the data is accessible in either way, and is the same. However, even this basic version is very useful for spotting gaps and errors.) Reports like this would also help unravel the issue I raised a couple of months ago about the distinction between the government and the cabinet in a country. These would then dig further down into some of these concepts, often on a per-country basis. What posts are part of the Cabinet, in each country, and are the relationships between the Minister and the Ministry, cross-mapped? Are all the items set as 'electoral district' on positions of a consistent type (e.g. https://www.wikidata.org/wiki/Wikidata:EveryPolitician/Finland/Report:P768)? Are there unexpected overlaps or gaps in the list of legislative terms? Do items for elections have surprising characteristics? (it's very common for a single Item to represent both a parliamentary and presidential election at the moment — a query for "elections where the president is set as the winner of an election that is not a presidential election", for example, could help uncover items that should be split up.)
  2. Then there are all the reports that are more akin to the things you were exploring in your query above: are there "position held" statements that are missing or erroneous? (e.g. more than N members simultaneously for a given district / position). Which ones don't yet have certain fields filled in (dates and/or terms, electoral districts, successor/predecessor, etc)? Which fields do we expect to be filled in for the people themselves (gender, occupation, identifiers from the relevant legislative site, etc? https://www.wikidata.org/wiki/Wikidata:EveryPolitician#Outline_properties is still quite basic, and doesn't yet include anything to do with Cabinet positions, Elections, Committee Memberships, or lots of other things. But essentially, for each 'rule' expressed there, there could (and eventually should) be at least one query/report per legislature/country/body. Which of those are actually most useful, especially initially, remains to be seen, and how these are actually presented to avoid being completely daunting in countries or areas where there are huge gaps, but where issues can be seen at a glance in places where the data is believed to be almost complete, without clicking through to 20 separate reports, is still an open question (though I have some ideas about that). But even having these at all should be hugely beneficial in many places.
  3. The third type of report will be those created from information outside Wikidata, to draw attention to things that may have been overlooked within it. Our project already scrapes almost every national parliament in the world every day, and we have cross-linked many of the members of those legislatures to Wikidata (and will continue to do so). This allows us to produce reports that highlight, for example, that three new members started yesterday who do not yet appear to have been created in Wikidata.
  • I'm hoping that these examples might also help explain a little more the distinction we're making between structure and data. Obviously in Wikidata, almost everything is data, so it's a slightly artificial distinction, but I think there's a difference between being able to say that Sarah Olney, as a Liberal Democrat, became the new Member of Parliament for Richmond Park during the 56th United Kingdom Parliament, on 1 December 2016, replacing Zac Goldsmith, after the Richmond Park by-election, 2016 (data), vs, say, making sure that "Richmond Park" is correctly modelled with the correct type and jurisdiction, that the parliamentary term exists and is connected properly to the legislative body and links to the various Category or "List of Members…" pages, etc — i.e. all the things that need to be in place to make it easy to add the data in the first place (without having to keep breaking to create pages for the individual concepts first — a task that is very off-putting for many people, particularly new editors), and also to be able to write useful queries against the data. What we're emphasising here is largely just that we're not going to be spending all our time manually entering data ourselves — our work will be focussed on improving the ability of the Wikidata community to keep all this information up-to-date.
  • In general I'm a little wary of importing much of this data from Wikipedia via bots. Much of the low-hanging fruit there has already been plucked, and even that has caused a lot of problems: e.g. adding cabinet positions from infoboxes has led to lots of positions being set such that someone was the "Ministry of Defence" or "List of Ministers of Defence" rather than the "Minister of Defence", due to the Wikipedia being scraped (usually English) not having specific pages for those posts. Where this information can be added by bots, I think it would usually be better to use primary sources instead (and thus be able to also include references). I do agree, though, that Wikipedias are unlikely to start pulling this sort of data in from Wikidata (and then in turn helping maintain it) until it reaches a critical threshold that, unfortunately, we're still very far short of in most countries. Our proposal is all about helping close that gap.
--Oravrattas (talk) 11:43, 13 April 2017 (UTC)
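
To make the three report types above a little more concrete, here is a minimal sketch in Python of the kind of check each one implies: gaps or overlaps in a list of legislative terms, more than N simultaneous holders of a seat, and scraped members with no matching Wikidata item. This is not EveryPolitician's actual code. The function names, the tuple layouts and the sample data are all hypothetical, and it assumes every record already has both a start and an end date.

```python
from collections import defaultdict
from datetime import date


def term_gaps_and_overlaps(terms):
    """terms: list of (label, start, end) tuples with date values.

    Compares each term with the next one by start date and returns
    (gaps, overlaps) as lists of (earlier_label, later_label) pairs.
    """
    ordered = sorted(terms, key=lambda t: t[1])
    gaps, overlaps = [], []
    for (a, _, a_end), (b, b_start, _) in zip(ordered, ordered[1:]):
        if b_start > a_end:
            gaps.append((a, b))
        elif b_start < a_end:
            overlaps.append((a, b))
    return gaps, overlaps


def over_capacity(memberships, max_seats=1):
    """memberships: list of (person, district, start, end) tuples.

    Returns {district: [people]} for districts where more than
    max_seats memberships are active at once. Spans are treated as
    half-open: active from start up to, but not including, end.
    """
    by_district = defaultdict(list)
    for person, district, start, end in memberships:
        by_district[district].append((start, end, person))
    flagged = {}
    for district, spans in by_district.items():
        # The number of active spans only rises at a start date,
        # so checking each start date is enough to find the peak.
        for start, _, _ in spans:
            active = sorted(p for s, e, p in spans if s <= start < e)
            if len(active) > max_seats:
                flagged[district] = active
                break
    return flagged


def missing_from_wikidata(scraped_members, wikidata_ids):
    """scraped_members: dict of name -> linked item id, or None.

    Returns the names that are unlinked, or linked to an id that is
    not in wikidata_ids (hypothetical set of known item ids).
    """
    return sorted(
        name
        for name, item_id in scraped_members.items()
        if item_id is None or item_id not in wikidata_ids
    )


# Hypothetical sample data, for illustration only.
terms = [
    ("Term 1", date(2000, 1, 1), date(2004, 1, 1)),
    ("Term 2", date(2004, 6, 1), date(2008, 1, 1)),
    ("Term 3", date(2007, 6, 1), date(2011, 1, 1)),
]
memberships = [
    ("Person A", "District X", date(2010, 1, 1), date(2015, 1, 1)),
    ("Person B", "District X", date(2014, 1, 1), date(2018, 1, 1)),
    ("Person C", "District Y", date(2010, 1, 1), date(2015, 1, 1)),
]
scraped = {
    "Person A": "ITEM_1",
    "Person B": None,
    "Person C": "ITEM_3",
}

print(term_gaps_and_overlaps(terms))
print(over_capacity(memberships, max_seats=1))
print(missing_from_wikidata(scraped, {"ITEM_1"}))
```

In practice the inputs would come from SPARQL queries against Wikidata and from the scraper output rather than from hand-written lists, and open-ended (current) terms and memberships would need separate handling.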

Comments from Lucyfediachambers[edit]

Hi All,

Thanks for the great discussion on this page. I have today updated the sections for the problem statement and solution. This follows the questions raised on this page and a conversation with @Mjohnson (WMF): on areas of the proposal which still required further clarification.

Please note that while the changes may appear extensive, they do not represent any change in plan in our minds and are intended purely as clarifications. Happy to answer any further questions which may arise from any changes made! --Lucyfediachambers (talk) 16:48, 26 May 2017 (UTC)

Round 1 2017 decision[edit]


Congratulations! Your proposal has been selected for a Project Grant.

The committee has recommended this proposal and WMF has approved funding for the full amount of your request, £40,000 GBP.

Comments regarding this decision:
Consultations on this proposal provided glowing feedback about the applicant team and the project. We were impressed during the interview by your understanding of working with volunteer communities and your thoughtful plans to design workflows that are responsive to our communities' existing needs and procedures. We appreciate that you are establishing infrastructure to sustain impact over time and we are glad to support your efforts in doing so.


Next steps:

  1. You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
  2. Review the information for grantees.
  3. Use the new buttons on your original proposal to create your project pages.
  4. Start work on your project!

Questions? Contact us.


Aggregated feedback from the committee for EveryPolitician[edit]

Scoring rubric (the committee's score follows each criterion):
(A) Impact potential: 5.7
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement: 6.4
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute: 6.6
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success: 5.4
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • It's unclear to me if this project will also import data or if the aim is only the development of a connector?
  • The potential impact is large but I have concerns about its sustainability. Of course, MySociety will continue to use the Wikidata database, which will make it more likely to be sustainable, but doubts still remain.
  • It has some potential for making a big impact, nonetheless I have reservations about the sustainability and scalability of this project.
  • Pretty good impact potential, some concerns about sustainability.
  • This proposal is both iterative and innovative. The success can be easily measured but I think there are not-insignificant risks that it will not pan out as expected.
  • This project has realistic goals and clear objectives as well as a good way to measure the success.
  • Pretty good approach, rather low risk.
  • There are no major problems here but they probably need to clarify which community events they will hold.
  • A large team capable of doing the job, backed by external partners.
  • Very likely to succeed, experienced team will be definitely able to do it well.
  • It has a specific community - Wikidata and its support. Yes it definitely supports diversity by providing data about many countries.
  • It is not clear how they will execute the events and activities, especially those outside the UK.
  • Good engagement with the UK and Wikidata communities, would like to see more engagement outside the UK, however.
  • Please define clearly the impact and the deliveries. In my opinion the cost is too high to have only an automated import in Wikidata.
  • Glad to see they have released the source for their scrapers (here) and explain how to write one and integrate it to their system (here), but disappointed that they didn't link to this information in their application (unless I missed it).
  • Interaction on their discussion page is helpful.
  • Good to see that they aren't just looking for funding for their org, but are focusing on how they can use what they know to help Wikidata.
  • Glad to see them doing a significant amount of self-funding.
  • I want to recommend it cautiously for funding but the progress should be closely monitored and the funding continued only if there is a reasonable prospect that they will achieve their goals.
  • Not my highest priority but I would go for it.

Questions about WMF funds/grant use[edit]

EveryPolitician staff have been complaining about the usability of some of the toolserver tools for the data structure we are currently using, which is apparently not suitable for EveryPolitician. This raises the question of what WMF funds are being granted and used for in relation to the project, and who is actually being paid with them: d:Wikidata_talk:WikiProject_Parliaments#What will EveryPolitician provide us with?. --Jura1 (talk) 16:03, 5 September 2017 (UTC)