Grants talk:IdeaLab/Tools for using wikidata items as citations

From Meta, a Wikimedia project coordination wiki

Can this be done? Is it time?

This proposal as stated is something which has to be done eventually.

It has been a perennial proposal for about 10 years; what has changed is that Wikidata now seems likely to be the eventual way to enact it.

It would be super-powerful and super-useful if somehow, citations in Wikimedia projects could be connected to a central database, probably in Wikidata, and then the citations could be managed there.

I have a list of past related proposals at Grants:IdeaLab/Reform of citation structure for all Wikimedia projects. There are hundreds of pages of commentary on this elsewhere, but we still do not have any development team ready to push this on Wikidata.

Can you do this? Blue Rasberry (talk) 19:28, 26 September 2014 (UTC)

Feedback

This is a really exciting and useful idea. I can easily envision Wikidata as a repository for structured citations and imagine how Wikidata in this role could be used not just to extend academic scholarship practices but to also contribute to the field of alt-metrics.

As User:Bluerasberry points out, there are quite a few proposals that are similar and related to this one. This is not a bad thing! To me, it means more potential partners (for example, the VisualEditor team, the Wikipedia Library project, etc.) that may be interested in advising on the project or contributing to it in other ways. I too would like to know more about the feasibility of the project if, for example, the Wikidata development team does not have capacity to support it, and would also like to see some discussion of use cases outside of just the Wikipedias.

While not quite the same thing as planning and coordinating Wikidata deployment on sister projects, I think the various proposals linked from d:Wikidata:Sister_projects would be worth looking at to see how previous efforts and discussion about different content types (such as Commons' media files and Wikiquote's quotations) could be adapted for the purpose of modelling citations on Wikidata. Perhaps a good aim of this proposed grant project would be primarily research and coordination with various stakeholders, with the eventual aim of renewed funding for deployment and development as a next phase?

-Thepwnco (talk) 22:49, 29 September 2014 (UTC)

Hi @Thepwnco:! I'd absolutely love to get more input from Wikidata people. I've posted this on Help:Wikidata and WikiProject_Source_MetaData and got a few responses there, but not as many as I'd have liked. I'm fairly familiar with the VisualEditor team since I did OPW this summer, working on this citation-related project with them. [1]
Thanks for linking to d:Wikidata:Sister_projects. I found one mention there [2] that linked to a recent discussion proposing the same thing.[3] It seems like this idea has been floating around for a while! It seems to me from reading the docs on Wikidata sister projects there and the linked discussion that the idea has been so far, "wikidata is on en wiki, now it's up to the community to make the tools to use it".

Mvolz (talk) 08:57, 10 October 2014 (UTC)

Citation source

The idea is very good, but be careful about the current structure of a reference in Wikidata: a reference can be described by several items. We have to be clear about this description in order to avoid any misunderstanding. I think we should refer to the page describing how to store citation data in WD (d:Help:Sources) and, if needed, discuss modifying something there. -- The preceding unsigned comment was added by Snipre (talk) 07:47, 30 September 2014 (UTC)

@Snipre: I don't quite understand this comment; could you clarify your concerns? On Wikidata, there is already a way to add a reference for a particular claim, by using another Wikidata item as the source. I agree that d:Help:Sources has well-defined guidelines for adding Wikidata items as sources, so thank you for mentioning it! The goal of this proposal is basically to do the same thing that Wikidata already does for adding references to claims made on Wikidata, but do it on en wiki for statements made on en wiki. On en wiki there is currently no way to add Wikidata sources to a citation aside from manually linking to the item from a citation (which I have never seen done). On Wikidata, any given claim could have multiple Wikidata items as references, just as any given claim on en wiki could have multiple citations, e.g. [1][2]
  1. Source 1
  2. Source 2
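
The parallel drawn above (one statement, several Wikidata items as its sources) can be sketched as a small data model. This is only an illustration, not how Wikidata or any existing template stores references: the class name, the function names, and the placeholder item IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """A statement in an article plus the Wikidata items cited for it.

    Hypothetical model for illustration only.
    """
    text: str
    source_items: List[str] = field(default_factory=list)  # Wikidata item IDs


def render_markers(statement: Statement, first_number: int = 1) -> str:
    """Append one numbered marker per source item, e.g. 'A claim.[1][2]'."""
    markers = "".join(
        f"[{first_number + i}]" for i in range(len(statement.source_items))
    )
    return statement.text + markers


def render_reference_list(statement: Statement, first_number: int = 1) -> List[str]:
    """Return the numbered reference list matching the inline markers."""
    return [
        f"{first_number + i}. {item_id}"
        for i, item_id in enumerate(statement.source_items)
    ]


# Placeholder IDs, not real Wikidata items.
claim = Statement("A claim.", ["Q-source-1", "Q-source-2"])
print(render_markers(claim))
for line in render_reference_list(claim):
    print(line)
```

In a real tool each ID would still have to be resolved to displayable citation data, which is where the loading cost discussed in the next section comes in.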

Efficiency issues

While I really like the idea, it may be necessary to point out right away that using Wikidata directly poses efficiency challenges. It requires loading at least one item per source cited, and from the few rough tests that I have done, loading 200 items takes about 10 seconds, regardless of what we do with them. --Zolo (talk) 18:47, 30 September 2014 (UTC)

Zolo I am ignorant of whatever you are describing. How is this a problem? Blue Rasberry (talk) 19:29, 30 September 2014 (UTC)
@Bluerasberry: if you want to use Wikidata for citations directly, like in d:Template:Cite item, you should normally load the item in Lua (using mw.wikibase.getEntityObject) in order to get the data. There are two issues with that:
  • It is not yet possible in Wikipedia (precisely because of performance concerns). This will certainly happen (see Bugzilla:47930) but I am not sure anyone knows when.
@Zolo: I believe that it is in fact possible to do this in a Lua template: see Bugzilla:67538, which was fixed in July 2014, and the associated doc page for the Wikidata Lua extension, which indicates this is possible. There is also a module on ru wiki that uses some of these functions to cite things in infoboxes, w:ru:Module:Sources. As I understand it, it is not yet possible to do this within the templating language (i.e. not Lua), which is why Bugzilla:47930 is still not resolved. But maybe @Lydia Pintscher (WMDE): or @Tobias Gritschacher (WMDE): could give us a more definitive answer? Mvolz (talk) 10:33, 10 October 2014 (UTC)
@Mvolz: No, Bugzilla:67538 only works on Wikidata. That means it is possible to access arbitrary items on wikidata.org, but trying to do that in Wikipedia returns an error ("Access to arbitrary items has been disabled"). --Zolo (talk) 13:11, 24 October 2014 (UTC)
Ah, thanks for clearing that up. Mvolz (talk) 15:27, 24 October 2014 (UTC)
  • Loading an item has some cost. If there are 100 sources to cite in an article, that would be 100 items, and something like 5 seconds of additional loading time for the (non-cached) page. For even longer pages, that may result in timeout errors. This may be acceptable, or there may be easy ways to improve the performance; I just want to be sure we don't come up with something that works but turns out to be too heavy to deploy. --Zolo (talk) 07:33, 1 October 2014 (UTC)
Performance is definitely going to be an issue. Keep in mind though that we have to make the tools first before we fix performance! If this is a project that gets a lot of community traction, then that would be an impetus for WD to improve the performance of item retrieval. At first this would be a few new citations here and there, not every citation on every page from the outset, so it wouldn't impose significant cost. But it is definitely something to keep an eye on. Mvolz (talk) 10:33, 10 October 2014 (UTC)
Zolo What is your best guess for how many citations a page could present using this system without increasing load time by 1 second? Supposing that this project went forward as a trial, and it applied this citation structure to a maximum of 10 citations on some pages and more commonly an average of 3 on others. Would planning a trial on that scale be likely to reduce harm? Like Mvolz, I wish that if we could confirm that this is viable at a small scale then someone could authoritatively describe what it would mean to do this at a larger scale. Blue Rasberry (talk) 13:20, 10 October 2014 (UTC)
The VisualEditor team uses w:Barack Obama as a metric to test load times. It has 370 citations. I think that's a good worst-case scenario that we could benchmark with as well :) Mvolz (talk) 16:14, 10 October 2014 (UTC)
@Bluerasberry: Loading 100 items takes about 5 seconds. So, assuming that you load just 1 item per template, that the time needed to process the item once loaded is negligible, and that everything else takes the same amount of time as without Wikidata, 20 Wikidata-based citations would add about 1 second.
But actually, I have just realized that I may have been way overoptimistic in assuming that we only need to load 1 item per template. If you only load one item, what you get is author = d:Q937, title = title, not author = Albert Einstein, title = title. I thought it was very fast to transform "Q937" into "Albert Einstein", but at the moment it is not. It appears to take as long as loading the whole item. So you should factor in the time for loading data about the authors, publishers, etc. So maybe just 5 or 6 citation templates would add one second of loading time. That would change if the mechanism to retrieve items' labels is improved. I think user:Lydia Pintscher (WMDE) has plans for that. --Zolo (talk) 13:11, 24 October 2014 (UTC)
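
The back-of-the-envelope estimates in this thread can be reproduced with a few lines of arithmetic. This sketch takes the rough figure quoted above (about 5 seconds per 100 items) at face value and assumes, as the thread does, that load time scales linearly with the number of items. The function names are made up for illustration, and the choice of 3 or 4 items per citation (the source item plus linked items such as author and publisher) is an assumption, not a measured value.

```python
# Rough figure from the thread: about 5 seconds to load 100 items.
SECONDS_PER_ITEM = 5 / 100


def added_load_time(citations: int, items_per_citation: int = 1) -> float:
    """Estimated extra seconds of (non-cached) page load, assuming linear cost."""
    return citations * items_per_citation * SECONDS_PER_ITEM


def citations_per_second(items_per_citation: int = 1) -> float:
    """How many citations fit into one extra second of load time."""
    return 1 / (items_per_citation * SECONDS_PER_ITEM)


# One item per citation: about 20 citations per extra second.
print(round(citations_per_second(1), 1))

# Assumed 3 or 4 items per citation once author and publisher items
# must also be loaded to get their labels: roughly 5 to 7 citations.
print(round(citations_per_second(4), 1))
print(round(citations_per_second(3), 1))

# The 370-citation benchmark article mentioned above, one item per citation.
print(round(added_load_time(370), 1))
```

Under these assumptions the 370-citation benchmark page would add roughly 18.5 seconds even with a single item per citation, and several times that if linked items must be loaded too, which is consistent with the concern about timeouts on long pages.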