Research talk:Expanding Wikipedia articles across languages/Data

From Meta, a Wikimedia project coordination wiki

@Tizianopiccardi: I copied your comment from the email below and left a response for it.

" I discussed with Baha the high-level overview of the current implementation, and I'll send him the dataset for English version by tomorrow.

For the English dataset, should I send him a version generated from the complete dataset? Or do we need, for some reason, to hold out a portion to use as a test set? "

@Tizianopiccardi: @Bmansurov (WMF): can you expand on what the items in the dataset will be? That way we can figure out whether some data needs to be set aside. --LZia (WMF) (talk) 21:45, 8 January 2018 (UTC)

The data will be similar to the French data downloadable here, something like this:
{"category":"Catégorie:Ville_de_Souss-Massa-Drâa","recs":[{"relevance":0.3333333333333333,"title":"Notes et références"},{"relevance":0.3333333333333333,"title":"Voir aussi"},{"relevance":0.2222222222222222,"title":"Démographie"},{"relevance":0.2222222222222222,"title":"Économie"},{"relevance":0.1111111111111111,"title":"Infrastructures"},{"relevance":0.1111111111111111,"title":"Culture"},{"relevance":0.1111111111111111,"title":"Population"},{"relevance":0.1111111111111111,"title":"Manifestations"},{"relevance":0.1111111111111111,"title":"Vue d'ensemble"},{"relevance":0.1111111111111111,"title":"Climat"}]} Bmansurov (WMF) (talk) 21:59, 8 January 2018 (UTC)
@Bmansurov (WMF): I updated the /Data page with the English version. Tizianopiccardi (talk) 19:37, 9 January 2018 (UTC)
@Tizianopiccardi: Thanks! Does the data contain everything, or did you keep some of it for testing? Bmansurov (WMF) (talk) 19:47, 9 January 2018 (UTC)
@Bmansurov (WMF): Everything in both cases (FR | EN). I assume that if we need to run an evaluation, it's better to run an experiment with humans. Automatic evaluation is not very meaningful in this case. Tizianopiccardi (talk) 14:16, 10 January 2018 (UTC)
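
For anyone reading the records quoted above, here is a minimal sketch of how one such record could be parsed in Python. It assumes each record is a single JSON object with a `category` string and a `recs` list of `relevance`/`title` pairs, as in the French sample; the shortened sample line, the helper name `top_sections`, and the `min_relevance` threshold are hypothetical and only for illustration, not part of the dataset or its tooling.

```python
import json

# Shortened copy of the French sample record quoted in this thread
# (only three of the ten recommendations are kept, for brevity).
SAMPLE_LINE = (
    '{"category":"Catégorie:Ville_de_Souss-Massa-Drâa","recs":['
    '{"relevance":0.3333333333333333,"title":"Notes et références"},'
    '{"relevance":0.2222222222222222,"title":"Démographie"},'
    '{"relevance":0.1111111111111111,"title":"Climat"}]}'
)


def top_sections(line, min_relevance=0.2):
    """Return (category, titles) keeping recs at or above min_relevance.

    The threshold is an illustrative assumption, not a value from the dataset.
    """
    record = json.loads(line)
    titles = [
        rec["title"]
        for rec in record["recs"]
        if rec["relevance"] >= min_relevance
    ]
    return record["category"], titles


category, titles = top_sections(SAMPLE_LINE)
print(category, titles)
```

If the downloadable file stores one record per line (an assumption; the thread does not say), the same helper could be applied to each line of the file in turn.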