Grants talk:Project/Rapid/Hjfocs/soweego 1.1


Usage statistics

@Hjfocs: Do you have any statistics on how much https://soweego.readthedocs.io/en/latest/ is used? Things like the number of different users using it, the number of edits done, etc.? Please do a bit of showing off of all the cool results. Multichill (talk) 09:34, 21 July 2019 (UTC)

Hi @Multichill: thanks for your feedback, much appreciated.
  1. d:User:Soweego bot is uploading confident results to Wikidata right now. You can keep an eye on its edits here;[1]
  2. Medium-confident results are getting into Mix'n'match, and some curation by users has already started;[2][3]
  3. With respect to the software users and contributors, you can have a look here.[4] Note that version 1 is not yet released (will be done by the end of this week), so the current contributors are the core developers.
I also suggest watching the original project page:[5] the final report will be published very soon.
Hope this is useful. Cheers, --Hjfocs (talk) 13:26, 22 July 2019 (UTC)

Reply to Meisam's endorsement

Dear Meisam,

First, thanks for your feedback on the behavior of d:User:Soweego bot so far; it really helps improve it. I'd like to clarify the points you raised:[6]

  1. music is indeed a challenging field due to the relative scarcity of data/statements in both Wikidata and the target catalogs themselves. When an item lacks useful data, linking is tricky for both humans and machines;
  2. this RG proposal aims at improving the quality of links, so it will most probably impact the most difficult ones;
  3. the past issues you mentioned were caused by experimental uploads of the baseline system (rule-based), and happened while the project was still under development. soweego has evolved since then: it now uses supervised machine learning and has expanded the set of features that are taken into account when linking;
  4. apologies for not reverting the edits you mentioned. I personally tried my best to fix the bot mistakes, see my contributions[7] back then;
  5. preventing the addition of existing identifiers is a very nice point, and we should probably raise the priority of a known issue[8] that requires quite a lot of extra work.

Hope this helps shed light. Cheers, Hjfocs (talk) 14:49, 9 August 2019 (UTC)

@Hjfocs:
1. Yes! It is tricky for humans to do this job. But no user intentionally adds uncertain claims to Wikidata. Those who do should not even have the autopatrolled flag, let alone the bot flag.
4. “The contributions of a bot account remain the responsibility of its operator” [1]. You ARE responsible for all the edits done with your bot. Being aware of its wrong edits and not fixing them is not responsible behavior.
By the way, my comments are not meant to undermine your efforts to improve Wikidata. I am fully aware that your task is by no means trivial. But mixing correct information with uncertain claims using an account with the bot flag is a recipe for ruining the integrity of the whole database. Maybe having a list of uncertain matches for human review is a better approach. Cheers! -- Meisam (talk) 19:20, 10 August 2019 (UTC)
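
For readers following the thread, the workflow under discussion (upload only confident links, send medium-confidence ones to human review, and skip identifiers an item already holds) can be sketched roughly as below. This is an illustrative sketch only, not soweego's actual code: the `Match` type, the `route` function, and both threshold values are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only; not taken from soweego.
UPLOAD_THRESHOLD = 0.8   # at or above: confident enough for a bot upload
REVIEW_THRESHOLD = 0.5   # at or above (but below upload): queue for human review


@dataclass(frozen=True)
class Match:
    """A candidate link between a Wikidata item and a catalog identifier."""
    qid: str
    identifier: str
    confidence: float


def route(matches, existing):
    """Split candidate matches into (upload, review) lists.

    `existing` maps a QID to the set of identifiers already stored on that
    item; matches that would re-add one of those are skipped. Matches below
    REVIEW_THRESHOLD are dropped entirely.
    """
    upload, review = [], []
    for m in matches:
        if m.identifier in existing.get(m.qid, set()):
            continue  # identifier already present: do not add it again
        if m.confidence >= UPLOAD_THRESHOLD:
            upload.append(m)
        elif m.confidence >= REVIEW_THRESHOLD:
            review.append(m)
    return upload, review
```

In such a setup, only the `upload` list would go through the bot account, while the `review` list would be handed to a curation tool such as Mix'n'match.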

References

Grant request approved

Hello, Hjfocs; thank you for all the work you do for the Wiki movement. We are approving this grant request. Regards, MMontes (WMF) (talk) 20:59, 19 August 2019 (UTC)