Grants:Project/Rapid/Hjfocs/soweego 1.1/Report

Report accepted

This report for a Rapid Grant approved in FY 2019-20 has been reviewed and accepted by the Wikimedia Foundation.

To read the approved grant submission describing the plan for this project, please visit Grants:Project/Rapid/Hjfocs/soweego 1.1.
You may still comment on this report on its discussion page, or visit the discussion page to read the discussion about this report.
You are welcome to email rapidgrants at wikimedia dot org at any time if you have questions or concerns about this report.

Goals

Did you meet your goals? Are you happy with how the project went?

Absolutely yes. The main goal was to obtain the highest-quality Wikidata links, in the form of identifier statements. The work yielded an improvement in average precision across all target catalogs. The table below compares the performance of the best standalone algorithm with that of the ensemble ones. All values are averages over the target datasets.

The higher the better.

Algorithm	Precision	Recall	F1
multi-layer perceptron	.916	.934	.925
soft voting	.919	.930	.924
gated	.922	.926	.924
hard voting	.914	.934	.923
stacked	.923	.924	.923
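
As a reading aid for the F1 column, the sketch below recomputes F1 as the harmonic mean of the precision and recall values reported in the table. The function name `f1_score` is illustrative and is not taken from the soweego code base. Since every value in the table is an average over the target datasets, an F1 recomputed from averaged precision and recall can differ from the reported averaged F1 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged (precision, recall) pairs as reported in the table above.
reported = {
    "multi-layer perceptron": (0.916, 0.934),
    "soft voting": (0.919, 0.930),
    "gated": (0.922, 0.926),
    "hard voting": (0.914, 0.934),
    "stacked": (0.923, 0.924),
}

for name, (precision, recall) in reported.items():
    # Recomputed from averages, so this only approximates the reported F1.
    print(f"{name}: {f1_score(precision, recall):.3f}")
```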

See chapter 7 of Tupini07's MSc thesis for a detailed explanation.^[1]
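
To illustrate how two of the ensemble strategies in the table differ, here is a minimal generic sketch of hard versus soft voting. It assumes a binary match / no-match setting in which each base classifier outputs a match probability. The function names, the 0.5 threshold and the sample probabilities are all hypothetical and do not come from soweego; the thesis describes the actual implementation.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over the class labels predicted by each classifier."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, threshold=0.5):
    """Average the match probabilities, then apply a decision threshold."""
    mean = sum(probabilities) / len(probabilities)
    return int(mean >= threshold), mean


# Hypothetical match probabilities from three base classifiers
# for a single candidate link.
probabilities = [0.95, 0.45, 0.40]

# Each classifier's own decision at the same (assumed) 0.5 threshold.
labels = [int(p >= 0.5) for p in probabilities]

hard_decision = hard_vote(labels)
soft_decision, mean_probability = soft_vote(probabilities)

print("hard voting decision:", hard_decision)
print("soft voting decision:", soft_decision, "mean probability:", round(mean_probability, 3))
```

In this made-up case the two strategies disagree: two of the three classifiers reject the link, while the averaged probability stays above the threshold because one classifier is very confident.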

Outcome

Please report on your original project targets.

Target outcome	Achieved outcome	Explanation
Release of `soweego` version 1.1	Ready to go	We will announce the release as soon as the last minor comments of pull request #372^[2] are addressed
Documentation	Tupini07's MSc thesis	The resource is publicly available^[1]
Developer engagement	5 forks,^[3] feedback raised by third parties,^[4] 60 stars^[5]	We managed to attract more potential contributors

Learning

Projects do not always go according to plan. Sharing what you learned can help you and others plan similar projects in the future. Help the movement learn from your experience by answering the following questions:

What worked well?

The project scope and time span were very small, which allowed us to effectively address specific activities.

What did not work so well?

The proposal involved experimental research, so there was a risk that the results would not meet our forecast.

We were actually expecting slightly better performance.

What would you do differently next time?

Perhaps focus more on low-hanging fruit, rather than running more experiments.

Finances

Grant funds spent

Please describe how much grant money you spent for approved expenses, and tell us what you spent it on.

We spent the whole budget to cover working time. The task breakdown follows.

Task	Timeline
Make SLP and MLP compatible with the current hyperparameter grid search	September 2
Add decision trees as a classifier	September 2
Explore different ways in which we can ensemble the current classifiers	September 13
Super-confident predictions	September 25
Add logistic regression as a classifier	September 30
Evaluate performance of ensemble methods	October 1

Remaining funds

Do you have any remaining grant funds?

No.

References

[1] https://tools.wmflabs.org/soweego/Tupini07_MSc_thesis.pdf

[2] https://github.com/Wikidata/soweego/pull/372

[3] https://github.com/Wikidata/soweego/network/members

[4] https://github.com/Wikidata/soweego/issues

[5] https://github.com/Wikidata/soweego/stargazers
