Wikibase Community User Group/Meetings/2021-05-27/Notes


Archived notes from Etherpad of the 2021-05-27 meeting.

Schedule


Participants (who is here)

  1. Jeroen De Dauw (https://Wikibase.Consulting)
  2. Dennis Diefenbach (https://the-qa-company.com/products/Wikibase)
  3. Laurence Parry
  4. Giovanni Bergamin
  5. Jeff Goeke-Smith
  6. Jarmo Saarikko
  7. JShieh
  8. Alice Santiago Faria (Researcher, FCSH, Universidade NOVA de Lisboa)
  9. Amy Ruskin
  10. Mohammed (WMDE)
  11. Bayan (WMDE)

Agenda

  • We invite you to come and share your project or what you're working on around Wikibase.
  • <Add your topics below>

Alice SF: Building eViterbo, a database of architects, constructors, artists, … from Portugal and the Portuguese empire (developed under the research project https://technetempire.fcsh.unl.pt/en/), using MediaWiki software / Wikibase.

Once a large set of data is inserted, it will be linked to Wikidata (we're working with Wikimedia Portugal on this).

We're trying to find the best way to connect with https://nodegoat.net/ and in the future to connect the data to its sources (that is to the catalogues/digital files of Archives and Libraries). I'm not the tech person of the project.

Notes

  • 9 participants (+2 later)

Jeff Goeke-Smith (Enslaved.org): We intend to have 60 separate data sets in the next two years. We are getting to the point where it is a process where we know where we are going with it. It is actually doing what we intended; it is working. The challenge is that I don't know how OpenRefine will be.

Jackie: Have you created a connector from OpenRefine directly to your Wikibase?

Jeff: Halfway. There is a piece of software which allows OpenRefine to do searches in Wikibase.

Jackie: Right now we are still providing the schema, then downloading a QuickStatements file and loading it into our own Wikibase.

Jeff: In our case the specific steps of generating the delta and using QuickStatements to upload the data have been made official.
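For illustration, a minimal sketch of the kind of batch generation being discussed: turning a tabular export into QuickStatements v1 commands that can then be loaded into a Wikibase. The file name, column names, and the P106 property below are placeholders, not the schema of either project.

  # Minimal sketch: turn a CSV of people into QuickStatements v1 commands.
  # Assumptions: "people.csv" has "name" and "occupation_qid" columns, and
  # P106 (occupation) exists on the target Wikibase; both are placeholders.
  import csv

  def rows_to_quickstatements(path):
      commands = []
      with open(path, newline="", encoding="utf-8") as f:
          for row in csv.DictReader(f):
              commands.append("CREATE")                                # new item
              commands.append(f'LAST\tLen\t"{row["name"]}"')           # English label
              commands.append(f'LAST\tP106\t{row["occupation_qid"]}')  # occupation claim
      return "\n".join(commands)

  if __name__ == "__main__":
      print(rows_to_quickstatements("people.csv"))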

Dennis: We experimented with this OpenRefine extension for Wikibase. We were able to configure it and were able to ingest the data into a Wikibase that was not Wikidata. It was not easy to configure. The person who maintains it wants to release it, and he recently wrote an email asking if people were having problems. You could reach out to him.

Mohammed: What is the frequency of your uploads?

Jeff: We have staff whose job it is to interact with our partners who have the data sources. A staff member works on one data set at a time, then builds through OpenRefine the conversion necessary to generate the file. Sometimes the data set is complicated enough that we have to do it piece by piece. Some of those jobs take days to run; that is the result of how Wikibase is implemented.

Mohammed: Do you have a finite amount of data to upload?

Jeff: I assume nobody is generating records past the year 1865. No, we don't have a defined end goal. We know the amount of data is finite; we don't know where all of it is. It is finite but unknown how large it is.

Laurence: This month WMDE released the spring release for Docker. As I understand it, the plan is for them to make the Wikibase main master branch compatible with the latest release of MediaWiki.

Jackie's follow-up question to Jeff: When Jeff mentioned that sometimes the data set took more than a day to load, I was curious whether they hit a ceiling.

Jeff: I have seen enough of the behaviour of the system to know that you can influence how it goes based on the environment. We added the ability to run a series of processes to do the upload. Whether the data set takes two days or two hours to load does not make that much of a difference; they can let it run over the weekend. We don't really have a reason to get the operation to run faster than it is right now.

Dennis: QuickStatements is not the only problem; there is also how the Wikibase API works.

Jeff: Every time I was asked to make the system faster, it was about evaluating the system as a whole.

Dennis: If we replace QuickStatements with OpenRefine, the problem of ingesting big data sets will remain.
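As an illustration of the API path both tools end up driving, a minimal sketch of creating a single item through the standard wbeditentity module of the Wikibase action API. The endpoint URL is a placeholder and the session is assumed to be already authenticated; a real bulk load would batch edits, throttle, and handle errors, which is where the limits discussed above appear.

  # Minimal sketch: create one item via the Wikibase action API (wbeditentity).
  # Assumptions: a Wikibase at wikibase.example.org (placeholder) and an
  # already-authenticated requests.Session with edit rights.
  import json
  import requests

  API = "https://wikibase.example.org/w/api.php"  # placeholder endpoint

  def create_item(session, label_en):
      # 1. Fetch the CSRF token required for editing.
      token = session.get(API, params={
          "action": "query", "meta": "tokens", "format": "json",
      }).json()["query"]["tokens"]["csrftoken"]

      # 2. Create a new item carrying a single English label.
      data = {"labels": {"en": {"language": "en", "value": label_en}}}
      resp = session.post(API, data={
          "action": "wbeditentity", "new": "item",
          "data": json.dumps(data),
          "token": token, "format": "json",
      }).json()
      return resp["entity"]["id"]

  if __name__ == "__main__":
      with requests.Session() as s:
          print(create_item(s, "Example item"))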

Jeff: FYI, don't generate an item which has a property with two hundred thousand values.

Jeff: Another thing I tripped over: if you are generating an item with two hundred thousand values, the worst part is that there are two hundred thousand revisions for the item... I chose to go with the path of compressing history.

Jeff: Blazegraph does not contain the revision history; you have to watch out for the Blazegraph updater. I can just dump Blazegraph and reload it.

Laurence: Have you looked into the streaming updater for the Wikidata Query Service?

Jeff: There is already a streaming updater in place; however, it is limited in its capability.

Jackie: In Wikidata the property is spoken or written, or you can choose. There is not a way to say this is not Mandarin Chinese.

Dennis: Are you saying you want to add support for another kind of Chinese?

Jackie: If we have our own Wikibase we need to find a way to address this differently. There is a need for us to go into detail on the languages vs. the scripts.
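As a small illustration of where that distinction currently lives: Wikibase labels (and monolingual-text values) carry a MediaWiki language code, so script variants such as Simplified and Traditional Chinese can be separated at that level, while finer spoken/written or variety distinctions are exactly the open modelling question. The example values below are placeholders.

  # Sketch only: the same label given under several Chinese language codes
  # that MediaWiki/Wikibase already distinguishes. Whether these codes are
  # fine-grained enough (spoken vs. written, Mandarin vs. other varieties)
  # is the modelling question raised above.
  labels = {
      "zh":      {"language": "zh",      "value": "孔子"},  # generic Chinese
      "zh-hans": {"language": "zh-hans", "value": "孔子"},  # Simplified script
      "zh-hant": {"language": "zh-hant", "value": "孔子"},  # Traditional script
  }
  entity_data = {"labels": labels}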

Jeff: So I recognise the sort of problem you are describing. I am the wrong person to comment on it in depth; I am not an ontologist. What we did for Enslaved: we built an ontology that took years of work, and from that ontology we described it using Wikibase tooling. That is the hard way, but reasonably successful. It allows us to represent what we want to represent.

Jackie: When we were modelling our Africana group.

Jeff: If you want to find something hilarious: Jefferson County, Virginia existed twice. Anytime somebody says this you have to ask when. It is an interesting problem. This is one of the reasons we are running our own Wikibase, so we can generate the model to represent the data we are handling.

Mohammed: We have some minutes left to talk about some meta stuff. There have been some issues to deal with regarding the affiliation status. Laurence has been involved with that. Have you made some progress?

Laurence: Not much. We would need an election; we have not got a process for that. We have seven days for nomination and seven? for voting. If there are other ideas or people who want to organise, I would be glad to have the help. If there is anything before the next meeting I will announce it on Telegram and the mailing list. If you want to get involved you can reach out to me.

Mohammed: What I would suggest is: if you are able to come up with something and there are no objections, then we can move forward with that.

  • Questions/Feedback
  • Next session