Archived notes from Etherpad of the 2021-05-27 meeting.
- 16:00 UTC, 1 hour, Thursday 27th May 2021 (and same time last Thursday of every month)
- Online meeting on Google Meet: https://meet.google.com/nky-nwdx-tuf
- Join by phone: https://meet.google.com/tel/nky-nwdx-tuf?pin=4267848269474&hs=1
Participants (who is here)
- Jeroen De Dauw (https://Wikibase.Consulting)
- Dennis Diefenbach (https://the-qa-company.com/products/Wikibase)
- Laurence Parry
- Giovanni Bergamin
- Jeff Goeke-Smith
- Jarmo Saarikko
- Alice Santiago Faria (Researcher, FCSH, Universidade NOVA de Lisboa)
- Amy Ruskin
- Mohammed (WMDE)
- Bayan (WMDE)
- We invite you to come and share out about your project or what you're working on around Wikibase.
- <Add your topics below>
Alice SF: Building eViterbo, a database of architects, constructors, artists, … from Portugal and the Portuguese empire (developed under the research project https://technetempire.fcsh.unl.pt/en/), using MediaWiki software / Wikibase.
Once a large set of data is inserted, it will be linked to Wikidata (we're working with Wikimedia Portugal on this).
We're trying to find the best way to connect with https://nodegoat.net/ and, in the future, to connect the data to its sources (that is, to the catalogues/digital files of archives and libraries). I'm not the tech person of the project.
- 9 participants
Jeff Goeke-Smith (Enslaved): we intend to have 60 separate data sets in the next two years. We are getting to the point where it is a process where we know where we are going with it. It is actually doing what we intended; it is working. The challenge is I don't know how well OpenRefine will cope.
Jackie: have you created a connector from OpenRefine directly to your Wikibase?
Jeff: halfway. There is a piece of software which allows OpenRefine to do searches in Wikibase.
Jackie: Right now we are still providing the schema, then downloading a QuickStatements batch and loading it into our own Wikibase.
Jeff: in our case the specific steps of generating the delta and using QuickStatements to upload the data have been made official.
Dennis: we experimented with the OpenRefine extension for Wikibase. We were able to configure it and to ingest data into a Wikibase that was not Wikidata, though it was not easy to configure. The maintainer wants to release it and recently wrote an email asking if people were having problems. You could reach out to him.
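The OpenRefine search/reconciliation connection mentioned above works through the Reconciliation Service API that services like openrefine-wikibase expose. As a minimal sketch (not the actual setup of either project; the endpoint URL and the type QID below are made-up placeholders), this is roughly the payload OpenRefine sends when matching names against a Wikibase:

```python
import json

# Hypothetical reconciliation endpoint for a self-hosted Wikibase;
# a real deployment would point at its own openrefine-wikibase instance.
ENDPOINT = "https://example-wikibase.org/en/api"

def build_recon_queries(names, type_qid=None, limit=5):
    """Build the `queries` form field: one keyed sub-query per name,
    as described by the Reconciliation Service API draft spec."""
    queries = {}
    for i, name in enumerate(names):
        q = {"query": name, "limit": limit}
        if type_qid:
            q["type"] = type_qid  # restrict candidates to one entity type
        queries[f"q{i}"] = q
    return {"queries": json.dumps(queries)}

payload = build_recon_queries(["Jefferson County"], type_qid="Q1")
# This dict would be POSTed as form data to ENDPOINT; each entry in the
# response lists candidate item ids with a score and a `match` flag.
```

The response scores are what OpenRefine uses to decide between an automatic match and a manual choice in the reconciliation UI.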
Mohammed: what is the frequency of your uploads?
Jeff: We have staff whose job it is to interact with our partners who have the data sources. A staff member works on one data set at a time, then builds the conversion necessary to generate the file through OpenRefine. Sometimes the data set is complicated enough that we have to do it piece by piece. Some of those jobs take days to run; that is the result of how Wikibase is implemented.
Mohammed: do you have a finite amount of data to upload?
Jeff: I assume nobody is generating records past the year 1865. No, we don't have a defined end goal. We know the amount of data is finite; we just don't know where it all is. It is finite but unknown how large it is.
Laurence: This month WMDE released the spring release of the Docker distribution. As I understand it, the plan is for them to make the Wikibase main master branch compatible with the latest release of MediaWiki.
Jackie's follow-up question to Jeff: when Jeff was alluding to the fact that sometimes a data set took more than a day to load, I was curious whether they hit a ceiling?
Jeff: I have seen enough of the behaviour of the system to know that you can influence how it goes based on the environment. We added the ability to run a series of processes for the upload. The difference between a data set taking two days or two hours to load does not make that much of a difference; they can let it run over the weekend. We don't really have a reason to get the operation to run faster than it is right now.
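The "series of processes" approach Jeff describes can be sketched as splitting the prepared records into batches and pushing them concurrently. This is an illustration only, not Enslaved's actual pipeline: `upload_batch` is a stand-in for whatever writes to the Wikibase API, and threads are used here for simplicity where a real setup might use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_batch(batch):
    """Placeholder for the real upload step (e.g. a QuickStatements or
    API client call). Here it just reports how many records it handled."""
    return len(batch)

def parallel_upload(records, batch_size=100, workers=4):
    """Fan batches out to a pool of workers and total the uploads."""
    batches = [records[i:i + batch_size]
               for i in range(0, len(records), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(upload_batch, batches))

uploaded = parallel_upload(list(range(250)), batch_size=100, workers=2)
# 250 records split into batches of 100, 100 and 50
```

The degree of parallelism that actually helps depends on the Wikibase instance's rate limits and job queue, which is presumably the "environment" influence Jeff mentions.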
Dennis: QuickStatements is not the only problem; there is also how the Wikibase API works.
Jeff: every time I was asked to make the system faster, it was about evaluating the system as a whole.
Dennis: if we replace QuickStatements with OpenRefine, the problem of ingesting big data will remain.
Jeff: FYI, don't generate an item which has a property with two hundred thousand values.
Jeff: another thing I tripped over: if you are generating an item with two hundred thousand values, the worst part is that there are two hundred thousand revisions for those values... I chose to go with the path of compressing history.
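One way to avoid a revision per value (a sketch, not necessarily what Jeff's "compressing history" refers to) is to batch many statements into a single `wbeditentity` edit, which lands them all in one revision, whereas per-claim tools such as `wbcreateclaim` create one revision per value. The property id and QIDs here are made up for illustration:

```python
import json

def batch_claims(property_id, value_qids):
    """Build the `data` JSON for action=wbeditentity that adds every
    claim in a single edit (and therefore a single revision)."""
    claims = [{
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {
                "value": {"entity-type": "item", "id": qid},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    } for qid in value_qids]
    return json.dumps({"claims": claims})

data = batch_claims("P31", [f"Q{i}" for i in range(1, 6)])
# POST this to /w/api.php with action=wbeditentity, id=<item QID>,
# data=<data> and a CSRF token: five statements, one revision.
```

Very large payloads still run into request-size and API limits, so a real pipeline would chunk the values across a handful of such edits rather than one.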
Jeff: Blazegraph does not contain the revision history; you have to watch out for the Blazegraph updater. I can just dump Blazegraph and reload it.
Laurence: have you looked into the streaming updater for the Wikidata Query Service?
Jeff: there is already a streaming updater in place; however, it is limited in its capability.
Jackie: in Wikidata the property is "spoken" or "written", or you can choose. There is no way to say this is not Mandarin Chinese.
Dennis: are you saying you want to add support for another kind of Chinese?
Jackie: if we have our own Wikibase we need to find a way to address this differently. There is a need for us to go into detail on languages vs. scripts.
Jeff: so I recognise the sort of problem you are describing. I am the wrong person to comment on it in depth; I am not an ontologist. What we did for Enslaved was build an ontology, which took years of work. From that ontology we described it using Wikibase tooling. That is the hard way, but reasonably successful. It allows us to represent what we want to represent.
Jackie: when we were modeling our Africana group.
Jeff: if you want to find something hilarious: Jefferson County, Virginia existed twice. Any time somebody says this, you have to ask when. It is an interesting problem. This is one of the reasons we are running our own Wikibase, so we can generate the model to represent the data we are handling.
Mohammed: we have some minutes left to talk about some meta stuff. There have been some issues dealing with the affiliation status. Laurence has been involved with that. Have you made some progress?
Laurence: not much. We would need an election, and we have not got a process for that. We have seven days for nomination and seven? for voting. If there are other ideas, or people who want to organise, I would be glad to have the help. If there is anything before the next meeting I will announce it on Telegram and the mailing list. If you want to get involved you can reach out to me.
Mohammed: what I would suggest is: if you are able to come up with something and there are no objections, then we can move forward with that.
- Next session