Wikimedia Blog/Drafts/Thousands of infographics from Berria newspaper are now in the Commons


Title ideas

  • Thousands of infographics from Berria newspaper are now in the Commons

Summary

When we contacted Berria, the only general-interest newspaper in Basque, to ask whether there was some content they would agree to release to Wikimedia Commons, we never imagined we would end up with thousands of the infographics published over its history. A few months later, more than 2,700 of them have been uploaded, and thousands more are still to come.

Body

Galder Gonzalez, Basque Wikimedians User Group

This year, the Basque Wikimedians User Group has carried out an ambitious education project with financial support from the Basque Government's Department of Culture. The project aims to improve a set of essential articles for high school students, in collaboration with university professors and lecturers. One of the conditions of this funding was that part of it be allocated to releasing existing high-quality Basque-language content to Wikimedia Commons.

Berria is the only general-interest newspaper in Basque, and it has extensive experience in publishing content. When we first sat down with them in November last year to propose using part of our budget to release some of their content, we had no idea what impact the process would have.

The hardest part is always getting started: deciding what can be used and what is worth releasing. At first we thought the texts of two special websites could be very interesting: Oroiteria, about the Spanish Civil War in the Basque Country, and another website about the paramilitary group GAL. However, as the meeting went on, we realized that one of the most valuable things Berria could offer was the infographics it publishes every day. We asked for an estimate of how many there were, and ended the meeting there.

A few days later we got the answer: about 16,000, many of them uncatalogued, because their new database system had only been running for a few years. They made us a simple offer: if we brought a hard drive, the files were ours. We made a counter-offer: we would allocate part of our funds to help them catalogue, archive and upload the images to Commons. We thought this kind of collaboration would be more valuable. On the one hand, it solved the problem of who would organize such a large amount of data; on the other, it helped Berria catalogue a product that had never been properly archived.

During the first months of 2018, we discussed how to change the license of all these products. Then Berria gave us another pleasant surprise: it switched its entire online newspaper to the CC BY-SA license. So, in addition to having the past content under a free license, we will also have its future content freely available. That is a real advantage for a language like Basque, which does not produce a large volume of content.

After analyzing the types of content they had, we concluded that the best upload tool was Pattypan, since it also let Berria store the metadata in a standardized way. Then came the big moment: uploading the first image and seeing how the upload could be improved. We proposed that the images be uploaded as .svg rather than .jpg. Their original files were .eps, so they could be exported to either format, and vector files were the more attractive option: their text can be easily translated, and they can be enlarged without any loss of quality.

The next challenge was categorization. Berria had a set of tags that it uses in its own database. We decided to keep those tags in addition to a general category. The tags serve as subcategories and make the collection easier to search. In the future, these categories could also serve as the basis for offering the content in more than one language.

Many of the infographics are simple charts; for example, a lot of economic data is presented in graphical form. These charts may be hard to use in articles, since they cover very specific, time-bound topics. Others, however, are timeless: maps of mountain routes, infographics describing historical events, or ones explaining concepts such as net neutrality.

More than 3,000 infographics have now been uploaded, and new ones are added every week as Berria works through its older archive. And thanks to the CC BY-SA license, we can also manually upload the new infographics published in the newspaper. Everybody wins.