Wikimedia Blog/Drafts/Brian Wolff profile

From Meta, a Wikimedia project coordination wiki

Title

Exploring new ways to share and showcase content from Wikimedia Commons: Brian Wolff profile

Text

Wikimedia Commons has more than 17 million freely licensed media files available for anyone to use in practically any way, as long as the terms of the free licenses are respected. The ever-growing database speaks to the passion Wikimedians have for sharing free media for the benefit of people around the world.

Making sense of this massive collection of uploaded images, video and audio files, however, hasn’t always been easy or straightforward. One of the challenges, as Commons contributor Brian Wolff can attest, is that metadata isn’t well integrated into the Commons database, as it is in most image databases. On-wiki descriptions generally aren’t included in a file’s internal metadata, so important information can be lost when the file is reused outside of Wikimedia. Additionally, the data that is already in a file’s internal metadata is ignored by search and cannot be used programmatically inside the wiki.

Screenshot of the new tiled gallery view that Wolff is testing

When a media file is uploaded to Commons today, a table is added to the file page with some information, such as the EXIF values for aperture and shutter speed. Wolff believes much more can be done with the metadata, for example integrating it better with search. In the long term, he expects people will use Commons search to find only files with a certain license, taken with a certain camera, or taken on a certain date.

“Editing file metadata in our current setup is not happening as well as it should at the moment, which is sad,” he said. Currently, if people want updated metadata to be included in the file itself, they need to download the file, edit the metadata, and re-upload it. “But one approach that's been suggested is just to use one of the existing programs like ExifTool and put an interface on top of it in MediaWiki.”

Wolff thinks this shouldn’t be hard to accomplish, as he doesn’t see any major technical obstacles that couldn’t be overcome. It would let updated metadata stay with the file instead of living only on a wiki page. Since files rarely stay within the wiki, any metadata that isn’t stored inside the file is lost as soon as the file leaves the website.

Wolff's computer skills have grown considerably since the days of the Winnie-the-Pooh computer game his parents gave him to help him learn to read. He first became interested in Commons during a Google Summer of Code internship with the Wikimedia Foundation in 2010. He had been a regular Wikinewsie since 2004, but he said it had been difficult to make the jump from Wikinews contributor to MediaWiki developer.

“[Google Summer of Code 2010] seemed like a good opportunity to actually become a member of the MediaWiki community, so to speak,” he said. “Before that, I was kind of a bit on, well not the outskirts, but I was very much a newbie and I was kind of like stumbling around. It served as kind of an opportunity to become integrated in the community.”

In May, he completed an undergraduate degree in computer science, with a minor in math, at Dalhousie University in Halifax, Nova Scotia. This summer, he’s returning to the Wikimedia Foundation to work on Commons, where he will explore a number of issues that should make the Commons database more useful and user-friendly.

Among other projects, he will be working on image patrolling features for Commons, which would allow admins to review new uploads systematically as they come in. Ordinary pages can be patrolled using new page patrol or the more recent Page Curation tools, but no equivalent tools exist for patrolling images. This matters because many of the pictures Commons receives aren’t appropriate for a database of educational material that anyone can freely use. Many people simply upload pictures of themselves. That is fine if the person happens to be famous, or if the uploader is an editor who plans to use the photo on their own user page, but for the average internet user, that’s not the type of content we want. Additionally, many people try to upload files copied directly from commercial websites, which is generally not acceptable, since those files are usually owned by someone else.

Beyond this, Wolff said he will explore a number of other areas that, taken separately, might seem trivial. “Personally, I think there's a lot of kind of little things that each individually don't matter much but, combined, would be really useful,” he said.

According to Wolff, these could include making upload log entries include a hash of the image, so that tool developers can more easily associate log entries with the actual images, or allowing people to specify which page is displayed when a PDF file is placed in an image gallery. Other projects he's working on include:

  • Experimenting with a different image gallery layout, used both on category listings and by users of the gallery tag. He notes the project is still very experimental and may change significantly.
  • Possibly trialling an optional gallery mode for the subcategory section of category listings, where each subcategory is shown with a representative image from that category instead of just a text link.

If you have any questions or comments for Brian, you can reach him in the #mediawiki and #wikimedia-commons channels on irc.freenode.net, or on his talk page.

Profile by Donna Peterson, Communications Volunteer
Interview by Matthew Roth, Global Communications Manager

Notes