Research:Crowdsourced evaluation of Upload Wizard instructions

Created: 20:56, 25 April 2018 (UTC)
Duration: April 2018 – May 2018

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.


The overall goal of this project is to understand how to phrase instructions in the Upload Wizard interface so as to reduce confusion between the "caption" and "description" fields. Uploaders should be able to add short captions to each media item they upload that are rich and accurate without directly duplicating the content of the (ideally longer and more detailed) description field.

Research questions

  1. Do different sets of instructions elicit captions of different lengths? All else being equal, longer captions contain more metadata about an image. Ideally, uploaders would provide longer captions (within the character limits of the field) rather than shorter ones.
  2. Do different sets of instructions produce better-quality captions? The way people are instructed to describe an image can influence the overall quality of the captions they generate. Instructions should be as clear and accurate as possible, to elicit the most accurate and detailed captions.
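
One simple way to operationalize caption length for RQ1 is character count, since the caption field itself is character-limited. The sketch below is illustrative only; the 255-character limit is an assumed placeholder, not necessarily the Upload Wizard's actual limit.

  def caption_length(caption, char_limit=255):
      """Return the caption's length in characters, flagging field overruns.

      char_limit is a placeholder assumption, not the Upload Wizard's
      confirmed limit.
      """
      length = len(caption.strip())
      if length > char_limit:
          raise ValueError(f"caption exceeds the {char_limit}-character field limit")
      return length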

Methods

We will perform a controlled study with Amazon Mechanical Turk workers. Each worker views one of 10 images and is presented with one of 4 instruction conditions. The instructions tell the worker to add both a caption and a description for the image. We will aggregate the data across all conditions and evaluate which of the 4 instruction sets produces the best overall captions and descriptions.
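
Below is a minimal sketch of the random image-by-condition assignment described above. The identifiers mirror the tables in the next section; the function itself is illustrative, not the actual study harness.

  import random

  IMAGES = [f"image_{i}" for i in range(1, 11)]           # the 10 study images
  CONDITIONS = [f"instruction_{i}" for i in range(1, 5)]  # the 4 instruction sets

  def assign_task():
      """Randomly pair one image with one instruction condition for a worker."""
      return random.choice(IMAGES), random.choice(CONDITIONS)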

Experimental conditions

Instructional conditions displayed to Turk workers (each condition pairs a field label, "Preview Text" or "Caption", with one of two instruction phrasings):

  instruction set   field label    instruction text
  instruction_1     Preview Text   Add a short phrase to convey what this file represents, including only the most relevant information.
  instruction_2     Preview Text   Add a one-line explanation of what this file represents, including only the most relevant information.
  instruction_3     Caption        Add a short phrase to convey what this file represents, including only the most relevant information.
  instruction_4     Caption        Add a one-line explanation of what this file represents, including only the most relevant information.

Images displayed to Turk workers:

  image_1    https://upload.wikimedia.org/wikipedia/commons/f/f3/GDSF_2008_old_fair_section.jpg
  image_2    https://upload.wikimedia.org/wikipedia/commons/8/80/BAMICORI.jpg
  image_3    https://upload.wikimedia.org/wikipedia/commons/d/d3/20141203_155433_B.jpg
  image_4    https://upload.wikimedia.org/wikipedia/commons/5/59/Cat_Matahari_on_the_%27Internet%27.jpg
  image_5    https://upload.wikimedia.org/wikipedia/commons/a/aa/Dog%2C_man%2C_portrait%2C_outdoor_chair%2C_yard_Fortepan_6371.jpg
  image_6    https://upload.wikimedia.org/wikipedia/commons/4/40/Car_in_Oradour-sur-Glane4.jpg
  image_7    https://upload.wikimedia.org/wikipedia/commons/e/eb/People_walking_on_sidewalk_in_central_Pyongyang%2C_with_trolleybus_and_trams_in_background.jpg
  image_8    https://upload.wikimedia.org/wikipedia/commons/2/23/Will_Robertson_of_the_Washington_Bicycle_Club_riding_an_American_Star_Bicycle_down_the_steps_of_the_United_States_Capitol_in_1885.jpeg
  image_9    https://upload.wikimedia.org/wikipedia/commons/9/93/Grocery_shopping_in_Tokyo%2C_Japan_-_DSC09687.JPG
  image_10   https://upload.wikimedia.org/wikipedia/commons/9/90/S-Bahn_Berlin_Gesundbrunnen.jpg

Policy, Ethics and Human Subjects Research

This study has been reviewed by the Wikimedia Foundation Legal department. It does not involve collecting private or personal data about the Turk workers who participate. A privacy statement for this study (shown to each Turk worker before they perform the task) is available on WikimediaFoundation.org. We also made sure to follow the best practices for test design, payment, and communication outlined on the WeAreDynamo wiki.[1]

Results

The final dataset consisted of 286 captions and descriptions distributed randomly across four experimental conditions and ten images.

RQ1

Our first analysis investigated whether different instructions led to captions of different lengths. We found no significant differences in the average lengths of captions (or descriptions) generated by Turk workers across the four conditions.
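
The page does not state which statistical test was used. The sketch below shows one plausible way to run such a comparison (a one-way ANOVA over caption character counts), assuming a hypothetical captions.csv export with 'condition' and 'caption' columns.

  import pandas as pd
  from scipy import stats

  df = pd.read_csv("captions.csv")             # hypothetical export of the study data
  df["caption_len"] = df["caption"].str.len()  # length in characters

  # One group of caption lengths per instruction condition
  groups = [g["caption_len"].values for _, g in df.groupby("condition")]
  f_stat, p_value = stats.f_oneway(*groups)    # one-way ANOVA across the 4 conditions
  print(f"F = {f_stat:.2f}, p = {p_value:.3f}")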

RQ2

Our second analysis investigated whether caption quality was affected by the instructions. For this analysis, we randomly paired each caption with another caption for the same image that was generated under a different instruction set. We then performed a pairwise comparison for each random pair: which caption is better, A or B?
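
A minimal sketch of that pairing step, assuming the data is available as (image_id, condition, caption) tuples; the helper name and data shapes are illustrative, not the actual study code.

  import random
  from collections import defaultdict

  def make_pairs(rows):
      """Randomly pair captions for the same image from different conditions.

      rows: iterable of (image_id, condition, caption) tuples.
      Returns a list of (caption_a, caption_b) pairs for A/B judging.
      """
      by_image = defaultdict(list)
      for image_id, condition, caption in rows:
          by_image[image_id].append((condition, caption))

      pairs = []
      for items in by_image.values():
          random.shuffle(items)
          # Walk the shuffled captions two at a time
          for (cond_a, cap_a), (cond_b, cap_b) in zip(items[::2], items[1::2]):
              if cond_a != cond_b:             # keep only cross-condition pairs
                  pairs.append((cap_a, cap_b))
      return pairs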

References

  1. "Guidelines for Academic Requesters - WeAreDynamo Wiki". wiki.wearedynamo.org. Retrieved 2018-05-23.