Talk:Learning and Evaluation/Evaluation reports/2015/Wiki Loves Monuments

This section is meant to encourage a discussion around next steps, both as potential areas for further investigation and areas in which to improve tracking of inputs, outputs, and outcomes to improve future reports. Please reach out on the talk page if you would like to contribute to the conversation.

Potential areas for further investigation

What proportion of media content do the captured data represent?

Maybe compare against non-bot media uploads during the reporting period to relate the results to collective impact?

What is the comparative value of the monuments image repository generated by Wiki Loves Monuments?

Investigation into comparable photo repositories to estimate a value produced by the content generated by these contests.

What could be done to improve retention rates, especially amongst new users?

Examine what program leaders are doing differently to promote retention of new users, perhaps with a further case study of Summer of Monuments as an extended contest, and of shorter-interval photo event series as a way to keep users engaged.

Deeper investigation into the drivers of image uploads and their quality.

What are program leaders doing differently to promote high-quality content contributions? Through qualitative data collection and knowledge sharing, we can look to understand the potential mediating factors of upload use and quality.

Deeper investigation into the promotion and expectations for participation in the events.

To better understand different implementation designs and their effectiveness, it will be important to understand what specific steps were taken in promoting each national event, along with the particular goals for participation as well as for content production and use.

Better understanding of program inputs.

For next year's contests, efforts should be made to improve the reporting of program-level budgets (only 43% reported for 2013 and 2014 contests); staff and volunteer hours and their value (29% reported either staff or volunteer hours for 2013 and 2014, and 15% the value of either); and the gender of participants (7% reported for 2013 and 2014).

Answering the value-in, value-out questions

To really dig into questions about how inputs like budget, organizer hours, and in-kind donations affect program outcomes and the success of a program, we need more program leaders to track and share these inputs, examine their results, and report in a similar way how effective their program implementations were.

Most importantly, events that report all three (inputs, outputs, and outcomes) are especially helpful, since this allows for more consistent comparisons across events. Some important tips to help:

To learn how to find metrics like number of media uploaded, unique images used, new users, existing users who are active editors, etc., see the Global metrics learning pattern.

If you are unable to capture all important metrics for your program, be sure to record the start times, end times, and participant lists so that you can get help to access more data.

Reporting the value of organizer hours, both for staff and volunteers, and in-kind donations in dollars will allow us to look at the cost effectiveness of different program approaches versus their goals.

More clearly articulating how program activities align to goals, especially in the case of less tangible goals, such as "increase volunteer motivation and commitment", will help program leaders to choose what is best to measure and to measure those outcomes that share the story of their success.

Questions about measures

At Grants:Evaluation/Evaluation reports/2015/Wiki Loves Monuments/Outcomes you mention that 1446 (0.3%) images were rated as Featured Pictures. Intersecting Commons:Category:Featured pictures with Commons:Category:Wiki Loves Monuments yields 819 featured pictures. Looking at Grants:Evaluation/Evaluation reports/2015/Wiki Loves Monuments/Limitations I assume only 2013 and 2014 data is covered by your evaluation. That would trim down the numbers even further and result in 408 featured pictures for 2013 and 204 featured pictures for 2014. I wasn't able to identify how featured pictures are counted in your approach. Please note that Commons:Category:Featured pictures contains all featured pictures, whereas Commons:Category:Featured pictures on Wikimedia Commons includes only those featured pictures nominated on Wikimedia Commons.

Please provide a replicable approach for calculating 1446 featured pictures.
Please define featured picture based on your current data collection.
Please provide direct links to calculations made via catscan and other tools if they were used in your evaluation.

Regards, Christoph Braun (talk) 13:05, 24 April 2015 (UTC)

Hello Christoph Braun--

First of all, thank you for bringing this to our attention. We strive to always be completely accurate, but we are humans with limited resources, and this error slipped past us. We really appreciate you taking the time to check the numbers!

We found an error in which a few of the data counts for Quality images and Featured pictures were transposed, which caused the incorrect totals and number of observations. We are correcting these errors in the report, in the appendices to the report, and in the linked summative data spreadsheet.

After correcting for these errors, we find there are 4276 Quality images (1078 in 2013 and 3198 in 2014), 182 Featured pictures on Wikimedia Commons (89 in 2013 and 93 in 2014), and 806 Valued images (147 in 2013 and 659 in 2014) for the images captured in the report. These are still different than the numbers you shared, so let me provide specific responses to your points:

Please provide a replicable approach for calculating 1446 featured pictures.

We calculated the total number of featured pictures (which is now 182) by summing the number of featured pictures in the category for each of the 72 country contests included in the report, which can be found in the appendix. We used catscan to find the number of featured images following this learning pattern.

Please define featured picture based on your current data collection.

We defined Featured picture as an image that has the “Assessments” template on its file page on Commons.

Please provide direct links to calculations made via catscan and other tools if they were used in your evaluation.

Because we looked up featured pictures in each country category, there were a lot of catscan queries, but here is an example each for Quality, Featured, and Valued. You can find the number for each country in the appendix.

There are a couple of differences in our approach to counting the number of featured pictures: we only included the country contests from the report, while you counted the total number of featured pictures in the top Wiki Loves Monuments category; we counted all images that used the Assessments template, while you used the number of images in the Featured pictures category. The first is definitional, but the second is much more interesting, and I would like to open it up for broader discussion with community members:

Should we still be using the Assessments template to capture the greatest number of pictures that were featured on any Wikimedia project? We wanted to include as many featured pictures as possible--those on Commons as well as Photos of the Day or other pictures featured on other Wikimedia projects. To what extent do the templates vs. category intersection options vary in accuracy? How can we work best to consistently gather featured picture data through existing tools? It would be great to know if there is community consensus around this.

I hope this answers your questions; please let me know if you have any other concerns or would like to know more about our measurement process.

Abittaker (WMF) (talk) 14:23, 27 April 2015 (UTC)

We used catscan to find the number of featured images following this learning pattern.

The page you link does not mention this. I assume you mean Grants:Learning patterns/Counting featured, quality and valued content in Commons.

Because we looked up featured pictures in each country category, there were a lot of catscan queries

I would suggest using helper templates to mass-generate reproducible queries (for inspiration, I have done such things for API requests with {{List Commons media in category}} or {{List Commons media in category FDC helper}} or {{Glamorous}}).

here is an example each for Quality

This method is flawed − using templates is not a good approach (as I mentioned more than a year ago on Grants_talk:Learning_patterns/Counting_featured,_quality_and_valued_content_in_Commons). Using categories is better. See the corresponding CatScan: QIs using categories.

Jean-Fred (talk) 20:37, 5 May 2015 (UTC)

Questions about Sharing

Questions about Connecting

Average vs. median

I don't understand: does the footnote about "average" really meaning "median" apply to all occurrences of "average", or only to the occurrences which are followed by the footnote? --Nemo 20:46, 10 April 2015 (UTC)

Hello Nemo, Yes, all occurrences of "average" are defined as "median average", and the mean and standard deviation are presented in the notes. JAnstee (WMF) (talk) 02:16, 24 April 2015 (UTC)

Collapsing

Please stop collapsing the references. It breaks the anchors, which do nothing: that's extremely frustrating and truly cruel. --Nemo 20:46, 10 April 2015 (UTC)

Hi Nemo, thanks for sharing your views about Notes. The reason these are collapsed is that references tend to be really long, and we put a lot of time and effort into making the reports' navigation accessible and useful. The notes can still be opened if someone is interested in reading more on some numbers. We do appreciate your feedback on format issues, and as you may have seen, we changed the titles in the section to be semantic titles. Although of a different order, it is based on an edit you did to the WLM outcomes page. Cheers, MCruz (WMF) (talk) 20:04, 24 April 2015 (UTC)

I wholeheartedly agree with Nemo − I don’t think anything can justify the breaking of such an expected functionality as clickable references − this makes the navigation « accessible and useful » in my opinion. As for the length of the section, is there really anything that wikitext and CSS can’t do :-þ − I quickly hacked up {{Reflist overflow}} and had a go at /Outputs.

Jean-Fred (talk) 00:43, 6 May 2015 (UTC)

Thank you both for the input as we work with design, and thank you Jean-Fred for your suggestion also --JAnstee (WMF) (talk) 06:33, 6 May 2015 (UTC)

Thank you Jean-Fred for creating that new template! I wouldn't have known how, so I really appreciate it. It's a better solution than what we have right now, so I'm going to apply it to all the pages on all the reports in the coming days. Thanks! MCruz (WMF) (talk) 10:33, 8 May 2015 (UTC)

Syntax

I'm unable to parse the paragraph Grants:Evaluation/Evaluation_reports/2015/Wiki_Loves_Monuments/Key_findings#Planning for Program Inputs & Outputs, could you please rephrase? Thanks, Nemo 20:50, 10 April 2015 (UTC)

Thanks, Nemo, now corrected. JAnstee (WMF) (talk) 02:24, 24 April 2015 (UTC)

Median cost

In Grants:Evaluation/Evaluation_reports/2015/Wiki_Loves_Monuments/Key_findings#How does the cost of the program compare to its outcomes?, is that a median of the ratios between budget and images, or a ratio of the medians? --Nemo 20:57, 10 April 2015 (UTC)

This refers to the median of the budget-to-images ratio calculated for each available paired data point. For example, if Contest 1 = $0.90/upload; Contest 2 = $0.95/upload; Contest 3 = $1.00/upload; Contest 4 = $1.00/upload; and Contest 5 = $1.05/upload, then the median cost per upload is $1.00. JAnstee (WMF) (talk) 02:31, 24 April 2015 (UTC)
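To make the distinction in the question concrete, here is a minimal Python sketch. The five per-upload costs are the ones from the example in this reply; the (budget, uploads) pairs are hypothetical, invented only to show that the median of the ratios and the ratio of the medians can disagree. It is an illustration, not the calculation used in the report.

```python
from statistics import median

# Per-contest cost per upload, taken from the example in the reply above.
cost_per_upload = [0.90, 0.95, 1.00, 1.00, 1.05]
example_median = median(cost_per_upload)

# Hypothetical (budget in dollars, number of uploads) pairs.
# These figures are invented for illustration and are not report data.
contests = [(100.0, 50), (900.0, 1000), (3000.0, 1500)]


def median_of_ratios(pairs):
    """Compute cost per upload for each contest, then take the median."""
    return median(budget / uploads for budget, uploads in pairs)


def ratio_of_medians(pairs):
    """Divide the median budget by the median upload count."""
    median_budget = median(budget for budget, _ in pairs)
    median_uploads = median(uploads for _, uploads in pairs)
    return median_budget / median_uploads


print(example_median)
print(median_of_ratios(contests), ratio_of_medians(contests))
```

With the hypothetical pairs, the per-contest ratios are 2.00, 0.90 and 2.00, so the two summaries give noticeably different answers even though they come from the same three contests.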

Thanks. I agree that median of the ratios makes most sense. --Nemo 23:39, 24 April 2015 (UTC)

Unlabelled contest

Hi all, in the Appendix/more data/inputs section, which country is represented by the entry "WLM_64"? The text just says "WLM 2014" --MichaelMaggs (talk) 07:22, 1 May 2015 (UTC)

Hi Michael, yes, that is because that reporter shared additional data and gave us permission to share it, but not permission to list their particular contest name alongside their data. JAnstee (WMF) (talk) 16:41, 1 May 2015 (UTC)

Odd, because "WLM_64" also appears in the Outputs table on the same page, where it is associated with the UK contest, of which I was the organizer. I don't recall as organizer being asked for this information, but if I was I would certainly not have denied permission to state the country name. Or maybe "WLM_64" means different things in different tables? --MichaelMaggs (talk) 13:11, 2 May 2015 (UTC)

Will you please update the "WLM_64" line in Appendix/more data/inputs to show that it relates to the UK contest, on the assumption that that is what it is? I can reiterate that you have my permission to post that information. --MichaelMaggs (talk) 17:25, 4 May 2015 (UTC)

cost of evaluation report

what is the value of this evaluation report to the user of wikipedia, and the cost of producing this evaluation report? who should read this report? --ThurnerRupert (talk) 18:52, 2 May 2015 (UTC)

Hello Rupert, those are good questions; let me answer what I can:

Costs of evaluation

As an experienced professional from the field of program evaluation, I believe our costs are efficient for the product delivered. For example, to ensure our efficiency, we specifically searched for and brought on two fellows to help us complete the programs evaluation project over the course of their six-month agreements. For those who are regular team members, the time dedicated is only a portion of our many other activities and it varies over the course of the project.

Who are our audiences?

This is also a good question, as our audiences are many. We seek to support the learning needs of program leaders (grantee or volunteer), movement leaders, grants committee members, our L&E team (for mapping learning strategy), and anyone involved in thinking about, planning, implementing and/or evaluating these, or similar, program efforts.

Goals of these reports:

Develop a clearer understanding of Wikimedia volunteer outreach programs and their impacts.
Identify positive examples of programs to explore more deeply in order to develop shared best practices and program support networks across communities.
Help Wikimedia community leaders explore methods for improving the data collection and reporting of their programs.
Highlight key lessons learned that can be applied to data collection and reporting! We want to support program leaders to make evaluation and learning easy and fun.

JAnstee (WMF) (talk) 06:22, 6 May 2015 (UTC)

Questions from Wiki Loves Monuments Mailing List

The following 10 questions were pulled from a thread on the Wiki Loves Monuments mailing list and are responded to in each of the sections below - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

1. Concern that no organizers have been involved.

All identified organizers have been invited through several mentions in our mailing list announcements, blog outreach (Blog Announcement, Data Collection Announcement, Blog on “Filling in the Gaps”), and social media posts. We heard directly from only 39% of possible leaders of the Wiki Loves Monuments contests this year. Unfortunately, if our light-touch methods do not work, we do not have the resources to track down so many people. - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

Mailing lists/Overview can help find appropriate mailing lists to address or CC. --Nemo 15:16, 4 May 2015 (UTC)

Hi Nemo, just to make sure I'm understanding your comment, you are suggesting that we should look at that link to cc more mailing lists next time? Thanks! --EGalvez (WMF) (talk) 17:12, 4 May 2015 (UTC)

It's concerning that while no doubt significant staff time was spent on analyzing the data on just 39% of the programme inputs, your researchers have ignored the rest as 'we do not have the resources to track down so many people'. Doing that would have been time better spent. Were you aware that there is a comprehensive list of national contests on Commons (here for 2014), all of which link directly to the national planning pages? So, a good way to reach the organizers would have been to post on those pages. I can't see that that was done.

Unless you have been able in some way to adjust for the biases inherent in omitting 61% of inputs, it seems to me that nothing useful can be obtained by trying to extract central population statistics such as medians from the data that you have. There is no reason at all to suppose - especially with such a small data set - that the missing data is statistically similar to that which you collected. I believe that the statistics you have worked out are invalid and are not capable of teaching us anything useful. Sorry to be so blunt. --MichaelMaggs (talk) 17:57, 4 May 2015 (UTC)

Hello, Michael. Thanks for sharing your concerns. Importantly, we are trying to model the use of standardized metrics, and this is a second iteration. While we set out to make a deeper effort in pulling data from grantee reports, we had no specific goal for added outreach to program leaders movement-wide beyond what we had done in the first round of beta reports. We did, however, increase our messaging, post on user talk pages, and send multiple messages to grantees and their shared program leader contacts for programs mined. I am not sure why it wasn’t considered to also post to the specific page you suggest; thank you for that suggestion. In addition to incorporating direct outreach related to programs which were identified, we also extended the data collection window by four months to allow time for other voluntary responses, and posted our discovered list of implementations in a blog update on the data collection. These were our light-touch methods, through which we made initial contact with 70% of the identified leaders.

Also, perhaps important to note more clearly:

In the beta version of these reports we only examined grantee contests and were asked why we did not look at all of them. Since Wiki Loves Monuments is the most widespread photo event, and much of the data are accessible, we worked to include them more broadly this year. This did set us up with many contests without reported budgets; we actually have a reported budget for 43% of the contests.
There was also not a one-to-one relationship between those with a budget reported and those we heard directly from. Further, while we only received a direct report back from 39% of program leaders, we did direct message another 31%, who reported nothing further. The portion of programs with program leaders we were actually unable to contact was 30%. Again, thanks for your suggestion.

Still, budget tracking remains an issue, even for grantees as the way people define and mark budget lines is highly varied as well. Here, where we have dug through the documentation to find examples, we have shared them directly so that people can examine different case points themselves and choose what comparisons make sense for them based on the location and planned size of event.

Lastly, yes, we are the first to note the many missing reports of data and the issues with the data distributions not meeting assumptions of statistical normality and thus providing weak measures of central tendency. Still, we present the middle of the line through the most applicable referent, the median average, emphasize the wide ranges, and suggest use of the interquartile range for understanding what may be a reasonable expectation for observation. There are also some useful observations in the distribution of those data shared: hours input can help people to consider how many volunteers and weeks of investment might be needed, while budgets, if from a comparable context, can offer useful guardrails for budgeting and can be a good example for those investing funds into programs (here they are grantees for the most part). That is where our responsibility to examine the available cost data comes in: while many organizers use donated resources and non-WMF funds to support their contests, much of the invested money is donor funds through WMF grants, and we need to understand the impact of that spend. The data here are to encourage thoughtful consideration and promote discussion of the goals and impact of our programmatic activities, to show what we can see through applying this set of systematic lenses, and to examine how best to move forward at meeting measurement needs for program leader self-evaluation. JAnstee (WMF) (talk) 06:30, 6 May 2015 (UTC)
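As a rough illustration of why a median and interquartile range can be more informative than a mean for skewed data, here is a small Python sketch. The budget figures are hypothetical, invented for this example and not taken from the report, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the method used in the report.

```python
from statistics import mean, median, quantiles

# Hypothetical contest budgets in US dollars (invented, not report data).
# One large contest skews the distribution, as often happens in practice.
budgets = [200, 500, 800, 1000, 1500, 4000, 20000]


def summarize(values):
    """Return the median, mean, and interquartile range of a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return {
        "median": median(values),
        "mean": mean(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


summary = summarize(budgets)
print(summary)
```

In this made-up sample the single large budget pulls the mean up to four times the median, while the quartiles still describe the range in which the middle half of the contests fall.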

2. Concern that the evaluation is not attending to diversity of Wiki Loves Monuments contest purposes.

In the reporting section Limitations we address the diversity of purposes in contest goals in the priority goals section and the number of program leaders direct reporting. We also make this point in our discussion of key findings in the section on delivery against the top three shared priority goals. We specify that the report does not capture the story of one top goal: Increasing awareness of Wikimedia projects:

“The majority of contest participants are newly registered users to Wikimedia. In addition to the reach of the event itself, nearly 90 percent of program leaders reported that they had developed blogs and other informative online documentation of their events. Promotional reach and potential learning about Wikimedia projects is not captured by the data captured in this report, however, for 2013, a community-led survey was collected from contest participants, which included some items about how participants learned about the contest, the overall results indicate that, most often, participants learned of the events through banner posting (60%) while other routes were reported in much lower proportions.” (Where we have linked to the international survey for 2013 that contains some survey data which partly speaks to this goal.)

Here we have presented, as a gap in metrics reported, what you are concerned we are unaware of.

Further, we continued to include all outcome measures that we are able to get at rather easily using a wiki tool, as identified in our mapping work of metrics for programs in year one. We expanded those metrics slightly to capture the new global metrics for grantee reporting that were initiated for grantees beginning September 2014. We have also incorporated what we know about the spread of contest-specific goals information that has been shared with us through direct response to our voluntary reporting requests.

Importantly, we are explicitly measuring beyond global metrics, based on the vast spread of goals shared to our team. We do this across 10 programs to map the data through another iteration for deeper exploration: we want to know what the data help us to see and what stories and data need to be better reported out by program leaders for us to understand your program impact.

We have not cast judgment but have presented these data to the community for input, consideration, and discussion of next steps. (Please note the “Draft” template remains while we work out potential kinks and also that our messaging is consistently that we want to hear from program leaders like you.)

So thanks for your attention and interest once again. We plan to discuss our known next steps for diving more deeply into some marked programs, but that is of course part of the agenda for the various virtual and in-person meet-ups we are planning over the coming months. - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

3. Concern that the goal of having pictures of monuments uploaded and goal of raising awareness and getting people past the initial threshold are different.

That is right, there are many goals stated as priority for photo events, as illustrated in the goals table in the report section on Limitations. We pulled together metrics as identified in the beta reports and worked to refine methods for obtaining them. We have not said one goal is more important than another, and we have not chosen to exclude data about other outcomes; they are just not being reported. (Note: We left an essay field open in all data requests to allow for linking us to blogs or data, or for making statements about other outcomes measured; still, these data are not being shared in reporting.) - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

4. Concern that simple summary statements that “The average Wiki Loves Monuments contest …” “...hurt ... to see”

The inclusion of these summaries was appreciated as a solution then. Truthfully, these can be really painful statements to have to write as they are, by definition, over-simplifications. However, we compromise in order to make the information accessible to many different audiences of readers.

Importantly, rolling up metrics across several different points of program implementation is a difficult task. By definition it sacrifices complexity, as does developing easy-to-digest snippets of information that are requested by so many who are inundated by information in their inboxes. So, yes, if you want the details, please skip them and read the more detailed narrative, or use them to help guide your interest to where you wish to read more deeply. There is a lot of data to wade through, and we have worked to make it as accessible as possible. We have tried to format in a linguistically and visually consistent fashion to make these different reading routes available, but differentiated, for different reader preferences. Please give feedback on how this is working and continue to share potential solutions, as we are always open to improvements.

Lastly, these reports are our team’s effort at illustrating different metrics in use at the highest level of reporting to mirror back the story emerging from data that include several proxy measures of input, output, and outcome goals shared to our team about a set of Wikimedia programs. - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

5. Concern of too much emphasis on money

Yes, we have examined the money input; we have also looked to the hours spent organizing, numbers participating, and, to a lesser extent, donated resources, as different inputs to these events. We do this because it was one of the originating needs for the development of the evaluation initiative: to understand where we as a movement are investing our resources, monetary and non-monetary, to effect change to Wikimedia projects. This work, for those who may not have made the connection, falls on our team as part of the Foundation's principle of stewardship. Many program leaders also wish to understand the kinds of investments programs take in order to best plan and decide what programmatic activities to engage in as they work to achieve various goals shared across the movement.

Also, grantee reports were our main source of data this year, so it is especially relevant to help grant proposers and grants committee members better judge different program models and what might be expected across different contexts. This is why we have made efforts to keep the data open for everyone to use this year. - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

i agree with both. the level of effort to get the photo is not a cost issue only. hard to compare value to cost. but, there are some funders, audiences that want monetary figures, even when they are of little value. and the amateurs need to understand that there are embedded costs in everything they do. contest leadership is more of an overhead, that will change based on the number of items, not a unit cost. quality assessment likewise. Slowking4 (talk) 16:53, 6 May 2015 (UTC)

6. Concern of considering uploads valuable only if they are used in Wikipedia articles AND 7. focusing only on part of the goal of collection, development, and promotion of open knowledge and its distribution.

There is a metric of total media uploaded, a metric of % images used in articles, a metric of number of uses, a metric of photo ratings. Some of these are really great proxies, others are less fulfilling. If we don’t have the outcome data that seems most important, how can we help you to measure and report it? - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

i kind of like the reuse metric because it goes to filling a need. but the point is well taken that wikimedia is becoming more inward focused- a walled garden, not focused on Open Knowledge movement. hard to measure that as well. Slowking4 (talk) 17:15, 6 May 2015 (UTC)

8. Where is the list of countries?

See the full list of countries, along with their data, in the Appendix: either the table for “Bubble Chart Data” (very basic data summary) or the “Inputs” table in the “More Data” section of the Appendix for complete data by contest event. JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

9. Concern we should have used data from grant reports. Maybe some are not included due to timelines for reporting

We completely mined grants reporting for programs ending September 2013 through September 2014 - this was another request we aimed to satisfy from the beta reports review and recommendations from community last year. - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

10. Concern that how to use it is unclear

The new section within the discussion of key findings entitled “How this information can apply to program planning” is to help with this, as are the virtual and in-person meet-ups on the slate for the next few months, plans for learning pattern/toolkit development, and upcoming outreach to a set of program leaders to help. These actions are all part of what we will present at our initial virtual meet-up this Wednesday. If you are not able to join live, please review the recording afterward and chime in to the discussion on the talk page. - JAnstee (WMF) (talk) 02:43, 4 May 2015 (UTC)

11. Question about where activity has been measured to assess retention[edit]

While last year the team was not able to assess editing activity globally (i.e., across multiple projects), added features to Wikimetrics and the new Quarry tool have allowed us to examine activity both specific to Commons as well as across all projects. The data summarized in the retention graphic represents editing on any Wikimedia project. For details as to what proportion were still editing on Commons you will need to consult the descriptive text in advance of the graphics or the data tables in the appendix. JAnstee (WMF) (talk) 20:46, 4 May 2015 (UTC)

Problems with this evaluation

The general perception I sense on the Wiki Loves Monuments mailing list is that a lot of comments are made about how the evaluation is failing to evaluate Wiki Loves Monuments properly. Seeing the headers on this page, all those comments are missing. I can't understand why only "questions" are taken over from the mailing list and not the negative feedback. As WMF is not active in taking the negative feedback seriously, we should try to collect that feedback on this page. Romaine (talk) 09:04, 6 May 2015 (UTC)

PS: The issues that users on the mailing list have brought up are not so much concerns, but actual problems. Problems that are insufficiently addressed with the questions section above. Romaine (talk) 09:15, 6 May 2015 (UTC)

Wiki Loves Monuments is not a consistent project[edit]

As an e-mail as comment says (and I agree with): the evaluation has a big focus on the assumption that Wiki Loves Monuments is a consistent project, similar in each country in what the contest is organised. Then Wiki Loves Monuments is not understood. In reality it divers very much per country. Wiki Loves Monuments is a diverse collection of projects, each of them tailored to the needs of that specific country, by the local community.

By missing this essential point the focus of this evaluation goes towards number crunching ("the average contest") and falls into generalisation, without understanding the core of Wiki Loves Monuments. Romaine (talk) 09:04, 6 May 2015 (UTC)[reply]

The primary result is lost in this evaluation[edit]

My e-mail says: The focus is too much on money. It gives me a horrible feeling; the community/participants are not a factory in which every employee needs to work a minimum number of hours. The primary aim of Wiki Loves Monuments is to get every monument onto Wikipedia with a good picture, not just the most popular or easy monuments. The first time such a contest is organised, the low-hanging fruit is picked first, but the evaluation forgets to mention that picking the low-hanging fruit is not the core goal of Wiki Loves Monuments. The goal of Wiki Loves Monuments is to get a photo of every monument. The more monuments have a picture, the harder it becomes to get a picture of the remaining ones. The evaluation fails to describe the actual situation and totally misses what Wiki Loves Monuments is about. Romaine (talk) 09:04, 6 May 2015 (UTC)[reply]

Images are not only valuable when used in articles & WLM is not a project of just a few months[edit]

As one e-mail comment says (and I agree): In the evaluation it seems that the files uploaded to Wikimedia Commons are only "valuable" if they are explicitly used to illustrate Wikipedia articles. This is of course an important factor, but many Wikipedia articles link to the Commons category. The images that are not used in articles are therefore not useless but valuable. This value is missing from the evaluation.

Also, Wiki Loves Monuments is a short-term project every year (only uploads made in September can take part in the contest) and results in a large peak of uploaded images in September, but that is only step one. The second step is to process all the images and categorise them further, which is done by volunteers. In the evaluation this time investment would be seen as having zero value.

The third step is that the images are used in articles. When writing articles, editors must have a choice of images to use. There should always be more images available than are actually used in the article. Further, the lists on Wikipedia can only show one image per monument.

And while it is relatively easy to add images to existing articles, adding images to new articles takes considerably more time. This means that there is a peak in uploaded images in September, and in the years (multiple!) afterwards the editors on Wikipedia write articles and start to use the images. (The community or the organising team is not a production unit with a steady output.) The evaluation oversimplifies this multi-year process into just a few months.

Wiki Loves Monuments is a long-term project, and that too is totally missing from the report. Romaine (talk) 09:04, 6 May 2015 (UTC)[reply]

Another e-mail comments on this. There is a huge difference between what WMF considers valuable and what the community considers valuable, and the focus of WMF is different from the focus of the volunteers who organise Wiki Loves Monuments.

Wiki Loves Monuments is in the first place enriching Wikimedia Commons, our file repository, whose aim Wiki Loves Monuments supports. The aim of Wikimedia Commons is to provide a media file repository:

that makes available public domain and freely-licensed educational media content to all, and
that acts as a common repository for the various projects of the Wikimedia Foundation.

The report only addresses the second aim, while the first aim is missing in the evaluation. And that first aim is an important goal of Wiki Loves Monuments. Romaine (talk) 09:04, 6 May 2015 (UTC)[reply]

Editor retention is currently a dead end[edit]

My personal goal in organising Wiki Loves Monuments is to get the cultural heritage monuments onto Wikipedia with a photo, and that is the thing that Wiki Loves Monuments reliably achieves. If other people want to set other goals, I am fine with that.

But I have analysed for myself what a realistic goal is, and I only arrive at having more cultural heritage monuments on Wikipedia with photos. As a side effect it is fine to have more people on board, but not as the main goal. Let me explain why.

For me personally Wikimedia/Wikipedia is a relatively nice environment, as I am good at teaching myself new things and picking up new stuff. For people who have the capability to explore the Wikimedia environment themselves, it works fine. For those people it is easy to continue taking pictures and writing articles after a contest, because they are able to find those other things to do. But I think most people who have that as a strong personal characteristic have already tried this and are either still active or found it not interesting enough. The software and environment are excellent for those people, the early adopters and pioneers. But that pool of people is almost exhausted now, and yet the focus is still on this kind of person.

Over the past 5 years I have helped many people with editing Wikipedia and with uploading pictures to Commons, and I have spoken with a lot of people about why they stopped editing Wikipedia. Almost none of those people have the capacity to explore the environment on their own. Besides the (hostile) mentality and atmosphere on Wikipedia, the environment of Wikipedia/Wikimedia does not fulfil basic needs those people have. It is not a really friendly environment, it is not a productive environment that gives sufficient stimulation, and it is not the really social environment the large majority needs. With the VisualEditor there is an easy way to edit Wikipedia for people who are not comfortable with wikisyntax, but it still lacks a social environment in which those people are comfortable.

As a figure of speech: the early adopters like the assignment of building a house in the middle of the desert; the large majority wants the house to be built already and only wants to do the decoration, or wants a much more comfortable way of living.

After a contest like Wiki Loves Monuments there is no environment at all that is fit to handle all the participants and give them guidance so that they can continue to be active. The users who participated in a contest fall into a black hole afterwards. This is the case with all the contests Wikimedia volunteers have organised. And not just with contests: this is also the core reason why edit-a-thons and workshops on editing Wikipedia give so little result in editor retention. It goes fine during the edit-a-thon or workshop, when participants have a personal coach next to them to ask questions. But afterwards those users are on their own. Wikipedia is aimed at creating content, with all kinds of pages that try to support that. What Wikipedia misses is a social environment that fits these people and gives them stimulation.

With past edit-a-thons I also noticed another strong need among the participants. After the editing they are all interested in continuing to do this together as a group. You could say that they can use a project page on Wikipedia, but that does not work for them at all. It appears way too primitive to them. (What works for one group does not work for the other group & vice versa.)

WMF is trying to push and pull on editor retention, but that is pulling a dead horse as long as the environment is not adapted to those people.

And please understand me well: I consider getting more people on board the most important thing, but with the current environment it seems unrealistic to me for volunteers to have a big influence on this. Romaine (talk) 09:04, 6 May 2015 (UTC)[reply]

Editor retention is a too simplified idea[edit]

As organiser of Wiki Loves Monuments in Belgium and Luxembourg, one of our main goals was to give the people in our region the possibility to participate in another way than only writing articles. People are generalised far too much when it comes to editor retention, as if people who take pictures would just as easily write articles. We consider that nonsense. We organised Wiki Loves Monuments (also) to give people who are not good at writing but can take pictures the opportunity to help Wikipedia in the way they can, with photos. Romaine (talk) 09:04, 6 May 2015 (UTC)[reply]

editor retention at commons would be uploading to the commons, during the other 11 months. however, retention is a cultural problem that will harm all efforts until there is a systemic change. dinging every effort because they do not solve this problem is an evasion. Slowking4 (talk) 16:21, 6 May 2015 (UTC)[reply]

I think it is worth comparing WLM with other photography contests, in particular the Geograph, a project that is mostly in the UK and Ireland and which has now collected over 4 million images, nearly half of which have been copied to Commons. Their inclusivity is far more extreme even than Commons', and I doubt there is much desire there to import everything from that contest. However, there are two key lessons I would take from comparing us with the Geograph. First, if you operate all the time it is easier to build a community than if you only exist for one month of each year. Second, there is much less organisational overhead if you operate like the Geograph, or indeed Wikipedia, and are open to photographers being opportunistic: you may visit a village aiming to photograph the medieval church, but that doesn't stop Geographers taking photographs of a thatched cottage, vintage car and unusual farm animals on the way. If we want to recruit new people and get them to stay, then I think there is an interesting opportunity for Wikimedians in countries with similar Freedom of Panorama to the UK to create Geograph style projects on Commons, though maybe not giving "points" to pictures of muddy fields. WereSpielChequers (talk) 12:10, 14 May 2015 (UTC)[reply]

Well, did someone calculate "editor retention" at Geograph, or usage of their photos copied to Commons, so that they can be compared to WLM? Hard to tell whether there are lessons to learn. --Nemo 15:05, 14 May 2015 (UTC)[reply]

I'd be particularly interested in usage. I tried to calculate it using Glamorous a few times, but I think the number of files was simply too large for the tool. Richard Nevell (WMUK) (talk) 15:09, 15 May 2015 (UTC)[reply]

Yes, unfortunately, GLAMorous has some limitations when used on such a large category. A further challenge is that it is not otherwise easy to get a measure of unique image use, as is the case with Commons overall. It seems category size is also a limitation on using the Quarry tool to easily access category contributor usernames for running through Wikimetrics. As for comparing to other photo events more generally, while you have likely been there already, I just want to be sure you know that other photo events captured in the reporting may be observed in the Other Photo Events report. JAnstee (WMF) (talk) 16:02, 15 May 2015 (UTC)[reply]

@JAnstee (WMF): Quarry does work; you need to use underscores in category names, not spaces. I forked your query here, and also ran my usual “File usage” query. Jean-Fred (talk) 18:46, 15 May 2015 (UTC)[reply]
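For anyone else hitting the same zero-result problem, here is a minimal sketch of the underscore point above. It only shows how a category name typed with spaces could be converted to the underscore form before being pasted into a query; the function name and the example category name are hypothetical and are not taken from the forked query.

```python
def to_db_title(category_name: str) -> str:
    """Return a category name in underscore form (hypothetical helper).

    Assumption: the query expects the bare category title, without the
    "Category:" prefix and with underscores in place of spaces.
    """
    name = category_name.strip()
    prefix = "Category:"
    # Drop the namespace prefix if the name was copied from a page title.
    if name.startswith(prefix):
        name = name[len(prefix):]
    # Collapse any run of whitespace into a single underscore.
    return "_".join(name.split())


if __name__ == "__main__":
    # Illustrative category name only.
    print(to_db_title("Category:Images from Wiki Loves Monuments 2014"))
```

Running the helper on a name copied from a category page gives a string that can be dropped into the category filter of a query as-is.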

Awesome Jean-Fred - thanks for that - I shouldn't have done that so quickly and the failed zero result was disappointing - this is much better news! JAnstee (WMF) (talk) 19:15, 15 May 2015 (UTC)[reply]

@Nemo, I haven't calculated editor retention at the Geograph, I merely observe that they have so far collected about 4 million images in the United Kingdom, and that in working with Geograph images I'm conscious of some contributors who have been active for a long period documenting their area/interest. Usage is a different matter, their early images were subject to something like a 100kb cap per image, and while that has long been lifted it does make the older images they collected less likely to be used, and of course they have a much lower standard of usefulness than we do. But the main differences that I noticed were that they don't restrict themselves to one month of the year or just to a predefined target list, and of course they mainly operate in a country with a Freedom of Panorama law that is compatible with Commons. I think that those were the main reasons why they have collected such a huge number of images, and while we can't replicate the third of those in every country, I think we would benefit from applying the first two to any future photography contests. Of course others might look at the Geograph and spot differences that I haven't, but I hope you'll all agree that such a huge project is likely to have lessons for us. WereSpielChequers (talk) 09:57, 21 May 2015 (UTC)[reply]

WLM aims not only to collect images from the most popular monuments but from all monuments[edit]

Wiki Loves Monuments is organised to achieve full coverage of the world's cultural heritage, not just of the most popular or easy monuments. It is a nice benefit that we broke the record for the world's largest photography competition, but that is not the core goal. The goal is to have the world's heritage covered in pictures on Wikipedia, which means cumulative growth in the number of photos with each contest. The comparison of the budget with the number of uploads is also strange, as if there were a strong relationship between them. The number of uploads depends on so many parameters that are outside the control of an organising team, and so much depends on just having luck. Yes, luck: that is an underestimated parameter in any contest and in many other things on Wikipedia as well. The only thing an organising team can do is the best they can.

Focussing on how much a photo costs suggests there is a direct relationship between the money spent and the number of uploads, while none of the uploads is paid for. The donors' funds are being used to organise a large photography contest in which thousands of volunteers participate by taking and uploading photos to help Wikipedia improve. An argument heard elsewhere was that we are spending the donors' money. But then ask this question: do the donors want only the most popular monuments to be covered on Wikipedia, or all cultural heritage monuments? I speak with many, many people, including donors, and none of them expects us to cover only the most popular monuments. Everyone expects Wikipedia to have them all.

It is fine to evaluate a contest by comparison, but that should be done in comparison with other ways of getting the exact same results. The current set-up is focussed too much on the wrong goal. The goal is not spending less money. The goal is getting the best possible results for the money spent. Romaine (talk) 09:04, 6 May 2015 (UTC)[reply]

Other results left out of the evaluation[edit]

The evaluation does describe the results that appear to have been achieved at first glance, but misses the results beyond those. Examples of missing results are:

In many countries Wiki Loves Monuments is a way for Wikipedia volunteers to group together and work together to get better results as result of this collaboration.
It also gives users from Wikipedia the possibility to learn and professionalize themselves.
We did get an overview of the cultural heritage of many countries on Wikipedia.
In many countries there are collaborations with cultural institutions.
There were many sponsors who donated money, time, effort to us as they consider Wiki Loves Monuments an important and valuable project.
Because of Wiki Loves Monuments, cultural institutions donated thousands of images, like 480,000+ images from the Rijksdienst voor het Cultureel Erfgoed.

Generally speaking, WMF has set some objectives for itself and now evaluates against those objectives/goals, even though Wiki Loves Monuments has a different focus and aim. Romaine (talk) 09:04, 6 May 2015 (UTC)[reply]

Momentum[edit]

wow, 10% reuse of unique images! this is a big outcome of image drives, that may well be lost in the other discussion. how do we replicate that.

i am concerned that there is a loss of momentum, with 3 key individuals. how do we train interested people in how to run image drives? (stealth goal) i am concerned about the one-off project mindset, that makes replicability harder. (subtext dropout of WLM-USA) Slowking4 (talk) 16:17, 6 May 2015 (UTC)[reply]

Hi Slowking4, Thanks for your comments. I am not too familiar with the history of Wiki Loves Monuments. Can you say more about your concern over a one-off project mindset and loss of momentum? Thanks! --EGalvez (WMF) (talk) 05:54, 14 May 2015 (UTC)[reply]

this is my summary of reading over old talk pages: WLM started in Europe to increase photos of public space, and have fun with a contest. each country had a contest organized by volunteers. in the USA, a small group organized for a while. but have not participated the last 2 years. my impression was that there were more organizers in prior years. it's a high work load to set up the image target lists and judging. to some extent people have moved to different targets such as wiki loves earth, but we could use annual updates to document over time. it's ad hoc, and there is not much planning / funding to sustain the effort over time. we need more and better images; this is a critical quality failing, that WMF / community should support more.

Hi Slowking4, thanks for clarifying. In short, this is what this report aims to do. It's a first step toward looking at the different implementations worldwide. But we know that while there are many Wiki Loves Monuments implementations, they might all be very different in their process. So what we do is conduct interviews with those who organized not only Wiki Loves Monuments but also other photo events, to get a sense of the process that makes these events successful. What steps should a new organizer take? What works very well, what doesn't seem to work well? The interviews help to document the diversity of processes so organizers can learn best practices and can be better prepared if/when they request funding. Does this type of work fit into the support you were referring to? Thanks! --EGalvez (WMF) (talk) 20:52, 20 May 2015 (UTC)[reply]

yes thank you. if you have interviews, could you report some of the responses by theme, or desired outcome, or problems solved. it's good to highlight metrics, but the hard to measure may be getting lost. Slowking4 (talk) 18:32, 21 May 2015 (UTC)[reply]

Baselines[edit]

A comment on the Signpost article said "only 13% of images are used in any Wiki project and that only 0.03% became featured". But what are the baselines for image usage, Featured, Quality, and Valued images? How does Wiki Loves Monuments measure up against the rest of Commons? Can those baselines help calibrate expectations? Richard Nevell (WMUK) (talk) 11:13, 7 May 2015 (UTC)[reply]

Hello Richard, unfortunately we were not easily able to determine unique image use for the Commons baseline for this time period. However, as an alternative proxy we looked at the ratio of image uses to uploads. For all of Commons over all time, there is about 1 use in namespace 0 to every 11 uploads (1:11); for Commons files uploaded during the reporting time period, about 1 use to every 25 uploads (1:25). This compares to nearly 1 use to every 5 uploads for Wiki Loves Monuments (1:5) and more than 1 use to every 4 uploads for other photo events (1:4) in the same time period.

The comparison point for the share of uploads from the time period that were Featured on Commons was 0.04% for Commons overall, 0.03% for the captured Wiki Loves Monuments contests, and 0.05% for other photo events captured. JAnstee (WMF) (talk) 16:02, 15 May 2015 (UTC)[reply]
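To make the use-to-upload ratios in this reply easier to compare side by side, here is a small sketch that converts them into uses per 100 uploads. The figures are the approximate ratios quoted above ("about", "nearly", "more than"), so the output is rough. Note that this is a count of uses per upload, not the share of unique images used, since a single image can be used more than once. The function and variable names are illustrative only.

```python
# Approximate use-to-upload ratios quoted in the reply (uses, uploads).
RATIOS = {
    "Commons, all time": (1, 11),
    "Commons, reporting period": (1, 25),
    "Wiki Loves Monuments": (1, 5),
    "Other photo events": (1, 4),
}


def uses_per_100_uploads(uses: int, uploads: int) -> float:
    """Convert a uses:uploads ratio into uses per 100 uploads, to one decimal."""
    return round(100 * uses / uploads, 1)


def summarize(ratios: dict) -> dict:
    """Apply the conversion to every labelled ratio."""
    return {label: uses_per_100_uploads(u, n) for label, (u, n) in ratios.items()}


if __name__ == "__main__":
    for label, rate in summarize(RATIOS).items():
        print(f"{label}: about {rate} uses per 100 uploads")
```

On these rounded ratios, Wiki Loves Monuments comes out at roughly five times the use rate of Commons files uploaded in the same period (about 20 versus about 4 uses per 100 uploads).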