User:LZia (WMF)/Trip reports


I'm going to experiment writing trip reports for the conferences I attend where I have something to share. I'll monitor the pageviews to this page over time as well as the feedback I receive and the discussions around these reports to assess whether doing more of them is justified. :) --LZia (WMF) (talk) 00:51, 20 April 2017 (UTC)

Wiki Indaba 2018

Growing Wikipedia Across Languages - Wiki Indaba 2018 (1).pdf

I was planning to attend Wiki Indaba, March 16-18, 2018. My visa didn't arrive on time and I almost missed my talk on Growing Wikipedia Across Languages via Recommendations. A series of people came together and made it possible for me to give my talk remotely and to listen in on the other two talks in the Research Showcase session. I'm writing a short report based on the discussions in that session, which went on for around 1 hour 40 minutes. [Please note that I may have missed some conversations given that I was not in the room. Others who were there are welcome to correct me if I'm wrong or have missed something that was discussed.]

There were three main themes that were discussed in the session:

The role of the Foundation towards smaller languages. A series of questions came up in the session relating to the role of the Wikimedia Foundation towards smaller languages. As a Foundation, we are asked by the African community attending Wiki Indaba to make our stance towards small languages clear. More specifically, we are asked to specify whether we want to do something specifically for small languages and communities, or whether we want to continue providing platforms and means that every community can use but that are not geared to the needs of smaller languages.

This is no easy question, but as the Wikimedia Foundation, we will have to make an informed choice: Do we want to be in the position of helping people across the world save their local languages (and hence cultures) and, by doing so, move towards more Knowledge Equity? If so, we are talking about more than 2000 actively spoken languages, and we need to rethink how we approach these communities, languages, and their challenges, as well as our federated model, to make our efforts scale way beyond the couple of hundred languages we serve today. It is also possible that our answer is: our read of knowledge equity is different and does not include focusing on saving every language. As always, we have to make trade-offs.

Prioritizing content creation. The closest conversation to the work we do in recommendation systems is this category of discussions that came up in the session. The question is: how should we prioritize missing articles that need to be created across languages, especially in smaller languages where there are very few contributors? A few times I heard mention of ranking by predicted pageviews, which matches the model we developed for this purpose (section 2.2 of the paper). There are two learnings from the conversations on this topic for me:

  • We need to double down on the model we had in the paper and bring it to our recommendation API and GapFinder.
  • In the Research team we have been discussing what a good measure for article importance is for GapFinder for quite some time. The more I discuss the prioritization task with editor community members, the more I am convinced that ranking by predicted pageviews in the destination language is an okay prioritization, for at least certain language communities. When we have so many missing articles across our languages, and when we allow people to search within missing articles to find what they are interested in contributing to, it is fine to rank articles by predicted pageviews and allow people to search in the top, for example, 300K missing articles that have high predicted pageviews. Of course, for languages that are bigger, we can always think about other ways to prioritize missing content. Diego has some nice ideas about that, which I look forward to discussing with him further.
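A minimal sketch of the prioritization idea described in the bullets above, not the model from the paper or the GapFinder implementation. It assumes each missing article already comes with a predicted pageview score; the `MissingArticle` type, the `top_missing` and `search` helpers, and the sample titles and scores are all hypothetical and only illustrate "rank by predicted pageviews, then let people search within the top K".

```python
import heapq
from typing import Iterable, List, NamedTuple


class MissingArticle(NamedTuple):
    title: str
    predicted_pageviews: float  # hypothetical score from a pageview-prediction model


def top_missing(articles: Iterable[MissingArticle], k: int = 300_000) -> List[MissingArticle]:
    """Return the k missing articles with the highest predicted pageviews, best first."""
    return heapq.nlargest(k, articles, key=lambda a: a.predicted_pageviews)


def search(ranked: List[MissingArticle], query: str) -> List[MissingArticle]:
    """Case-insensitive substring search that keeps the ranking order."""
    q = query.lower()
    return [a for a in ranked if q in a.title.lower()]


# Made-up candidates and scores, for illustration only.
candidates = [
    MissingArticle("Baobab", 1200.0),
    MissingArticle("Lake Malawi", 5400.0),
    MissingArticle("Lake Chad", 3100.0),
    MissingArticle("Kente cloth", 800.0),
]

ranked = top_missing(candidates, k=3)   # keep only the top 3 here instead of 300K
lake_results = search(ranked, "lake")   # an editor searching within the top list

print([a.title for a in ranked])
print([a.title for a in lake_results])
```

The cutoff `k` plays the role of the "top 300K" mentioned above; for bigger languages the `key` function is the place where a different importance measure could be swapped in.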

Research in Africa. The question is: What is our strategy for increasing research activity on Wikimedia projects in Africa? What can we do to inspire, empower, and train the next generation of researchers in Africa?

This is an important question and something I hope that my team, Research, in collaboration with other teams in WMF can start thinking about. This is, of course, no easy challenge. The challenge of not having many researchers in Africa focused on research about Wikimedia projects will need to be understood within the broader context of research and research opportunities in Africa for African researchers. In order to be able to take a meaningful step in this direction we need to:

  • Understand the challenges and needs of African researchers, especially in fields such as computer science, social sciences, economics, etc.
  • Understand how to incentivize researchers to focus on Wikimedia projects: scholarships, access to publicly available data, access to computational resources, defining interesting research questions, etc. can all be meaningful incentives.
  • Raise the profile of research on Wikimedia projects in Africa.

AI for Good Summit 2017

The AI for Good Summit 2017 took place on June 7-9, 2017 in Geneva, Switzerland, at the International Telecommunication Union's campus. The summit was organized at the request of António Guterres, Secretary-General of the United Nations, who is interested in the application of artificial intelligence to addressing the 17 goals listed under the Sustainable Development Goals. From the Wikimedia Foundation, Dario Taraborelli, Katherine Maher, and I attended the summit.

Some things to share (in no specific order):

  • The summit organizers had done a really good job of creating a very diverse pool of attendees. It was truly incredible to see that if one puts in the effort and focuses on increasing diversity, one /can/ make a very noticeable difference. You may be able to spot this in the photos, too. This is the first technical summit I've attended where there was a noticeable representation of women, people of color, and many other minorities. Really well done! :)
  • If you are interested in research questions in the space of social good, I recommend watching all plenary sessions, but especially those on the first day where people from different agencies within the United Nations shared their fears and hopes for the use of AI within their specific field. These presentations are very high level, but they provide a broad overview of the type of challenges these organizations deal with.
  • Before going to the summit, I refreshed my memory of Jon Kleinberg et al.'s piece in Harvard Business Review on the use of Machine Learning to solve social problems. I did this because summits like this, by design, may have the tendency to focus on one tool to solve all problems. :) Peter Norvig reminded us of this issue in Plenary 12 when he talked about AI in comparison to the other tools available to us to do Good, and I found that a welcome reminder. He paraphrased Patrick Winston's way of explaining AI: "... AI is like the raisins in raisin bread. The raisin bread is mostly just bread, that's the core stuff and then the raisins are there to make it more exciting. And I think when we look at problem solving for Good, AI can be a component to that, as can any of the appropriate tools. You just have to look at what tools are appropriate and sometimes that is going to be AI ...".
  • I was on the Social Good Data panel to discuss with colleagues and engage the audience in the breakthrough session to identify 1-2 guidelines in the context of social good data. (Each breakthrough session was instructed to come up with 1-2 guidelines that were reported as part of the last plenary session of every day. To my understanding, the goal of requiring guidelines was to provide some structure to be able to follow up specific conversations after the summit.)
  • I got an invitation to the partnership dinner on June 8, thanks to Katherine, and I think Chris. :) Chris set the tone for the night while we were enjoying Eritrean food made and served, in a very cozy setting, by a family who had migrated from Eritrea to Switzerland some decades ago. He talked with us about Lusie, an African girl who attended UNICEF-built schools in Africa but whom Chris found some years later sieving for gold on the side of the street (and no longer at school). Chris engaged in a conversation with Lusie about what she was doing and what the gold she was looking for looks like. The responses he received indicated to him that the school had failed Lusie: it had taught her many things, but not the things she needed the most given her environment and the limited years she had available to attend school. Chris left us with this question: Suppose you have a window of opportunity to inject one shot into a child's body, and that shot contains the educational content that will help the child prosper. What would the shot contain?
  • What's ahead for me is mostly to explore the space in the coming months to understand how/if we can pick up some of the conversations started (for example, with UNICEF Innovation Ventures) in the summit, pick one or more specific areas, and contribute more directly to some of the challenges identified in the summit.

Perth, Australia, April 2017

The WWW2017 conference took place in Perth, Australia, April 3-7, 2017. I attended the conference and here are some highlights of my stay in Perth that you may be interested in.

April 3: As I was attending the Wikimedia Conference in Berlin until April 2, I missed a good chunk of WWW's first day of workshops. It was great to spend as much time as possible with Wikimedians in Berlin, but I'm sad that I missed TempWeb2017.

April 4: This was the fourth time we co-organized Wiki Workshop with our colleagues at EPFL and Stanford University. There was quite a bit of excitement (and anxiety ;) for us as organizers, since we had built nice momentum around the workshop in the past years but were uncertain how many people would attend in Perth, given the complexity of travelling there. I must say: our crowd did not disappoint. ;) We had 50-60 people in the room at all points during the day.

  • The audience was a nice mix of new and old people. We had some of our present and past formal collaborators, as well as PhD students who are picking up topics about Wikimedia projects, professors from across the field, and of course, local people (for example, from the health-care sector).
  • In total, we had 11 poster presentations. You can read more about some of the papers behind the posters if you're interested.
  • The speakers' talks were very engaging and spanned topics from entity extraction, to bias, conversation dynamics, and of course, classes and categories in Wikipedia. Feel free to check out the abstracts of their talks.
  • And last but not least: Katherine made an intro video for the workshop that was great for getting started. (We usually try to have someone from Wikimedia at the workshop, but the conference being right after the Wikimedia Conference, and so far away, made it quite hard to make physical presence happen.)

April 5: First off, it was a bit hard to figure out which paper was being presented exactly when/where. At some point, thanks to program chairs, I got access to this place where you can see all the information you would need to orient yourself. Thanks, Evgeniy and Eugene! :)

  • I attended Population-Scale Study of Sleep and Performance. Tim is a great presenter and this research was hands down fascinating, given the scale of the research and innovation involved to measure cognitive performance at scale. I highly recommend you check out the paper if you are interested in understanding the impact of sleep on cognitive performance.
  • I also attended the PhD Symposium to listen to a presentation by a PhD student and provide feedback on that specific research. If you have never attended a PhD symposium: as a mentor, you're assigned to a student and you agree to attend the student's presentation, read their work ahead of time, and provide feedback on the work as well as ideas on how the project can be expanded, the type of classes the student can take to advance his/her skills, etc. I was assigned to Christoph Hube (L3S Research Center, Germany), who is working on his PhD thesis on the topic of Bias in Wikipedia. I'm looking forward to seeing how this important research shapes up in the coming years. :)
  • As part of the Industry Track, I attended Understanding Online Collection Growth Over Time: Case Study of Pinterest, where Caroline described the research they did on the growth and dynamics of Pinterest boards. This talk was interesting to me as it was closely related to the Gather feature developed by the Reading team. I was curious to see the commonalities between the spaces Wikimedia operates in and Pinterest. Jkatz_(WMF), you may be interested in looking into this research further.
  • And, I had Understanding Short-term Changes in Online Activity Sessions on my list of talks to attend, but I missed it as I was attending the PhD Symposium. I read the paper afterwards and highly recommend it to those interested in learning more about reader behavior (on Wikimedia projects or not). The same is true for Predicting Intent Using Activity Logs, which is still on my to-read list as it's nicely related to the work we did on understanding Wikipedia readers.

April 6:

  • This day started pretty early. On top of all the annual planning work that had to be done throughout the night Australia time ;), I was attending a breakfast panel organized by Business News to provide exposure to some of the large and impactful companies/organizations that had their researchers in Perth for WWW2017. I was invited to this panel along with Evgeniy Gabrilovich (Google), Xin Fu (LinkedIn), and Krassi Hristova (Snap Inc.). We shared insights about the organizations we work in (what does the culture look like?, for example), why we have chosen the specific research areas we focus on (me talking about knowledge gaps, Evgeniy about digital health, etc.), and more.
  • In the afternoon, I was the chair of the Crowdsourcing 2 session. If you are short on time, I recommend at the very least checking Srijan Kumar, et al.'s research on sockpuppet detection: An Army of Me: Sockpuppets in Online Discussion Communities. Srijan has already done quite some work on hoax detection with us as well, and I'm looking forward to working more closely with him and Bob West in these spaces soon. :)

April 7:

  • I attended all talks scheduled in Social 4 session which I thoroughly enjoyed and learned from. If you are interested in conversation dynamics, the issue of confidence and over-confidence in group dynamics, quality of discussions and how they change over time, and competition and selection (in the academic world;), check out these publications: Discussion quality diffuses in the digital public square, When Confidence and Competence Collide, and Competition and Selection Among Conventions.
  • Philipp Singer gave a talk on our research Why We Read Wikipedia. The talk was the last talk of the last session of the conference and I was pleasantly surprised to see a room of 50+ people who stayed around to listen to the talk and engage with us about this research. :)
  • Friday was a shorter day as it was the last day of the conference. However, we made sure the fun continued for the rest of the evening. Bob West, Nithum Thain, Tiziano Piccardi (all our formal collaborators working on an array of projects from building recommendation systems for article expansion to anti-harassment) and I attended an event graciously organized by Wikimedia Australia. :) We were all quite excited to meet each other in person. There, we heard the stories of the Wikimedia Australia editors and staff who were attending the event, and shared some of the research projects we're working on that could be of interest to the attendees. This was a short 3-hour event that left us with only the best of feelings. Thanks to Wikimedia Australia for organizing it! :)

April 8: On Saturday morning, before we started a real weekend ;), James Shanahan, Ricardo Baeza-Yates, and I attended a meeting with the IW3C2 committee to listen to the Perth team's report on the 2017 conference, provide a status update on our progress organizing the Web Conference 2019 in the Bay Area, and check out the bids by Taiwan and Thailand for the Web Conference 2020. This was a neat meeting, but I can tell you that we were all ready to hit the road after one long and positively intense week. ;)

This was my second year attending WWW, and the conference and its community feel more and more like a natural place for the research that I'm involved in. The conference by design is at the interface of many fields in computer science, which gives you exposure to many different topics in one go. The quality of the papers and presentations was very high, and I did enjoy the opportunity to connect with old colleagues and get to know new ones, as well as to exchange ideas about research projects we are all involved in (or wish to get more involved in). I'm looking forward to the now newly branded "Web Conference 2018" in Lyon, France. :)