Grants:Project/Maximilianklein/humaniki/Midpoint

This project is funded by a Project Grant

Report accepted

This midpoint report for a Project Grant approved in FY 2019-20 has been reviewed and accepted by the Wikimedia Foundation.

To read the approved grant submission describing the plan for this project, please visit Grants:Project/Maximilianklein/humaniki.
You may still review or add to the discussion about this report on its talk page.
You are welcome to email projectgrants@wikimedia.org at any time if you have questions or concerns about this report.

Welcome to this project's midpoint report! This report shares progress and learning from the grantee's first half of the grant period.

Summary

In a few short sentences or bullet points, give the main highlights of what happened with your project so far.

User Research

Completed generative user research in which we conducted participatory design activities with community members.
Synthesized research findings and conducted team brainstorming sessions to narrow down the set of features for development, given the grant timeline.
Strategized next steps for user testing the alpha and beta releases.

Software development

Defined the data schemas and interface guidelines for an extensible version of the humaniki tool.
Started coding our data-processing layer, the HTTP API, and three interactive visualizations: gender by date of birth, by project language, and by citizenship.

Methods and activities

How have you set up your project, and what work has been completed so far? Describe how you've set up your experiment or pilot, sharing your key focuses so far and including links to any background research or past learning that has guided your decisions. List and describe the activities you've undertaken as part of your project to this point.

Hiring outcomes

The first order of business for humaniki was to hire a team. Following discussions with advisors and the grants team, we hired for two positions: a "Community UX Researcher" to engage the community in the development process, and a "Frontend Engineer" to ensure the web application was snappy and to match humaniki's existing data-engineering talent. We posted the positions on several Wikimedia and UX mailing lists, on the talk pages of some diversity-focused Wikiprojects, and in the #Jobs channels of two social-good technology Slack workspaces. We followed this process:

Screening emailed applications for resumes that met the minimum qualifications.
Conducting 30-minute intro calls with each candidate to identify the best qualified.
Giving the top two candidates for each position a 1-hour mini-challenge to assess their skills.
Picking among the mini-challenge submissions.

The table below shows the results of the hiring process.

Hiring Process Results
Position	Number of applicants	Number Inputs	Number making it to second round	Final hire
Community UX researcher	15	6	2	Sejal Khatri
Frontend Engineer	4	3	2	Eugenia Kim

User Research

Literature review

See Google Doc link (10 pages)

This document covers relevant user groups (Wikiprojects that tackle systemic bias), diversity tools already in use in the community, and a list of previous feedback and suggested features for WHGI and Denelezh. It also includes our UX research approach and branding guidelines for humaniki.

Community Outreach

We used a screener survey to recruit interview participants for participatory design activities. We reached out to the communities through various communication channels, including Wikiproject talk and news pages, direct emails, and social media posts. We got a 54% response rate for interview participation from our survey respondents (33% from men, 90% from women).

HCI Activities

Participatory design - Open card sorting activity

In the interviews with editors, we conducted classic co-design activities such as open card sorting, in which participants suggested new diversity statistics to track and ways to make current statistics more usable.

Brainstorming workshop with the team (with a tweaked version of the NOW WOW HOW Matrix)

After completing the data gathering, we organized an online brainstorming workshop to finalize a set of deliverable features based on the grant timeline. We converged our ideas by using a tweaked version of the Centre for Development of Creative Thinking’s matrix (COCD box), which allowed us to select the final feature list based on each feature's ease of implementation and impact.

Engineering

Planning

We used research to elicit which additional features we wanted to develop, but we knew that our first technical step had to be merging the existing WHGI and Denelezh codebases. To do that, we identified which pieces could remain and which would need to be overhauled, and the engineering team settled on the tech stack we'd use.

The tech stack, from bottom to top, includes:
- Re-using our Wikidata processing at the ingest layer
- Updating our schema at the database layer
- Creating a new web server at the web-backend layer
- Heavily updating the web API layer
- Creating a new interactive JavaScript layer

The biggest parts to get right at the beginning were definitions of the Schema and API.

Schema documentation
- Link to schema work: https://docs.google.com/document/d/1ng0l6uN0-Tb1jcUJ4bnCqKA2A2eYcJvvm8YdW-wBlAI/edit#
- The database schema is a crucial part to get right. It was developed in collaboration between the two data engineers. It defines 15 tables and their relations in order to compute metrics and lists about humans along a number of dimensions. The schema was designed to be flexible enough that we can start tracking new attributes of humans (e.g. religion) without any schema modification. It also makes it easy to track the evolution of metrics.
API documentation
- Link to API work: https://docs.google.com/document/d/1ng0l6uN0-Tb1jcUJ4bnCqKA2A2eYcJvvm8YdW-wBlAI/edit#
- The API contract defines how applications will receive the data they need from the database, most significantly the humaniki interactive web-app we are developing. The API was developed as a collaboration between the data engineer and the frontend engineer. It defines four routes: information about gender gaps, the evolution of those gaps, lists of humans based on criteria, and metadata.
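
As an illustration of how a schema can track new attributes without modification, the following minimal sketch stores each attribute as a row rather than as a column. It is not humaniki's actual 15-table schema: the table and column names (`human`, `human_property`, `property_name`, `value`) and the example rows are hypothetical, and Python's built-in sqlite3 stands in for the project's MySQL and SQLAlchemy stack.

```python
import sqlite3

# Hypothetical, simplified schema: NOT the real humaniki schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE human (
        qid TEXT PRIMARY KEY          -- Wikidata item id
    );
    CREATE TABLE human_property (
        qid           TEXT NOT NULL REFERENCES human(qid),
        property_name TEXT NOT NULL,  -- e.g. 'gender', 'citizenship'
        value         TEXT NOT NULL
    );
""")


def add_human(qid, **properties):
    """Insert one human and one row per attribute."""
    conn.execute("INSERT INTO human (qid) VALUES (?)", (qid,))
    conn.executemany(
        "INSERT INTO human_property (qid, property_name, value) "
        "VALUES (?, ?, ?)",
        [(qid, name, value) for name, value in properties.items()],
    )


def count_by(property_name):
    """Count humans per value of a single attribute."""
    rows = conn.execute(
        "SELECT value, COUNT(DISTINCT qid) FROM human_property "
        "WHERE property_name = ? GROUP BY value",
        (property_name,),
    )
    return dict(rows.fetchall())


# Made-up example rows, for illustration only.
add_human("Q_example_1", gender="female", citizenship="France")
add_human("Q_example_2", gender="male", citizenship="France")
# A new attribute (religion) needs no ALTER TABLE, only new rows.
add_human("Q_example_3", gender="female", religion="Buddhism")

gender_counts = count_by("gender")
religion_counts = count_by("religion")
```

The trade-off of this row-per-attribute design is that queries combining several attributes need one join per attribute, which is one reason a production schema would likely be more elaborate than this sketch.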

Implementation

In the first half of the project we started coding humaniki, implementing the technology stack in the following languages and technologies:

Data ingest
- Technologies: Java/wikidata-toolkit, bash
- Link: https://framagit.org/wikimedia-france/denelezh-import
- Milestones achieved: code reviewed. This is the most mature part of the project and can be almost entirely re-used, so we haven't prioritized its development in the first half.
Data processing
- Technologies: python-sqlalchemy, mysql
- Link: https://github.com/notconfusing/humaniki-schema
- Milestones: tested generation to produce example data. This layer is a rewrite and merge of code from the two previous projects. Its purpose is to take the output of the data ingest layer and calculate the actual biographical metrics.
Backend
- Technologies: python-flask, sqlalchemy, pandas
- Link: https://github.com/notconfusing/humaniki-backend
- Milestones: fully-featured "gap" route, with tests. This layer is new for both projects. On its face it will serve biographical metrics in JSON format to the web-app, but it can also serve other future applications.
Frontend
- Technologies: Javascript/react, Javascript/d3
- Link: https://github.com/theeugeniakim/humaniki
- Milestones: 3 interactive visualizations. This part of the stack makes the data clickable and viewable to the end-user in a visually pleasing way.
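
To make the backend's role concrete, here is a rough sketch of how a "gap"-style route might turn rows of counts into a JSON payload for the web-app. The function name `gap_response`, the input shape, and the response fields are assumptions made for illustration; they are not the humaniki API contract. The sketch uses only the standard library, though in the real stack a Flask view would return the payload.

```python
import json
from collections import defaultdict


def gap_response(records, dimension):
    """Aggregate biography counts by one dimension and gender.

    `records` is an iterable of dicts with the dimension key,
    a 'gender' key, and a 'count' key (hypothetical input shape).
    Returns a JSON string.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for record in records:
        totals[record[dimension]][record["gender"]] += record["count"]
    metrics = [
        {dimension: key, "counts": dict(by_gender)}
        for key, by_gender in sorted(totals.items())
    ]
    return json.dumps({"dimension": dimension, "metrics": metrics})


# Made-up numbers, for illustration only.
sample = [
    {"project": "enwiki", "gender": "female", "count": 2},
    {"project": "enwiki", "gender": "male", "count": 8},
    {"project": "frwiki", "gender": "female", "count": 3},
    {"project": "enwiki", "gender": "female", "count": 1},
]
payload = json.loads(gap_response(sample, "project"))
```

Keeping the aggregation in a plain function like this makes it straightforward to unit test separately from the web framework.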

Midpoint outcomes

What are the results of your project or any experiments you’ve worked on so far? Please discuss anything you have created or changed (organized, built, grown, etc) as a result of your project to date.

User Research

Merger sketches / design
Women Biographies by Country View
- We sketched out preliminary UI designs on Figma for humaniki that were a combination of the previous websites. We also took cues from a backlog of community feedback and latest research to inform our designs.
- For instance, one of the proposed design changes in the data visualizations is to give users the flexibility to select sub-categories of data, e.g. a comparative view of subsets of countries while interacting with the gender by country map. Updates like these will give users more autonomy in understanding trends in data about humans on Wikimedia projects. Another idea is to allow the user to scroll backward through the Wikidata snapshots that are used. These changes will help us meet the needs of users from different Wikiprojects with different goals.
- Further details on design approach and how we plan to color code gender on humaniki: Branding guideline and research approach

UX Generative Research - User Research Report
humaniki Project Process Flow
- We conducted generative research with the goal of identifying community needs in the diversity space and identifying integration opportunities. The report covers the research background, methodology, and findings that will inform humaniki’s future development process. We elicited 10 feature requirements from our research and shortlisted 4 features for the MVP based on the grant timeline.

Engineering

Our first alpha-version of the site is viewable at: https://humaniki-staging.wmflabs.org/ .

The site is under construction and may have major bugs. For reporting purposes, we show some screenshots of an alpha version of the humaniki web-app. The application is successfully getting its data through the tech stack, but the data is not yet systematically updated.

Landing Page Outline

The Landing Page section will introduce the project and fetch a high-level overview of the total breakdown of Wikipedia data and the number of biographies contained in the database.

Language View Tentative Graph

The tentative graph outline for the Language view pulls in all aggregated Wikipedia language data from the Python Flask backend and displays it as a graph and a table. Each Wikipedia language project is represented by a dot on the scatter plot and a row in the table, which displays the total number of biographies for each gender by language project. When you hover over a dot on the scatter plot, a label displays the same information as shown in the table.

Gender by Country View

The Gender by Country view displays a world map that is color-coded by the percentage of women per country: countries with a higher percentage of women are a darker hue of purple. The data is also displayed in table form below the world map, and a user is able to zoom in on different areas of the map. The page also allows you to hover over a country to view the gender breakdown of biographies in that specific country.
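
The per-country figure behind such a map could be computed along the following lines. This is only a sketch: the function name `percent_women`, the input shape, and the choice of denominator (all biographies with a recorded gender in that country) are assumptions, and the numbers are made up.

```python
def percent_women(counts_by_country):
    """Return {country: percentage of biographies about women}.

    Assumes the denominator is every biography with a recorded
    gender in that country; countries with no data are skipped.
    """
    result = {}
    for country, counts in counts_by_country.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no biographies, so no percentage to show
        result[country] = 100.0 * counts.get("female", 0) / total
    return result


# Made-up counts, for illustration only.
sample = {
    "Country A": {"female": 25, "male": 75},
    "Country B": {"female": 30, "male": 70},
    "Country C": {},
}
shares = percent_women(sample)
```

A frontend could then map each percentage onto a color scale so that higher values render as a darker hue.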

Advanced Search View

The advanced search view enables users to use custom data filters for all the data points available simultaneously. Data points include the global gender gap, gender by year of birth, country, occupation, and Wikimedia project.

Finances

Please take some time to update the table in your project finances page. Check that you’ve listed all approved and actual expenditures as instructed. If there are differences between the planned and actual use of funds, please use the column provided there to explain them.

Then, answer the following question here: Have you spent your funds according to plan so far? Please briefly describe any major changes to budget or expenditures that you anticipate for the second half of your project.

We have spent our funds mostly according to plan in the first half. In the second half, however, we would like to shift more money into the UX Researcher and Frontend Engineer roles.

Budget change request

As currently approved, the Frontend Engineer (Eugenia Kim) is scheduled to earn $10,000 and the UX Researcher (Sejal Khatri) $7,000. The requested change is to increase both of these roles to $13,000 each, which represents a $9,000 increase in wages. The $9,000 would be found by reducing the Data Engineering position (Max Klein) from $10,000 to $6,000, saving $4,000, and by repurposing $5,000 of the $5,500 travel budget to online community engagement (blogs, videos, webinars), which the UX Researcher would be responsible for completing.
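
The arithmetic of the request can be checked with a few lines of Python. The dollar figures come from the paragraph above; the variable names are only illustrative.

```python
# Figures are taken from the budget change request (USD).
current = {
    "frontend_engineer": 10_000,
    "ux_researcher": 7_000,
    "data_engineer": 10_000,
}
requested = {
    "frontend_engineer": 13_000,
    "ux_researcher": 13_000,
    "data_engineer": 6_000,
}
travel_repurposed = 5_000  # out of the 5,500 travel budget

# Extra wages needed for the two roles being increased.
wage_increase = sum(
    requested[role] - current[role]
    for role in ("frontend_engineer", "ux_researcher")
)

# Money freed up to cover that increase.
data_engineer_saving = current["data_engineer"] - requested["data_engineer"]
funding_found = data_engineer_saving + travel_repurposed

print(wage_increase, funding_found)
```

Both values come to 9,000, so the reallocation balances without changing the overall grant total.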

Rationale

The project is taking longer to complete than expected, as many software projects do. Since this is a passion project for Max Klein (the grantee and data engineer), he is willing to sacrifice part of his wages to pay for an extra month of the rest of the team's effort, to ensure the project's success. Additionally, the conference travel budget cannot be used due to COVID, but its spirit can continue through online community engagement efforts. The UX Researcher possesses the knowledge and skills to engage the community through writing and virtual communication, and so can still use the travel budget to advertise humaniki to the movement.

Learning

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you are taking enough risks to learn something really interesting! Please use the below sections to describe what is working and what you plan to change for the second half of your project.

What are the challenges

What challenges or obstacles have you encountered? What will you do differently going forward? Please list these as short bullet points.

Some challenging facets from user research:
- Recruiting a diverse group of users with equal representation from non-English-speaking and English-speaking community members. Because of this, our findings cannot be widely generalized to all user groups. We tried to address this by speaking with representatives from non-English groups, who would have a better understanding of the challenges faced by their local communities.
- Organizing focus-group participatory design sessions with two or more community members. It was challenging to group screener survey respondents based on their background and goals. Because of this, we structured our design activities to suit both one-on-one and group sessions.
Process
- We wanted to gather community input before building, and yet we couldn't wait for community input to start software development. The solution to this was to have initial research and design outputs ready for the engineering team to build a base prototype, while full research was being conducted. We could have started this unblocking earlier.
- Max's role was to be both the project manager and the developer. At the beginning, the project management tasks took up most of my (Max's) time and distracted from a fast start on development. It took me some time to find balancing methods, for instance dedicating the mornings to coding and the evenings to meetings.
Software
- The data schema is very hard to get right and takes several iterations. Next time we should build a list of use cases and sample code that could be "mentally tested" by the team.
- Developing in programming languages and technologies in which you aren't an expert was difficult. For instance, our front-end engineer had not worked with D3 before, and we didn't take a moment to look at alternatives.
- Although we had a roadmap laid out for the project at the beginning, we didn't name and create sub-goals until about ⅓ of the way through, with the introduction of our alpha and beta goals. It would be good to introduce 1 or 2 mini-goals per month to keep progress on track.

COVID
- Working completely remotely with a new team structure across different time zones took some getting used to initially. Next time we think it would be good to have a whole-team initial meet-up, using a "bubble" model to limit COVID risk.
- The COVID transition to fully-remote work was new, and we didn't realize the importance of the home office set-up, like external monitors.

What is working well

What have you found works best so far? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.

Community Engagement

We received a 54% participation response rate for our interview research, which showed the community's high interest in engaging with our project.
We got an opportunity to work with the Wikimedia research team, who were also actively working in a similar problem space.
Since some of our team members are long-term Wikipedians, we were able to follow a defined process to involve interested community members in the project research and development. This helped us establish effective communication between different stakeholders.
Having access to diff.wikimedia, Wikimedia’s official news channel, helped us put our project in front of a large audience and gain exposure among community members.

Team Support and Workflow

Hiring with example problems was useful: we got the right people for the job.
Sharing facilitation responsibilities: collaborating with team members and sharing research interview facilitation with the senior researcher.
Weekly Scrum: active discussions about the project workflow, and updating the strategy based on the updated workflow.
Collaborative UX/UI exercises with the team, together with seeing feedback from users, helped nail down a concrete direction, which grounded me when prioritizing different features.

Technical

LP VPS gives us the option to use our old MySQL 8.0 tech stack for free, without worrying about space constraints.
Pair programming has allowed us to learn significantly more about different parts of the development process.

Your learning pattern link goes here

We endorse the following two learning patterns:

Next steps and opportunities

What are the next steps and opportunities you’ll be focusing on for the second half of your project? Please list these as short bullet points.

Our two goals for the second half:
- Alpha Release
  - Get the full software stack running head-to-toe on Wikimedia servers.
- Beta Features Designed, then released (MVP)
  - Designing the elicited features and creating a working prototype: We plan to implement designs for the above identified features and deliver mockups and prototypes that can further be tested with the community. We plan to employ co-designing exercises using the Five Design Sheets method with the engineers, scientists and designers in the team to create effective visualizations to best address the identified use cases.

User testing plan.
- Alpha testing: Test the functioning website with a technical team to get feedback on architecture and technical design.
  - Architecture (schema and API review)
  - All features are working (unit and integration testing)
  - Maintenance discussion
  - Future use case discussions
- Beta testing: Testing the usability of the elicited features with community members using mid-fi prototypes.
- Launch and advertising
  - Create demo videos and publish blog posts.
  - Invite community members to participate in future development efforts.

Grantee reflection

We’d love to hear any thoughts you have on how the experience of being a grantee has been so far. What is one thing that surprised you, or that you particularly enjoyed from the past 3 months?

Being a grantee is extra difficult during COVID. Grant work is already somewhat precarious, and the more precarious backdrop of the world makes it harder.
Also, work that involves lots of community involvement is challenging today because, for whatever reason, scheduling has become more difficult too.
It's fantastic that Wikimedia has decided to support data feminism at this level, and that makes showing up to work each day very motivating.
Chatting with community members is enchanting.
Integrating new technology talent into the Wikimedia movement feels like a win-win.