Research:Ethical and human-centered AI/Process proposals

Overview and background

This page contains a list of methods that can be included in the design process for ML products (including recommender systems) in order to prevent, identify and address issues of harmful bias and other unintended consequences of these systems. The proposals included here have been gleaned from a review of a set of process frameworks developed by various NGOs and industry-academic partnership organizations, and supplemented with guidance and insights from the scientific literature.

The proposals described here were selected based on the following criteria:

Feasibility: the proposal could be implemented (or at least piloted) within the context of a Wikimedia Foundation product development process with current resources and institutional knowledge
Impact: the proposal has the potential to effectively address real or anticipated issues that may arise in WMF ML-product development
Precedent: the proposal aligns with, extends, or formalizes current practices that have already been tried and shown promise in WMF or Wikimedia Movement product development
Fit-to-strategy: the proposal is clearly aligned with values, methods, or strategic priorities of WMF and the Wikimedia Movement

Guiding questions

What does a Minimum Viable Process for Ethical AI at Wikimedia look like?
How can Wikimedia work towards Ethical AI design goals across team/department/organization boundaries?

Guiding concepts

Wikimedia should follow ethical and human-centered design principles when developing AI products.

An ethical AI product, and the process for developing that product, should adhere to principles of fairness, transparency, and accountability.

Fair: the system doesn’t cause harm through active or passive discrimination.
Transparent: intended audiences can meaningfully understand what the system is for, how it works in general, and how specific decisions were made.
Accountable: the rights and responsibilities of all stakeholders are clearly defined and everyone involved has the information and tools necessary to exercise their rights and responsibilities.

A human-centered AI product should adhere to human-centered design principles.^[1] A human-centered product is...

Based on an analysis of human tasks
Designed to address human needs
Built to account for human skills
Evaluated in terms of human benefit

AI products at Wikimedia

Wikimedia develops a variety of products that are relevant to machine learning, including machine learning algorithms themselves, as well as products used to develop and deploy those algorithms.

Machine learning models: algorithms that use patterns in training data to make predictions about the characteristics of new data. Example: Edit quality
Curated datasets: data collected or labeled to train machine learning models. Example: Detox
ML platforms: cloud computing services that host and provide programmatic access to machine learning data and models. Example: ORES
ML-driven applications: end-user facing apps, gadgets, and features powered by machine learning models. Examples: GapFinder, Edit review filters
Data labeling applications: interfaces for humans to annotate input and output data from machine learning models and ML-driven applications. Examples: Wiki labels, JADE

Discussion

audience, purpose, context
subject-matter experts
prior knowledge (background research, competitor analysis, post-mortems of previous projects)
external stakeholders
alignment with org and movement values, strategic goals, ethical and human-centered AI principles
roles & responsibilities

Documentation

Impact statements

"In order to ensure their adherence to [principles of accountable algorithms] and to publicly commit to associated best practices, we propose that algorithm creators develop a Social Impact Statement using the above principles as a guiding structure. This statement should be revisited and reassessed (at least) three times during the design and development process: design stage, pre-launch, and post-launch. When the system is launched, the statement should be made public as a form of transparency so that the public has expectations for social impact of the system."^[2]

Impact statements articulate design goals, outline risks, and define roles, responsibilities, and success criteria.

They contain many of the same elements you would expect in a normal project plan, but with an increased emphasis on accountability.

They are intended to be public documents written for audiences beyond the product team (inside and outside the organization). As with checklists, however, the process of developing the impact statement also shapes the product team's thinking and orients its actions toward ethical and human-centered design principles.
They are collaboratively created and include input from all team members. They may also include input from important stakeholders outside the team, such as subject-matter experts and potential end-users of the product.
They follow a standard format to ensure consistency and coverage across a range of ethical and human-centered principles and practices.
They are living documents that are revised throughout the design process.
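The standard format and staged review described above can be sketched as a small Python data structure. This is a minimal, hypothetical illustration, not a template used by Wikimedia or FAT/ML: the section names in `REQUIRED_SECTIONS` are assumptions loosely based on the elements listed above (design goals, risks, roles and responsibilities, success criteria), and the stage names follow the three review points in the quoted proposal.

```python
from dataclasses import dataclass, field

# Hypothetical section names, loosely based on the elements listed above.
REQUIRED_SECTIONS = (
    "design_goals",
    "risks",
    "roles_and_responsibilities",
    "success_criteria",
)

# The three review points named in the quoted FAT/ML proposal, in order.
REVIEW_STAGES = ("design", "pre-launch", "post-launch")


@dataclass
class ImpactStatement:
    product: str
    sections: dict = field(default_factory=dict)  # section name -> text
    reviews: list = field(default_factory=list)   # (stage, contributors), in order

    def missing_sections(self):
        """Required sections that are absent or empty."""
        return [s for s in REQUIRED_SECTIONS if not self.sections.get(s, "").strip()]

    def record_review(self, stage, contributors):
        """Log a review; stages may repeat but never move backwards."""
        if stage not in REVIEW_STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        if self.reviews:
            last_stage = self.reviews[-1][0]
            if REVIEW_STAGES.index(stage) < REVIEW_STAGES.index(last_stage):
                raise ValueError(f"{stage!r} comes before the last review ({last_stage!r})")
        missing = self.missing_sections()
        if missing:
            raise ValueError(f"cannot review; missing sections: {missing}")
        self.reviews.append((stage, tuple(contributors)))

    def all_stages_reviewed(self):
        """True once each of the three stages has at least one review."""
        return {stage for stage, _ in self.reviews} == set(REVIEW_STAGES)


# Usage with placeholder text for a hypothetical product.
statement = ImpactStatement("example-classifier")
statement.sections = {name: "draft text" for name in REQUIRED_SECTIONS}
statement.record_review("design", ["researcher", "designer", "engineer"])
print(statement.all_stages_reviewed())  # False: pre-launch and post-launch remain
```

Allowing repeated reviews at the same stage, while rejecting a move back to an earlier one, is one reading of the "(at least) three times" wording: the statement can be revisited as often as needed, but all three named stages must be covered.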

Examples

Datasets used to train machine learning models can contain inherent biases and limitations that have an impact on the fairness and usefulness of algorithmic decisions. Researchers have advocated for better documentation standards for datasets that describe the motivation, provenance, characteristics, and limitations of training datasets in order to avoid unintended consequences.^[3]^[4]^[5]^[6]
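As a rough illustration of such documentation, the four topics named above (motivation, provenance, characteristics, limitations) can be captured in a simple record that makes undocumented fields obvious before a model is trained. The class and field names are hypothetical and much coarser than the question sets in the cited proposals.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetDocumentation:
    """Hypothetical minimal record covering the four topics named above."""
    name: str
    motivation: str = ""       # why the dataset was created, and for what task
    provenance: str = ""       # where the data came from and how it was labeled
    characteristics: str = ""  # size, composition, label distribution
    limitations: str = ""      # known gaps, biases, and uses to avoid

    def undocumented(self):
        """Names of fields left blank, so gaps are visible before training."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Plain-text summary with explicit placeholders for blank fields."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name).strip() or "(not documented)"
            lines.append(f"{f.name.capitalize()}: {value}")
        return "\n".join(lines)


doc = DatasetDocumentation(
    name="example-labels",
    motivation="Train a hypothetical edit-quality classifier.",
)
print(doc.undocumented())  # ['provenance', 'characteristics', 'limitations']
```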

Checklists

"Checklists connect principle to practice. Everyone knows to scrub down before the operation. That's the principle. But if you have to check a box on a form after you've done it, you're not likely to forget. That's the practice. And checklists aren't one-shots. A checklist isn't something you read once at some initiation ceremony; a checklist is something you work through with every procedure."^[7]

Checklists are lists of necessary steps in the design process that must be completed, usually in sequence.

They help teams align ethical and human-centered design goals and principles to practice.
They ensure that organizationally sanctioned (or legally mandated) processes are followed consistently across teams and over time.
They provide a way for less powerful stakeholders to flag issues without fear of reprisal.
They need not consist only of concrete tasks and benchmarks to ensure compliance. They can also include "probe questions" that prompt the product team to consider important issues (like fairness and unintended consequences) at particular points in the design process.
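A minimal sketch of a checklist with the properties described above: items must be completed in sequence, probe questions require a written answer rather than a tick, and concerns can be flagged without recording who raised them. All class names, methods, and items here are hypothetical; existing checklist tools define their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class ChecklistItem:
    text: str
    probe: bool = False  # True for an open-ended "probe question"
    done: bool = False
    answer: str = ""     # written response, required for probe questions
    flags: list = field(default_factory=list)  # concerns, stored without names


class Checklist:
    def __init__(self, items):
        self.items = list(items)

    def complete(self, index, answer=""):
        """Mark an item done; every earlier item must already be done."""
        if not all(item.done for item in self.items[:index]):
            raise ValueError("complete earlier items first")
        item = self.items[index]
        if item.probe and not answer.strip():
            raise ValueError("probe questions need a written answer")
        item.done, item.answer = True, answer

    def flag(self, index, concern):
        """Attach a concern to an item; no author is recorded."""
        self.items[index].flags.append(concern)

    def resolve_flags(self, index):
        """Clear the concerns on an item once the team has addressed them."""
        self.items[index].flags.clear()

    def passed(self):
        """All items done and no unresolved flags."""
        return all(i.done for i in self.items) and not any(i.flags for i in self.items)


# Hypothetical items; a real checklist would come from the team's own process.
checklist = Checklist([
    ChecklistItem("Document the provenance of the training data"),
    ChecklistItem("Who could be harmed if the model is wrong, and how?", probe=True),
])
checklist.complete(0)
checklist.complete(1, answer="New editors whose good-faith edits are misclassified.")
checklist.flag(0, "Labels were collected on only one wiki.")
print(checklist.passed())  # False until the flag is resolved
```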

Examples

The UK Government's Data Ethics Framework
DJ Patil's sample data science checklist questions
DrivenData's deon ethics checklist command-line tool

Iteration

Prototyping and User studies

"Understanding how people actually interact—and want to interact—with machine learning systems is critical to designing systems that people can use effectively. Exploring interaction techniques through user studies can reveal gaps in a designer’s assumptions about their end-users and may suggest new algorithmic solutions. In some of the cases we reviewed, people naturally violated assumptions of the machine learning algorithm or were unwilling to comply with them. Other cases demonstrated that user studies can lead to helpful insights about the types of input and output that interfaces for interactive machine learning should support."^[8]

Prototypes are early-stage or simplified versions of a machine learning model or ML-driven application that are developed specifically for use in user studies.

Using prototypes makes iterative improvement of a design quicker and less costly, and encourages product teams to base design decisions on evidence.
A fully-realized user interface is not always necessary for prototyping a machine learning model. Participants or crowd workers can be asked to evaluate machine learning output for features such as bias, accuracy, or usefulness using simple scoring or survey tools (or even spreadsheets).
A machine learning model is not always necessary for prototyping an ML-driven application. Algorithmic decisions can be simulated with pre-generated response sets or by human confederates using a Wizard of Oz approach.
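The two prototyping shortcuts above can be sketched together: a stand-in "model" that only returns pre-generated responses, and a helper that averages participants' scores per criterion. The revision IDs, labels, and 1 to 5 scale are hypothetical; a real study protocol would define its own criteria, scale, and what a high score means for each.

```python
from statistics import mean

# Hypothetical pre-generated responses standing in for a model during a study.
CANNED_RESPONSES = {
    "rev-1": "likely damaging",
    "rev-2": "likely good faith",
}


def simulated_model(revision_id):
    """Return the prepared response for an input, or None if none was prepared."""
    return CANNED_RESPONSES.get(revision_id)


def summarize_ratings(ratings, low=1, high=5):
    """Average participant scores per criterion (e.g. bias, accuracy, usefulness).

    `ratings` holds one dict per participant response, mapping criterion -> score.
    """
    by_criterion = {}
    for response in ratings:
        for criterion, score in response.items():
            if not low <= score <= high:
                raise ValueError(f"{criterion} score {score} outside {low}-{high}")
            by_criterion.setdefault(criterion, []).append(score)
    return {c: round(mean(scores), 2) for c, scores in by_criterion.items()}


print(simulated_model("rev-1"))  # likely damaging
print(summarize_ratings([
    {"accuracy": 4, "usefulness": 3},
    {"accuracy": 5, "usefulness": 2},
]))  # {'accuracy': 4.5, 'usefulness': 2.5}
```

Because `simulated_model` has the same call shape a real model wrapper might have, the interface under study can later be pointed at a trained model without changing the rest of the prototype.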

Explanation

Model interpretability

UI explanations

Evaluation

Pilot testing

Success metrics

Monitoring

External auditing

Feedback

References

[1] Kling, Rob; Star, Susan Leigh (March 1998). "Human Centered Systems in the Perspective of Organizational and Social Informatics". SIGCAS Comput. Soc. 28 (1): 22–29. ISSN 0095-2737. doi:10.1145/277351.277356.
[2] "Principles for Accountable Algorithms and a Social Impact Statement for Algorithms". fatml.org. Retrieved 2018-12-17.
[3] Hind, Michael; Mehta, Sameep; Mojsilovic, Aleksandra; Nair, Ravi; Ramamurthy, Karthikeyan Natesan; Olteanu, Alexandra; Varshney, Kush R. (2018-08-22). "Increasing Trust in AI Services through Supplier's Declarations of Conformity". arXiv:1808.07261 [cs].
[4] Holland, Sarah; Hosny, Ahmed; Newman, Sarah; Joseph, Joshua; Chmielinski, Kasia (2018-05-09). "The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards". arXiv:1805.03677 [cs].
[5] Gebru, Timnit; Morgenstern, Jamie; Vecchione, Briana; Vaughan, Jennifer Wortman; Wallach, Hanna; Daumé III, Hal; Crawford, Kate (2018-03-23). "Datasheets for Datasets". arXiv:1803.09010 [cs].
[6] Bender, E. M.; Friedman, B. (2018). "Data Statements for NLP: Toward Mitigating System Bias and Enabling Better Science". Transactions of the ACL. Retrieved from https://openreview.net/forum?id=By4oPeX9f
[7] Patil, DJ (2018-07-17). "Of oaths and checklists". O'Reilly Media. Retrieved 2018-12-17.
[8] Kulesza, Todd; Knox, William Bradley; Cakmak, Maya; Amershi, Saleema (2014-12-22). "Power to the People: The Role of Humans in Interactive Machine Learning". AI Magazine 35 (4): 105–120. ISSN 2371-9621. doi:10.1609/aimag.v35i4.2513.