# Learning patterns/Bytes added to or removed from Wikimedia projects

Problem: The amount of content that a project has contributed to a Wikimedia project can be hard to measure.
Created on: 26 August 2014

Global Metrics: This learning pattern is one of seven Global metrics. To get started with Global metrics, read the learning pattern Calculating global metrics. To learn what's behind Global metrics, read the overview page.

## What problem does this solve?

This learning pattern solves the problem of how to measure the number of bytes added to and deleted from Wikimedia projects.

### Context

Important contributions often involve adding, deleting or re-writing content. This metric attempts to capture both content added and content removed. The primary advantage of using bytes to measure activity (and to some extent impact) is that it is a nearly universal metric. All Wikimedia projects are wikis, most forms of contribution involve editing these wikis, and almost every wiki edit adds or removes some number of bytes of content. Wiki text is stored as UTF-8, so one character of unaccented Latin text (such as most English) is one byte, while characters in most other scripts take two to four bytes each. Images and media files also have a size in bytes, but they are excluded from this metric: file sizes vary widely and are usually far larger than text edits, so including them would swamp the text contributions and make the metric no longer useful.
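To make the bytes-per-character point concrete, here is a minimal Python sketch. It assumes that byte counts are taken over UTF-8 encoded text; the helper name `utf8_size` and the sample characters are illustrative and not part of any Wikimedia tool.

```python
# Illustrative only: how many bytes one character occupies in UTF-8.
# The sample characters are arbitrary picks from a few scripts.

def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

samples = {
    "Latin, unaccented": "a",        # U+0061
    "Latin with diacritic": "\u00e9",  # e with acute accent
    "Cyrillic": "\u0434",            # Cyrillic letter de
    "Devanagari": "\u0915",          # Devanagari letter ka
    "Chinese": "\u4e2d",             # CJK ideograph "middle"
}

for label, char in samples.items():
    print(f"{label}: {utf8_size(char)} byte(s)")
```

The same sentence written in two different scripts can therefore produce quite different byte totals.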

## What is the solution?

### Some definitions

• Bytes added is the quantity of data added to Wikimedia projects.
• Bytes removed is the quantity of data removed from Wikimedia projects.
• Net sum is the number of bytes added minus the number of bytes removed. For example, (10 bytes added) - (10 bytes removed) = 0 net bytes.
• Absolute sum is the number of bytes added plus the number of bytes removed. For example, (10 bytes added) + (10 bytes removed) = 20 absolute bytes. This is the metric we will be using.
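The four definitions above can be sketched in Python as follows. This is a minimal illustration that assumes you already have a list of per-edit size changes in bytes (positive for additions, negative for removals); the names `ByteTotals` and `summarize` are hypothetical and do not come from Wikimetrics.

```python
from typing import Iterable, NamedTuple


class ByteTotals(NamedTuple):
    bytes_added: int     # sum of all positive changes
    bytes_removed: int   # sum of all negative changes, as a positive number
    net_sum: int         # bytes_added - bytes_removed
    absolute_sum: int    # bytes_added + bytes_removed


def summarize(deltas: Iterable[int]) -> ByteTotals:
    """Compute the four totals from per-edit byte changes."""
    changes = list(deltas)  # allow generators to be passed in
    added = sum(d for d in changes if d > 0)
    removed = -sum(d for d in changes if d < 0)
    return ByteTotals(added, removed, added - removed, added + removed)


# The example from the definitions: 10 bytes added and 10 bytes removed.
totals = summarize([10, -10])
print(totals.net_sum, totals.absolute_sum)
```

The example shows why the absolute sum is preferred here: an editor who adds 10 bytes and removes 10 bytes has a net sum of zero even though 20 bytes of content changed.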

### How to measure

#### Wikimetrics

1. If you have not done so already, take the Wikimetrics Learning module.
2. Check the CentralAuth option when uploading your cohort to get data from all Wikimedia projects, especially if you suspect editors worked in other languages.
3. Use Wikimetrics to obtain Positive only sum (bytes added) and Negative only sum (bytes removed). In Wikimetrics, use the Bytes metric with the following settings:
• Start Date is the date and time your event started.
• End Date is the date and time your event ended.
• Time Series by can be any interval you wish. Year would give you the simplest number.
• Namespaces should be: "0,1,2,3,4,5,7,8,9,10,11,12,13,14,15"
Note: namespace 6 is missing intentionally. This is the File namespace; including the number of bytes for images and media would make the number of bytes too large.
• Check both Positive only sum and Negative only sum.
• Under Configure output for Bytes, make sure Aggregate is checked.
• Creating the report produces a single report. The numbers you need are the totals in the rows "Positive only sum" and "Negative only sum".
4. Calculate Absolute Sum, and report all three numbers.
• Enter your numbers into the following equation to calculate the absolute sum, and provide Positive only sum, Negative only sum and Absolute Sum separately in your report.
${\displaystyle |(PositiveOnlySum)|+|(NegativeOnlySum)|=AbsoluteSum}$
Ex: Positive only sum = 10 bytes, Negative only sum = 20 bytes, so 10 + 20 = 30 Absolute Sum
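As a rough sketch of two of the steps above, the Python below builds the namespace list with the File namespace left out and applies the absolute sum equation to the two report totals. The function names are hypothetical, and the code only assumes that you have copied the two totals out of your report by hand; it does not call Wikimetrics.

```python
FILE_NAMESPACE = 6  # excluded so that image and media sizes are not counted


def namespace_filter(max_namespace: int = 15) -> str:
    """Return the comma-separated namespace list 0..max_namespace without File."""
    return ",".join(
        str(ns) for ns in range(max_namespace + 1) if ns != FILE_NAMESPACE
    )


def absolute_sum(positive_only_sum: int, negative_only_sum: int) -> int:
    """|PositiveOnlySum| + |NegativeOnlySum|.

    Taking absolute values means the result is the same whether the
    negative total is written as 20 or as -20.
    """
    return abs(positive_only_sum) + abs(negative_only_sum)


print(namespace_filter())
print(absolute_sum(10, -20))
```

Using the absolute value of each total guards against accidentally subtracting the removed bytes, which would give the net sum instead.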

### General considerations

• Different characters have different sizes in bytes. Additionally, the same content may be expressed in different numbers of characters in different languages. If your contribution is in a language that uses many bytes per character or sentence, your project will show more bytes added than an identical project in another language. Don't stress about this: it's understood and expected. However, it may be useful to note this kind of information when you report on your project.
• Different kinds of contribution activities will result in very different bytes added measurements for the same amount of effort. For example, reverting vandalism often involves removing bytes (reverting edits), not adding them. This is also understood and expected: when it comes to bytes, it is known that bigger is not always better.
• When computing total bytes added by your project, it may make sense to calculate bytes added to content namespaces separately from bytes added to talk namespaces. Separating bytes out by namespace allows you to represent the contribution of your project across different kinds of work. For example, if the aim of your project was to create new English Wikipedia articles on Medicine, you should probably calculate your bytes added to the main namespace separately from bytes added to the Wikipedia: or Wikipedia_talk namespaces, because edits to those namespaces likely reflect a different kind of work (discussion, task management) than main namespace edits.
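The namespace split described in the last point can be sketched as follows. This is a minimal Python illustration that assumes you have per-edit records of (namespace number, byte change) from some export; the record format, the sample values and the function names are hypothetical. It relies on the MediaWiki convention that talk namespaces have odd numbers and their subject namespaces have the preceding even numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def absolute_bytes_by_namespace(
    edits: Iterable[Tuple[int, int]]
) -> Dict[int, int]:
    """Sum |byte change| per namespace from (namespace, byte change) pairs."""
    totals: Dict[int, int] = defaultdict(int)
    for namespace, delta in edits:
        totals[namespace] += abs(delta)
    return dict(totals)


def split_subject_and_talk(totals: Dict[int, int]) -> Tuple[int, int]:
    """Return (subject bytes, talk bytes); odd namespace numbers are talk."""
    subject = sum(v for ns, v in totals.items() if ns % 2 == 0)
    talk = sum(v for ns, v in totals.items() if ns % 2 == 1)
    return subject, talk


# Made-up sample: article edits (0), article talk (1),
# project pages (4) and project talk (5).
sample = [(0, 500), (0, -120), (1, 80), (4, 40), (5, -10)]
per_namespace = absolute_bytes_by_namespace(sample)
print(per_namespace)
print(split_subject_and_talk(per_namespace))
```

Reporting the main namespace total on its own, next to the talk and project totals, keeps article writing from being mixed with discussion and task management.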

### When to use

• When your project involves creating or contributing content to articles or other written information resources on wiki (such as Help pages, Policy pages, Templates, etc.)
• When your project involves making substantial changes to content on wiki pages, for instance re-writing poorly written or incomplete articles.

Note: all projects funded by Wikimedia Foundation grants, regardless of focus, are required to report this metric as part of a global metrics suite, beginning in Fall 2014.