Global Reach/Brazil Survey Documentation

From Meta, a Wikimedia project coordination wiki

Overview

This documentation supplements our Brazil Phone Survey results. It gives an overview of the context and methodology through which the phone survey data were gathered and organized. For readers who want to investigate further, we also offer recommendations on how to use the raw data for meaningful analysis and exploration.

Brazil phone survey

There are a total of 16 questions in the survey, addressing the following categories:

  • Internet use
  • Mobile phone use (smartphones & basic voice/SMS phones)
  • Awareness and use of Wikipedia
  • General demographics


Phone surveys were conducted in July 2016 by Votomobile. We varied the number of responses collected per region to approximately reflect the population of that region in comparison to the total population of Brazil.

The survey was designed to answer the main questions listed below. Analyzing the full data set allows more in-depth exploration and further insight into these questions:

  • What is the actual number of people who use the internet?
(Real-world behavior makes this difficult to measure from industry reports, since people might have access to the internet through schools, friends, internet cafés, public Wi-Fi, etc.)
  • For internet users: What do people mostly use the internet for?
  • For non-internet users: Why not use the internet?
  • How many people use smartphones?
  • Do people with smartphones access the internet only over Wi-Fi, or only over cellular service?
  • How many people think that they don’t use the internet, but still use Facebook or WhatsApp?
  • How many people have heard of Wikipedia? What do they use it for? How often?
  • If they have heard of Wikipedia, but aren’t using it, why not?

Selection of the 9 regions of Brazil

  • Regions were chosen based on the area codes of Brazil’s mobile phone system.
  • Each region’s response target was determined by its population relative to the population of Brazil.

Table of Regions

Calling Group | States Included | Area Code(s)
A | State of São Paulo | 11-19
B | States of Rio de Janeiro, Espírito Santo | 21, 22, 24, 27, 28
C | State of Minas Gerais | 31-38
D | States of Paraná, Santa Catarina | 41-47, 49
E | State of Rio Grande do Sul | 51, 53-55
F | Central-West Region and states of Tocantins, Acre and Rondônia | 61-69
G | States of Bahia and Sergipe | 71, 73-75, 79
H | Northeast Region | 81-89
I | North Region and the state of Maranhão | 91-99

Where to get the data

Figure: Flow diagram of survey questions
  • The full data set can be found at:
Dan Foy (2016). Brazil phone survey 2016. figshare. doi:10.6084/m9.figshare.5404834
This is the canonical version, which contains a CSV with every answer from each of the 5343 responses.
  • The full text of the questions can be found here.

Using the data effectively for analysis

Looking at Brazil as a whole

For an overview of the Brazilian population, turn on the “Brazil Representation Subset” filter to obtain a subset of 2500 responses in which each region’s contribution is proportional to its share of the population.

Looking at regional subsets

Studying the data set at a regional level should provide additional insights. Make sure that the “Brazil Representation Subset” filter is turned off before filtering for the region of interest.

Important to note: The regional and country representation filters should not be used in combination, because together they can reduce the available regional data significantly.

Impact of combining regional and country filtering:

  • For instance, let us focus on Calling Group A (State of São Paulo)
  • When only the regional filter is on, 713 full responses are available for analysis.
  • When both the regional filter and the “Brazil Representation Subset” (country proportionality) filter are on, only 549 of the 713 full responses are available. The smaller sample widens the margin of error for regional results.
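The difference between the two filters can be sketched in a few lines of Python. This is only an illustration on a tiny made-up sample, not the real data: the column name "Brazil Representation Subset" comes from this documentation, while "Calling Group" is an assumed name for the region column and may differ in the actual CSV.

```python
import csv
import io

# Tiny made-up sample standing in for the real CSV; values are illustrative only.
# "Brazil Representation Subset" is the column named in this documentation;
# "Calling Group" is a hypothetical name for the region column.
SAMPLE_CSV = """Calling Group,Brazil Representation Subset
A,TRUE
A,FALSE
B,TRUE
C,FALSE
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per survey response."""
    return list(csv.DictReader(io.StringIO(text)))

def country_subset(rows):
    """Rows flagged for the proportional, whole-Brazil view."""
    return [r for r in rows
            if r["Brazil Representation Subset"].strip().upper() == "TRUE"]

def regional_subset(rows, group):
    """All rows for one calling group, with the country filter left off."""
    return [r for r in rows if r["Calling Group"].strip() == group]

rows = load_rows(SAMPLE_CSV)
brazil_rows = country_subset(rows)         # use for country-level questions
group_a_rows = regional_subset(rows, "A")  # use for regional questions

# Applying both filters discards regional rows that were not picked for the subset.
group_a_in_subset = country_subset(group_a_rows)
print(len(brazil_rows), len(group_a_rows), len(group_a_in_subset))
```

With the real file, the same two functions would be applied separately: one for country-level questions, the other for regional ones.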

Individual survey responses

Within the CSV file, each row represents one completed survey call, and each column contains the response to the associated question. In some cases, questions that should have been asked were not, and these entries are marked as “Missing”.

  • When analyzing results from questions Q9A-9D, Q12, and Q13-13A, please set the filter to “Full Responses”.
  • When analyzing results from any other questions, you can include non-full responses to increase the sample size with fully valid data.
  • When “Brazil Representation Subset” is on, all responses are automatically set to be full responses and no special treatment is needed.
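The rule in the list above can be expressed as a small helper. This is a sketch under assumptions: the question identifiers are expanded from the ranges given here (Q9A-9D, Q12, Q13-13A), and both the column header "Full response" and the per-question headers are guesses at how the CSV labels them.

```python
# Question identifiers expanded from the ranges named in this documentation.
# The exact column headers used in the CSV are an assumption.
AFFECTED_QUESTIONS = {"Q9A", "Q9B", "Q9C", "Q9D", "Q12", "Q13", "Q13A"}

def rows_for_question(rows, question, full_column="Full response"):
    """Return the rows that are safe to use when analyzing one question.

    Questions affected by the skip problem use only full responses;
    every other question can use all rows.
    """
    if question in AFFECTED_QUESTIONS:
        return [r for r in rows if r[full_column].strip().upper() == "TRUE"]
    return list(rows)

# Two made-up rows for illustration only ("Q1" is a hypothetical unaffected question).
sample = [
    {"Full response": "TRUE", "Q12": "Yes", "Q1": "Yes"},
    {"Full response": "FALSE", "Q12": "Missing", "Q1": "No"},
]
print(len(rows_for_question(sample, "Q12")), len(rows_for_question(sample, "Q1")))
```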

Facebook / WhatsApp questions

The questions asking whether respondents use Facebook or WhatsApp were asked only of those who had previously said that they do not use the internet. This is by design: we wanted to use these questions to gauge how many people did not understand that Facebook was part of the internet. The responses to these two questions were not intended to measure overall use of Facebook or WhatsApp.

Non-linear progression & Margin of Error

Note that this survey is non-linear: depending on how a question is answered, the flow of the rest of the survey may change. For example, if a respondent says that he or she does not have a smartphone, we skip the smartphone-related questions. You can review the flow diagram to see how the survey progresses. The sample is large enough that, for questions asked of all respondents, results have a margin of error of about 2% at a 95% confidence level. Questions asked of only a subset of respondents have a larger margin of error.
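As a rough check of the 2% figure, the standard margin-of-error formula for a proportion can be evaluated directly. This sketch assumes simple random sampling and the worst-case proportion p = 0.5; the documentation does not state which formula or sample size was used, so these are assumptions.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion at roughly 95% confidence (z = 1.96).

    Assumes simple random sampling; p = 0.5 gives the widest margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes mentioned in this documentation.
for n in (2500, 5343, 713):
    print(n, round(margin_of_error(n) * 100, 2), "%")
```

For the 2500-response country subset this gives just under 2%, and a regional sample such as the 713 responses for Calling Group A gives a noticeably wider margin.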

Methodologies

Addressing Biases

One issue with phone surveys is the tendency for some respondents to favor the first response to a question. To address this problem, most of the survey questions presented the responses in a random order for each call. This distributes any bias evenly among the responses instead of accumulating it all on one response. Note that questions that have a 'none of these' or 'other' response always kept this option as the last one presented.

A couple of survey questions, however, have responses with a natural order and would be confusing if presented in a completely random order. For instance, when we ask how often respondents use Wikipedia, a non-sequential order would not make sense (e.g. “once a week”, “once a month”, “once a day”). For these questions, we randomly presented the responses in one of two orders: either from lowest to highest, or from highest to lowest.
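The two ordering rules described above can be sketched as follows. This is a hypothetical reconstruction, not the survey vendor's implementation: the function name, its parameters, and the example answer options other than the frequency labels are invented for illustration.

```python
import random

def presentation_order(options, ordinal=False,
                       pinned_last=("none of these", "other"), rng=random):
    """Return the order in which response options are read out for one call.

    - "none of these" / "other" options always stay at the end.
    - Ordinal questions keep their sequence but may be reversed (50/50).
    - All other questions are fully shuffled.
    """
    pinned = [o for o in options if o.lower() in pinned_last]
    movable = [o for o in options if o.lower() not in pinned_last]
    if ordinal:
        if rng.random() < 0.5:
            movable.reverse()  # highest-to-lowest instead of lowest-to-highest
    else:
        rng.shuffle(movable)
    return movable + pinned

rng = random.Random(0)  # seeded only to make the demo repeatable
uses = ["News", "Email", "Social media", "Other"]  # hypothetical options
frequency = ["once a day", "once a week", "once a month"]
print(presentation_order(uses, rng=rng))
print(presentation_order(frequency, ordinal=True, rng=rng))
```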

Calculation of Proportionality

To make the subset representative of Brazil as a whole, we selected responses from each region in proportion to its population:

  • We took the population of each region from “List of Brazilian states by population density”.
  • We summed the populations of all regions represented in the survey, giving a total of 202,722,105.
  • We calculated the percentage of the total population that each calling group represents. For instance, calling group A (State of São Paulo) had a population of 44,035,304, or about 22% of the total.
  • We scaled the sample size to 2500 and calculated the number of responses to take from each region so that the subset represents Brazil as a whole.
  • We ordered the raw data chronologically and selected complete responses according to the calculated proportionality. We added a column, “Brazil Representation Subset”, and marked the selected responses as “TRUE”. To obtain data for a full Brazil representation, simply select “TRUE”.

Calling Group | Region | Population | % of Total Population | Proportionalized Response Size (of 2500 responses)
Group A | State of São Paulo | 44,035,304 | 21.72% | 550
Group B | States of Rio de Janeiro and Espírito Santo | | | 250
 | Rio de Janeiro | 16,461,173 | 8.12% |
 | Espírito Santo | 3,885,049 | 1.92% |
Group C | State of Minas Gerais | 20,734,097 | 10.23% | 250
Group D | States of Paraná and Santa Catarina | | | 200
 | Paraná | 11,081,692 | 5.47% |
 | Santa Catarina | 6,727,148 | 3.32% |
Group E | State of Rio Grande do Sul | 11,207,274 | 5.53% | 150
Group F | Central-West Region and states of Tocantins, Acre and Rondônia | | | 225
 | Federal District | 2,852,372 | 1.41% |
 | Goiás | 6,523,222 | 3.22% |
 | Tocantins | 1,496,880 | 0.74% |
 | Mato Grosso | 3,224,357 | 1.59% |
 | Mato Grosso do Sul | 2,619,657 | 1.29% |
 | Acre | 790,101 | 0.39% |
 | Rondônia | 1,748,531 | 0.86% |
Group G | States of Bahia and Sergipe | | | 200
 | Bahia | 15,126,371 | 7.46% |
 | Sergipe | 2,219,574 | 1.09% |
Group H | Northeast Region | | | 400
 | Pernambuco | 9,277,727 | 4.58% |
 | Alagoas | 3,321,730 | 1.64% |
 | Paraíba | 3,943,885 | 1.95% |
 | Rio Grande do Norte | 3,408,510 | 1.68% |
 | Ceará | 8,842,791 | 4.36% |
 | Piauí | 3,194,718 | 1.58% |
Group I | North Region and the state of Maranhão | | | 275
 | Pará | 8,073,924 | 3.98% |
 | Amazonas | 3,873,743 | 1.91% |
 | Roraima | 450,479 | 0.22% |
 | Amapá | 750,912 | 0.37% |
 | Maranhão | 6,850,884 | 3.38% |
Total | | 202,722,105 | 100.00% | 2500
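The population shares in the table can be recomputed from the state figures, as in the sketch below. The function name is hypothetical. Note that the raw proportional targets it prints do not match the published response sizes exactly (for example, about 543 versus 550 for Group A); the published sizes appear to have been rounded, and the rounding rule is not stated here, so the sketch does not try to reproduce it.

```python
# Group populations, summed from the state figures in the table above.
GROUP_POPULATION = {
    "A": 44_035_304,
    "B": 16_461_173 + 3_885_049,
    "C": 20_734_097,
    "D": 11_081_692 + 6_727_148,
    "E": 11_207_274,
    "F": (2_852_372 + 6_523_222 + 1_496_880 + 3_224_357
          + 2_619_657 + 790_101 + 1_748_531),
    "G": 15_126_371 + 2_219_574,
    "H": (9_277_727 + 3_321_730 + 3_943_885 + 3_408_510
          + 8_842_791 + 3_194_718),
    "I": 8_073_924 + 3_873_743 + 450_479 + 750_912 + 6_850_884,
}
TOTAL_POPULATION = sum(GROUP_POPULATION.values())

def proportional_targets(populations, sample_size=2500):
    """Unrounded number of responses per group, proportional to population."""
    total = sum(populations.values())
    return {g: sample_size * p / total for g, p in populations.items()}

targets = proportional_targets(GROUP_POPULATION)
for group, pop in GROUP_POPULATION.items():
    share = 100 * pop / TOTAL_POPULATION
    print(f"Group {group}: {share:.2f}% of population, raw target {targets[group]:.1f}")
```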

Skipped questions / Full responses

Votomobile experienced a logic flow problem with some of the responses, which led to a small set of questions being occasionally skipped (only possible with Q9A-Q9D, Q12 and Q13-13A). When one of those questions was incorrectly skipped, that particular response in the spreadsheet is set to ‘Missing’, and the entry in the ‘Full response’ column is set to FALSE for filtering purposes.

To address this issue, Votomobile conducted extra full surveys to make up for the incomplete responses. The current spreadsheet combines the original responses (with ‘Missing’ marked where needed) and the additional responses for analysis. Our initial analysis of the data set used only responses marked as “Full Response”.

