Learning and Evaluation/Archive/Learning modules/3Reliability

From Meta, a Wikimedia project coordination wiki



Reliability is how consistently a proxy (e.g. a survey) measures a construct. A basic understanding of the types of reliability is helpful when writing a survey, although actually testing for reliability generally involves some statistics. The statistical background is given at the bottom of this page for interested readers.

Internal reliability
This is the most commonly used method to test reliability. Internal reliability involves testing for homogeneity, that is, testing whether different questions that aim to measure the same construct produce correlated responses, rather than responses that agree only by random chance.


Test-retest reliability
Does the measure produce the same or similar results from the same respondents when administered at different points in time? Usually the questionnaire is administered on two occasions separated by a few days. Ideally, responses should not vary between the two occasions, except for measures of things that can genuinely change from day to day, such as health.

Statistical background
Internal reliability is measured using the Cronbach's alpha statistic (for items with more than 2 response categories) and the Kuder-Richardson (KR-20) test (for items with 2 response categories, e.g. yes/no). If the alpha statistic is < 0.5, this is regarded as low internal reliability (i.e. the items are not measuring the same phenomenon).
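As an illustration of how the alpha statistic can be computed, here is a minimal sketch using only the Python standard library. The function name `cronbach_alpha` and the input layout (one row per respondent, one numeric score per question) are assumptions made for this example and do not come from any particular survey tool. It uses the usual formula: alpha = k/(k-1) × (1 − sum of the item variances / variance of the respondents' total scores), where k is the number of questions.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of numeric survey answers.

    `responses` is a list of rows, one row per respondent; each row holds
    that respondent's score on every question (hypothetical layout).
    """
    k = len(responses[0])  # number of questions (items)
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")

    # Variance of each question across respondents (column-wise).
    item_variances = [variance(column) for column in zip(*responses)]

    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("all respondents have the same total score")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers on a 1-5 scale: 4 respondents, 3 questions.
answers = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(answers)  # items move together perfectly here
```

When every item is scored 0/1 (e.g. no/yes), the same calculation gives the KR-20 value, because the variance of a 0/1 item plays the role of the p × q term in the KR-20 formula.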
Test-retest reliability can be measured using a basic correlation coefficient between the first and second administrations, where the target is a one-to-one correspondence between test and retest responses. It may be assessed more carefully using Cohen's kappa statistic, which is also used for measuring inter-rater reliability (the reliability between two rating instances). Kappa takes into account the agreement expected by random chance as well as the observed agreement.
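The two test-retest measures can be sketched in a few lines of standard-library Python. The function names `pearson_r` and `cohen_kappa` and the sample answers are hypothetical, chosen only for this example. The correlation assumes numeric scores; kappa assumes categorical answers and is computed as (observed agreement − chance agreement) / (1 − chance agreement).

```python
from collections import Counter
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between scores from the two administrations."""
    n = len(first)
    mean_1, mean_2 = sum(first) / n, sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    spread_1 = sqrt(sum((a - mean_1) ** 2 for a in first))
    spread_2 = sqrt(sum((b - mean_2) ** 2 for b in second))
    return cov / (spread_1 * spread_2)


def cohen_kappa(first, second):
    """Cohen's kappa for categorical answers given on two occasions."""
    n = len(first)
    # Share of respondents who gave the same answer both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from how often each answer was used.
    counts_1, counts_2 = Counter(first), Counter(second)
    categories = counts_1.keys() | counts_2.keys()
    chance = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    if chance == 1:
        raise ValueError("kappa is undefined when only one answer is ever used")
    return (observed - chance) / (1 - chance)


# Hypothetical data: the same 4 respondents, surveyed twice.
scores_day_1 = [1, 2, 3, 4]
scores_day_2 = [2, 4, 6, 8]
r = pearson_r(scores_day_1, scores_day_2)

answers_day_1 = ["yes", "yes", "no", "no"]
answers_day_2 = ["yes", "no", "no", "no"]
kappa = cohen_kappa(answers_day_1, answers_day_2)
```

In the categorical example, three of the four respondents answer the same way both times (75% observed agreement), but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.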