Questions tagged [reliability]

A measure is said to have high reliability if it produces similar results under consistent conditions. DO NOT confuse reliability with validity (see tag wiki). DO NOT use this tag for inter-rater reliability, which has its own tag: inter-rater.

Reliability refers to the overall consistency of a measurement, rather than its accuracy (how well it reflects some real, external quantity - i.e. validity).

Note that the meaning of this term varies slightly from one discipline to another. It may refer to reliability in engineering, to the bias-variance trade-off (illustrated below) when characterizing the properties of an estimator or of a statistical predictive model as a whole, or to the consistency of a subjective or biological measure, both over time and across raters. In the latter case, related tags are , , , .

Reference: Wikipedia

Reliability vs Validity illustration
(Figure adapted from this image, created by Nevit Dilmen
under the Creative Commons Attribution-Share Alike 3.0 license.)
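The distinction drawn above between reliability (consistency) and validity (closeness to the real quantity) can be sketched with a small simulation. This is a minimal illustration, not part of the tag wiki: the function names `simulate` and `summarize`, the true value of 10.0, and the bias and noise settings are all assumptions chosen for the example. One hypothetical instrument is consistent but systematically off (reliable, not valid); the other is centered on the truth but noisy (valid on average, not reliable).

```python
import random
import statistics


def simulate(true_value, bias, noise_sd, n=1000, seed=0):
    """Draw n repeated readings of one quantity from a hypothetical instrument."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


def summarize(readings, true_value):
    """Bias reflects (lack of) validity; spread reflects (lack of) reliability."""
    return {
        "bias": statistics.fmean(readings) - true_value,
        "spread": statistics.stdev(readings),
    }


TRUE_VALUE = 10.0  # assumed "real, external quantity"

# Consistent readings that are systematically off target.
reliable_not_valid = summarize(
    simulate(TRUE_VALUE, bias=2.0, noise_sd=0.1, seed=0), TRUE_VALUE
)

# Readings centered on the target but widely scattered.
valid_not_reliable = summarize(
    simulate(TRUE_VALUE, bias=0.0, noise_sd=2.0, seed=1), TRUE_VALUE
)

print("reliable, not valid:", reliable_not_valid)
print("valid, not reliable:", valid_not_reliable)
```

Under these assumed settings, the first instrument should show a small spread with a bias near 2, and the second a bias near 0 with a spread near 2, mirroring the target-style figure.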

482 questions
45
votes
1 answer

Computing Cohen's Kappa variance (and standard errors)

The Kappa ($\kappa$) statistic was introduced in 1960 by Cohen [1] to measure agreement between two raters. Its variance, however, had been a source of contradictions for quite some time. My question is about which is the best variance…
Cesar
34
votes
3 answers

What distribution does my data follow?

Let us say that I have 1000 components and I have been collecting data on how many times each of them logs a failure. Each time one logs a failure, I also keep track of how long it took my team to fix the problem. In short, I have been recording…
29
votes
2 answers

Inter-rater reliability for ordinal or interval data

Which inter-rater reliability methods are most appropriate for ordinal or interval data? I believe that "Joint probability of agreement" or "Kappa" are designed for nominal data. Whilst "Pearson" and "Spearman" can be used, they are mainly used for…
shadi
29
votes
1 answer

Computing repeatability of effects from an lmer model

I just came across this paper, which describes how to compute the repeatability (a.k.a. reliability, a.k.a. intraclass correlation) of a measurement via mixed effects modelling. The R code would be: #fit the model fit =…
25
votes
2 answers

Is Joel Spolsky's "Hunting of the Snark" post valid statistical content analysis?

If you've been reading the community bulletins lately, you've likely seen The Hunting of the Snark, a post on the official StackExchange blog by Joel Spolsky, the CEO of the StackExchange network. He discusses a statistical analysis conducted on a…
Christopher
18
votes
2 answers

Accuracy vs. area under the ROC curve

I constructed an ROC curve for a diagnostic system. The area under the curve was then non-parametrically estimated to be AUC = 0.89. When I tried to calculate the accuracy at the optimum threshold setting (the point closest to point (0, 1)), I got…
Ali Sultan
17
votes
2 answers

Assessing reliability of a questionnaire: dimensionality, problematic items, and whether to use alpha, lambda6 or some other index?

I am analyzing scores given by participants attending an experiment. I want to estimate the reliability of my questionnaire which is composed of 6 items aimed at estimating the attitude of the participants towards a product. I computed Cronbach's…
giovanna
14
votes
3 answers

Where do the descriptors for Cronbach's alpha values come from (e.g., poor, excellent)?

It seems fairly common to describe Cronbach's alpha values as follows: α ≥ 0.9: Excellent; 0.7 ≤ α < 0.9: Good; 0.6 ≤ α < 0.7: Acceptable; 0.5 ≤ α < 0.6: Poor; α < 0.5: Unacceptable. Where do these values come from? I cannot find an original research…
Behacad
14
votes
4 answers

What are the case studies in public health policy research where unreliable/confounded/invalid studies or models were misused?

I am drafting a literature review on a current public health issue where data are confounded: What are common historical case-studies that are used in public health/epidemiology education where invalid or confounded relationships or inferences were…
cerd
13
votes
1 answer

How to measure the reliability of a consensus ranking (problem from Kemeny-Snell book)

Suppose that $k$ experts are each asked to rank a set of $n$ objects in order of preference. Let us allow ties in the rankings. John Kemeny and Laurie Snell, in their 1962 book "Mathematical Models in the Social Sciences", propose to solve next…
aeiklmkv
13
votes
2 answers

Interrater reliability for events in a time series with uncertainty about event time

I have multiple independent coders who are trying to identify events in a time series -- in this case, watching video of face-to-face conversation and looking for particular nonverbal behaviors (e.g., head nods) and coding the time and category of…
dschulman
13
votes
2 answers

Quadratic weighted kappa

I have done a little Googling about quadratic weighted kappa, but I couldn't find a good explanation that helped me understand it. Can somebody give some resources or a brief explanation?
12
votes
2 answers

Identification of useless questions from a questionnaire

I'm developing a questionnaire. To improve its reliability and validity I want to use statistical methods. I want to eliminate questions whose answers are always the same. This means that nearly all participants gave the same answers on those…
Max
12
votes
2 answers

How accurate is IQR for detecting outliers

I'm writing a script that analyses run times of processes. I am not sure of their distribution but I want to know if a process runs "too long". So far I've been using 3 standard deviations of the last run times (n>30), but I was told that this…
chris bedd
12
votes
2 answers

How to reduce number of items using factor analysis, internal consistency, and item response theory in conjunction?

I am in the process of empirically developing a questionnaire and I will be using arbitrary numbers in this example to illustrate. For context, I am developing a psychological questionnaire aimed at assessing thought patterns commonly identified in…