Questions tagged [robust]

Robustness in general refers to a statistic's insensitivity to deviations from its underlying assumptions (Huber and Ronchetti, 2009).

Robust statistics are insensitive to outliers and to deviations from their underlying assumptions. Such methods are useful when it is not possible to detect and remove outliers or to appropriately test the assumptions required by a given statistic. A robust statistic is meant to achieve three goals:

  1. efficiency - it should have optimal or nearly optimal efficiency at the assumed model
  2. stability - small deviations from the assumptions should have only a small influence on performance
  3. breakdown - larger deviations from the assumptions should not lead to a complete failure
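
The stability and breakdown goals can be illustrated with a minimal sketch that compares the sample mean and the sample median as more and more observations are replaced by a gross outlier. The data values, the outlier value of 1000, and the helper name `contaminate` are assumptions made up for this illustration.

```python
import statistics

# Nine well-behaved observations centred near 10 (made-up illustrative data).
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1]


def contaminate(sample, n_bad, bad_value=1000.0):
    """Return a copy of sample with its first n_bad values replaced by bad_value."""
    return [bad_value] * n_bad + sample[n_bad:]


def summarize(n_bad):
    """Mean and median of the sample after replacing n_bad observations."""
    data = contaminate(clean, n_bad)
    return statistics.mean(data), statistics.median(data)


for n_bad in (0, 1, 4, 5):
    mean, median = summarize(n_bad)
    print(f"{n_bad} outliers: mean = {mean:8.2f}, median = {median:8.2f}")
```

In this sketch a single outlier already drags the mean far from 10, while the median stays close to 10 until more than half of the nine observations are replaced.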

Examples of robust statistics include median regression as an estimation technique and Huber-White standard errors for statistical inference. Note that "robust" is not equivalent to "better". Robustness always involves a compromise: it sacrifices some efficiency at the assumed model to insure against larger deviations from the model's assumptions (Anscombe, 1960).
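
The efficiency trade-off described above can be sketched with a small simulation. It estimates the sampling variance of the mean and of the median, first under an exactly normal model and then under a contaminated one. The sample size, number of replications, seed, and the 10% contamination with standard deviation 10 are all assumptions chosen for illustration, not values from the cited references.

```python
import random
import statistics


def sampling_variance(estimator, draw, n=51, reps=2000, seed=0):
    """Monte Carlo variance of an estimator over reps samples of size n."""
    rng = random.Random(seed)  # same seed: both estimators see the same samples
    estimates = [estimator([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.pvariance(estimates)


def normal(rng):
    """Assumed model: standard normal observations."""
    return rng.gauss(0.0, 1.0)


def contaminated(rng):
    """Deviation from the model: 10% of draws come from N(0, 10^2)."""
    scale = 10.0 if rng.random() < 0.1 else 1.0
    return rng.gauss(0.0, scale)


results = {
    (model.__name__, est.__name__): sampling_variance(est, model)
    for model in (normal, contaminated)
    for est in (statistics.mean, statistics.median)
}

for (model_name, est_name), var in results.items():
    print(f"{model_name:>12} data, {est_name:>6}: variance = {var:.4f}")
```

Under the normal model the mean should show the smaller variance, which is the efficiency the median gives up. Under contamination the ordering should reverse, which is the protection the median buys.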

For further reading see

  • Huber, P.J. and Ronchetti, E.M. (2009) "Robust Statistics", 2nd Edition, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., New Jersey
  • Anscombe, F.J. (1960) "Rejection of Outliers", Technometrics, Vol. 2, pp. 123-147
519 questions
85
votes
14 answers

Why haven't robust (and resistant) statistics replaced classical techniques?

When solving business problems using data, it's common that at least one key assumption that underpins classical statistics is invalid. Most of the time, no one bothers to check those assumptions so you never actually know. For instance, that so…
doug
  • 9,901
  • 1
  • 22
  • 26
54
votes
4 answers

Replicating Stata's "robust" option in R

I have been trying to replicate the results of the Stata option robust in R. I have used the rlm command from the MASS package and also the command lmrob from the package "robustbase". In both cases the results are quite different from the "robust"…
user56579
  • 541
  • 1
  • 5
  • 4
53
votes
4 answers

Fast linear regression robust to outliers

I am dealing with linear data with outliers, some of which are more than 5 standard deviations away from the estimated regression line. I'm looking for a linear regression technique that reduces the influence of these points. So far what I did is…
Matteo Fasiolo
  • 3,134
  • 2
  • 20
  • 29
53
votes
3 answers

Why do we care so much about normally distributed error terms (and homoskedasticity) in linear regression when we don't have to?

I suppose I get frustrated every time I hear someone say that non-normality of residuals and/or heteroskedasticity violates OLS assumptions. To estimate parameters in an OLS model neither of these assumptions is necessary by the Gauss-Markov…
39
votes
2 answers

Why should we use t errors instead of normal errors?

In this blog post by Andrew Gelman, there is the following passage: The Bayesian models of 50 years ago seem hopelessly simple (except, of course, for simple problems), and I expect the Bayesian models of today will seem hopelessly simple, 50…
Potato
  • 1,025
  • 1
  • 11
  • 12
37
votes
2 answers

Error "system is computationally singular" when running a glm

I'm using the robustbase package to run a glm estimation. However when I do it, I get the following error: Error in solve.default(crossprod(X, DiagB * X)/nobs, EEq) : system is computationally singular: reciprocal condition number =…
NK1
  • 543
  • 1
  • 5
  • 6
34
votes
4 answers

Why isn't RANSAC most widely used in statistics?

Coming from the field of computer vision, I've often used the RANSAC (Random Sample Consensus) method for fitting models to data with lots of outliers. However, I've never seen it used by statisticians, and I've always been under the impression…
Bossykena
  • 667
  • 6
  • 11
33
votes
6 answers

What would a robust Bayesian model for estimating the scale of a roughly normal distribution be?

There exist a number of robust estimators of scale. A notable example is the median absolute deviation which relates to the standard deviation as $\sigma = \mathrm{MAD}\cdot1.4826$. In a Bayesian framework there exist a number of ways to robustly…
Rasmus Bååth
  • 6,422
  • 34
  • 57
33
votes
8 answers

Replacing outliers with mean

This question was asked by my friend who is not internet savvy. I have no statistics background and I've been searching around the internet for this question. The question is: is it possible to replace outliers with the mean value? If it's possible, is…
Alun
  • 433
  • 1
  • 4
  • 5
31
votes
2 answers

Are 50% confidence intervals more robustly estimated than 95% confidence intervals?

My question flows out of this comment on an Andrew Gelman blog post in which he advocates the use of 50% confidence intervals instead of 95% confidence intervals, although not on the grounds that they are more robustly estimated: I prefer 50% to…
27
votes
1 answer

What are the multidimensional versions of median

What are the multidimensional versions of the median and what are their pros and cons? I confess this doesn't have a single answer, but I think it is a useful question to ask and will be a benefit to others as well. How stable it is (i.e. how many…
John Robertson
  • 973
  • 3
  • 15
  • 25
25
votes
5 answers

How robust is the independent samples t-test when the distributions of the samples are non-normal?

I've read that the t-test is "reasonably robust" when the distributions of the samples depart from normality. Of course, it's the sampling distribution of the differences that is important. I have data for two groups. One of the groups is highly…
Archaeopteryx
  • 545
  • 2
  • 7
  • 9
21
votes
2 answers

Is a weighted $R^2$ in robust linear model meaningful for goodness of fit analysis?

I estimated a robust linear model in R with MM weights using `rlm()` in the MASS package. `R` does not provide an $R^2$ value for the model, but I would like to have one if it is a meaningful quantity. I am also interested to know if there is any…
CraigMilligan
  • 571
  • 1
  • 4
  • 9
20
votes
4 answers

Mean and Median properties

Can somebody clearly explain to me the mathematical logic that would link two statements (a) and (b) together? Let us have a set of values (some distribution). Now, a) Median does not depend on every value [it just depends on one or two middle…
ttnphns
  • 51,648
  • 40
  • 253
  • 462
20
votes
3 answers

Crash course in robust mean estimation

I have a bunch (around 1000) of estimates and they are all supposed to be estimates of long-run elasticity. A little more than half of these are estimated using method A and the rest using method B. Somewhere I read something like "I think method B…
Ondrej
  • 547
  • 1
  • 5
  • 11