For the following three values: 222, 1122, 45444

WolframAlpha gives a skewness of 0.706.

Excel, using `=SKEW(222,1122,45444)`, gives 1.729.

What explains the difference?

David LeBauer
Scott Weinstein
  • Is this question about empirical or maybe nonparametric skewness or about *estimating* skewness? – gwr Nov 14 '15 at 13:21

1 Answer

WolframAlpha and Excel use different formulas to compute the skewness. The help page for `skewness()` in the R package `e1071` says:

Joanes and Gill (1998) discuss three methods for estimating skewness:

  • Type 1: $g_1 = m_3 / m_2^{3/2}$. This is the typical definition used in many older textbooks.
  • Type 2: $G_1 = g_1 \sqrt{n(n-1)} / (n-2)$. Used in SAS and SPSS.
  • Type 3: $b_1 = m_3 / s^3 = g_1 ((n-1)/n)^{3/2}$. Used in MINITAB and BMDP.
All three skewness measures are unbiased under normality.

> library(e1071)
> # Type 2 reproduces the Excel SKEW() result
> skewness(c(222,1122,45444), type = 2)
[1] 1.729690
> # Type 1 reproduces the WolframAlpha result
> skewness(c(222,1122,45444), type = 1)
[1] 0.7061429
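
For anyone without R at hand, the three formulas quoted above can be sketched in plain Python. This is only an illustration, not the `e1071` implementation: the function name `skewness` and its `kind` argument are my own choices, and I am assuming $m_r$ denotes the sample central moment of order $r$ computed with divisor $n$.

```python
import math


def skewness(xs, kind=1):
    """Sample skewness, types 1-3 as described by Joanes and Gill (1998).

    Hypothetical helper for illustration; not the e1071 API.
    """
    n = len(xs)
    mean = sum(xs) / n
    # Central sample moments with divisor n (assumed definition of m_2, m_3)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5  # type 1
    if kind == 1:
        return g1
    if kind == 2:
        if n < 3:
            raise ValueError("type 2 needs at least 3 observations")
        return g1 * math.sqrt(n * (n - 1)) / (n - 2)
    if kind == 3:
        return g1 * ((n - 1) / n) ** 1.5
    raise ValueError("kind must be 1, 2, or 3")


values = [222, 1122, 45444]
for kind in (1, 2, 3):
    print(kind, round(skewness(values, kind), 4))
```

With these three values, type 1 and type 2 come out at roughly 0.7061 and 1.7297, matching the R output. For $n = 3$ the type 2 correction factor is $\sqrt{3 \cdot 2} / 1 = \sqrt{6} \approx 2.45$, which is exactly the ratio between the Excel and WolframAlpha figures.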

Here is a link to the referenced paper, for anyone who has access and wants to read further: http://onlinelibrary.wiley.com/doi/10.1111/1467-9884.00122/abstract

Chase
  • It's not mathematically possible for "all three skewness measures to be unbiased," because (obviously) their expectations all differ. Perhaps you mean *asymptotically* unbiased? – whuber Apr 03 '11 at 02:10
  • @whuber - I'm going to defer to Friedrich.Leisch@R-project.org who maintains the `e1071` package for clarification on what he meant specifically there. If my post wasn't clear, that comes from the [help page for `skewness()`](http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=e1071:skewness) – Chase Apr 03 '11 at 02:15
  • @whuber I thought the same so I upvoted your comment, then started reading the Joanes & Gill paper (start of section 3, p185) and realised we're wrong: a normal distribution has zero skewness, so any estimator that is a multiple of $g_1$ is unbiased *under normality*. Unfortunately [there's no way to undo a comment upvote](http://meta.stats.stackexchange.com/questions/764/how-to-unvote-on-comments/765#765). – onestop Apr 03 '11 at 06:23
  • The point is that $g_1 = m_3 / m_2^{3/2}$, where $m_2$ and $m_3$ are the second and third moments about the mean, is the population skewness. As a sample statistic, it then raises similar issues to unbiased estimates of standard deviation, leading to corrections based on $n$ which have some justification but still do not produce unbiased estimates. But I think it is not very helpful to say that these are unbiased estimates of skewness for symmetric distributions; so too is 0, which has a lower variance but is inconsistent and is useless for estimating skewness of asymmetric distributions. – Henry Apr 03 '11 at 12:44
  • @onestop @Henry I agree with you. – whuber Apr 03 '11 at 17:45