Is a distribution still considered right-skewed if the majority of responses are zero?

Question

I have a distribution in which the majority of cases take the value zero, and a few (perhaps 10%) take the values 1, 2 or 3. Would this distribution still count as right-skewed even though it is truncated at zero? And if so, could I log-transform it to "normalise" the distribution of the variable for analysis? Thanks!

This article talks about the skewness of Poisson distributions for rare events: https://www.umass.edu/wsp/resources/poisson/ — Dan, Aug 01 '18 at 13:35
Sounds like a regular old Poisson distribution. Does it look something like this? https://www.wolframalpha.com/input/?i=poisson+distribution+lambda+%3D+.1 — degenerate hessian, Aug 01 '18 at 14:05
Yes. Can I take the natural log to transform it if it is truncated like this? — Tom Witten, Aug 01 '18 at 14:26
You will have trouble with ln(0) but as @stephan suggests in his answer you really need to re-think why you need to transform it. — mdewey, Aug 01 '18 at 15:15
Some thoughtful discussions of skewness appear in answers at https://stats.stackexchange.com/questions/212203 and https://stats.stackexchange.com/questions/248479. — whuber, Aug 02 '18 at 12:37

Stephan Kolassa · Accepted Answer · 2018-08-02T11:13:56.597


This is certainly possible. The most common definition for a distribution to be right skewed is that the skewness

$$ \gamma_1 := E\bigg(\Big(\frac{X-\mu}{\sigma}\Big)^3\bigg) $$

be positive.
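To make the definition concrete for data like the question's, here is a minimal Python sketch that computes the sample analogue of $\gamma_1$ (plugging in the sample mean and the 1/n standard deviation). The data are hypothetical: 90 zeros plus 10 values spread over 1, 2 and 3, roughly matching the description in the question.

```python
import math

def moment_skewness(data):
    """Sample analogue of gamma_1 = E[((X - mu) / sigma)^3], using 1/n (population) moments."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(((x - mu) / sigma) ** 3 for x in data) / n

# Hypothetical counts shaped like the question: 90 zeros and 10 values spread over 1, 2, 3.
data = [0] * 90 + [1] * 4 + [2] * 3 + [3] * 3
print(f"skewness = {moment_skewness(data):.3f}")
```

The long right tail (a few 2s and 3s far above a mean of 0.19) dominates the cubed deviations, so the skewness comes out clearly positive.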

For instance, the Poisson distribution with parameter $\lambda$ has skewness $\frac{1}{\sqrt{\lambda}}>0$, so it is always right skewed. And for sufficiently small $\lambda$, a majority of the mass could be at zero. If $\lambda<\ln 2$, more than half the mass is at zero.
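A quick numerical check of both claims, as a sketch: it builds the Poisson pmf by the recurrence $P(X=k+1) = P(X=k)\,\lambda/(k+1)$, truncated at a hypothetical cutoff `kmax=200` (the neglected tail is negligible for these $\lambda$), computes $\gamma_1$ directly from the pmf, and compares it with $1/\sqrt{\lambda}$. It also prints $P(X=0)=e^{-\lambda}$ so you can see it crossing 1/2 at $\lambda = \ln 2$.

```python
import math

def poisson_pmf(lam, kmax=200):
    """Poisson probabilities P(X = k) for k = 0..kmax via p_{k+1} = p_k * lam / (k + 1)."""
    probs = [math.exp(-lam)]
    for k in range(kmax):
        probs.append(probs[-1] * lam / (k + 1))
    return probs

def skewness_from_pmf(probs):
    """gamma_1 computed directly from a pmf on the support 0, 1, 2, ..."""
    mu = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mu) ** 2 * p for k, p in enumerate(probs))
    m3 = sum((k - mu) ** 3 * p for k, p in enumerate(probs))
    return m3 / var ** 1.5

for lam in (0.1, 0.5, math.log(2), 2.0):
    probs = poisson_pmf(lam)
    print(f"lambda={lam:.3f}  skew={skewness_from_pmf(probs):.4f}  "
          f"1/sqrt(lambda)={1 / math.sqrt(lam):.4f}  P(X=0)={probs[0]:.4f}")
```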

The same can hold for zero-inflated distributions.
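As an illustration (the model choice and parameter values are assumptions, not taken from the answer): a zero-inflated Poisson puts extra mass $\pi$ at zero and otherwise follows a Poisson($\lambda$). The sketch computes $\gamma_1$ from the pmf, truncated at a hypothetical `kmax=200`. With $\pi=0.5,\ \lambda=1$ most of the mass is at zero and the skewness is positive; with $\pi=0.3,\ \lambda=10$ the zeros sit far to the left of the bulk and the skewness turns negative, so right skew is possible but not automatic for zero-inflated models.

```python
import math

def zip_pmf(pi, lam, kmax=200):
    """Zero-inflated Poisson on 0..kmax: extra mass pi at zero, otherwise Poisson(lam)."""
    poisson = [math.exp(-lam)]
    for k in range(kmax):
        poisson.append(poisson[-1] * lam / (k + 1))
    probs = [(1 - pi) * p for p in poisson]
    probs[0] += pi
    return probs

def skewness_from_pmf(probs):
    """gamma_1 computed directly from a pmf on the support 0, 1, 2, ..."""
    mu = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mu) ** 2 * p for k, p in enumerate(probs))
    m3 = sum((k - mu) ** 3 * p for k, p in enumerate(probs))
    return m3 / var ** 1.5

# Majority of the mass at zero (P(X = 0) is about 0.68): right-skewed.
a = zip_pmf(0.5, 1.0)
# Zero inflation without a majority at zero (P(X = 0) is about 0.3), bulk near 10: left-skewed.
b = zip_pmf(0.3, 10.0)
print(f"pi=0.5, lambda=1:  P(0)={a[0]:.3f}  skew={skewness_from_pmf(a):.4f}")
print(f"pi=0.3, lambda=10: P(0)={b[0]:.3f}  skew={skewness_from_pmf(b):.4f}")
```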

Regarding your second question: contrary to a common misconception, it is usually not necessary to transform data to be "more" normal, and especially not discrete data. You may want to ask a separate question on this topic. If you do so, please explain why you believe your data should be transformed to normality.
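A small sketch of the two problems raised in this thread about log-transforming, using hypothetical data matching the question (90 zeros, 10 values in 1 to 3). First, $\ln(0)$ is undefined, and Python's `math.log` raises `ValueError` for it. Second, a monotone transform such as $\log(x+1)$ (used here only as an assumed workaround, via `math.log1p`) merely relabels the four distinct values, so 90% of the mass stays on a single point and the result cannot look normal.

```python
import math
from collections import Counter

# Hypothetical counts shaped like the question: 90 zeros and 10 values spread over 1, 2, 3.
data = [0] * 90 + [1] * 4 + [2] * 3 + [3] * 3

# ln(0) is undefined: math.log raises ValueError for an argument of 0.
try:
    math.log(0)
except ValueError as err:
    print("math.log(0) fails:", err)

# Assumed workaround for illustration only: log(x + 1), which maps 0 to 0.
transformed = [math.log1p(x) for x in data]
print("distinct values before:", len(set(data)), "after:", len(set(transformed)))
print("share at a single point after transform:", transformed.count(0.0) / len(transformed))
print(Counter(transformed))
```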

edited Aug 02 '18 at 11:13

answered Aug 01 '18 at 15:09

Stephan Kolassa


I think saying that the standardized third moment is *the* definition for skewness is too strong. It's certainly the most common measure of skewness (e.g. more common than Pearson's second skewness coefficient, Bowley skewness, etc.) but far from the only one. What makes it "the" rather than "a"? – Glen_b Aug 02 '18 at 11:09
I included "most common" before "definition". To be honest, it *is* the most common definition, and if the OP had been interested in a "nonstandard" definition of skewness, I would have expected them to note this. Failing this, it seems reasonable to mostly work off the standard definition. – Stephan Kolassa Aug 02 '18 at 11:15
(I should point out that "most common" was not there originally, and I included it in response to Glen_b's comment, and to Silverfish's comments on [ERT's answer](https://stats.stackexchange.com/a/360197/1352).) – Stephan Kolassa Aug 02 '18 at 11:33

ERT · Answer 2 · 2018-08-02T12:05:33.177


A note from an 11th-grade statistics class states:

For a right skewed distribution, the mean is typically greater than the median. Also notice that the tail of the distribution on the right hand (positive) side is longer than on the left hand side.

As noted in the comments on this post, the above statement is a generalization, not a definition (though the nonparametric skew does, by definition, require the mean to be greater than the median for positive skewness).
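The mean-versus-median rule is easy to probe numerically. A Python sketch: for hypothetical data matching the question (90 zeros, 10 values in 1 to 3) the mean exceeds the median, but a Poisson with $\lambda = 0.7$ is right-skewed by the moment definition ($\gamma_1 = 1/\sqrt{0.7} > 0$) while its median (1) exceeds its mean (0.7), because $P(X=0) = e^{-0.7} \approx 0.497$ is just under one half. The helper `poisson_median` is hypothetical and uses the convention "smallest $m$ with $P(X \le m) \ge 1/2$".

```python
import math
import statistics

# Hypothetical counts shaped like the question: 90 zeros and 10 values spread over 1, 2, 3.
data = [0] * 90 + [1] * 4 + [2] * 3 + [3] * 3
print("sample mean:", statistics.mean(data), "median:", statistics.median(data))

def poisson_median(lam):
    """Smallest integer m with P(X <= m) >= 0.5 for X ~ Poisson(lam)."""
    p = math.exp(-lam)  # P(X = 0)
    cdf = p
    m = 0
    while cdf < 0.5:
        m += 1
        p *= lam / m    # P(X = m) from P(X = m - 1)
        cdf += p
    return m

# Poisson(0.7): positive moment skewness, yet the median exceeds the mean.
lam = 0.7
print("Poisson(0.7) mean:", lam, "median:", poisson_median(lam),
      "skewness:", round(1 / math.sqrt(lam), 4))
```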

edited Aug 02 '18 at 12:05

answered Aug 01 '18 at 13:42

ERT



(1) The first half of your first quote is a typical *consequence* of skewness, not the definition, so it doesn't really answer the question. In addition, while this is *typically* the case, it is easy to show that it is not *always* the case. The second half, on tail length, is also *often* the case, but not always - skewness does not differentiate between tail length and fatness. So the second half is wrong in the generality it posits. – Stephan Kolassa Aug 01 '18 at 15:12

(2) Your second paragraph does not apply, since a [truncated normal](https://en.wikipedia.org/wiki/Truncated_normal_distribution) will still not have any mass on a point, so it will not exhibit any zeros. – Stephan Kolassa Aug 01 '18 at 15:12

@StephanKolassa "a typical consequence of skewness, not the definition" - obviously the definition you used in your answer is the more common one these days, but for alternative definitions see e.g. [nonparametric skew](https://en.wikipedia.org/wiki/Nonparametric_skew), the [Pearson skewness coefficients](http://mathworld.wolfram.com/PearsonsSkewnessCoefficients.html), [Groeneveld & Meeden's coefficient](https://en.wikipedia.org/wiki/Skewness#Groeneveld_&_Meeden’s_coefficient) and [Bowley skewness](http://mathworld.wolfram.com/BowleySkewness.html). – Silverfish Aug 01 '18 at 21:26

From the definitions of nonparametric skew, Pearson's second coefficient of skewness and Groeneveld & Meeden's coefficient, the mean being greater than the median is indeed a necessary condition of positive skewness. – Silverfish Aug 01 '18 at 21:30
@Silverfish: good points. Yes, there are different definitions of skewness, so in the end it comes down to noting which particular definition we are discussing. I still stand by my comments, but will happily take back my downvote if ERT edits their answer to note which definitions of skewness they refer to. – Stephan Kolassa Aug 02 '18 at 06:58
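Since the comments turn on which definition is used, here is a sketch comparing three of them on hypothetical data matching the question: the moment skewness $\gamma_1$, Pearson's second coefficient $3(\text{mean}-\text{median})/\sigma$, and Bowley's quartile skewness $(Q_3+Q_1-2Q_2)/(Q_3-Q_1)$. Assumptions: the population (1/n) standard deviation and the default quartile method of Python's `statistics.quantiles`. With 90% zeros all three quartiles are 0, so Bowley's measure is undefined (0/0) while the other two are positive.

```python
import math
import statistics

# Hypothetical counts shaped like the question: 90 zeros and 10 values spread over 1, 2, 3.
data = [0] * 90 + [1] * 4 + [2] * 3 + [3] * 3

def moment_skewness(xs):
    """Standardized third moment, using 1/n (population) moments."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return sum(((x - mu) / sigma) ** 3 for x in xs) / len(xs)

def pearson_second(xs):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(xs) - statistics.median(xs)) / statistics.pstdev(xs)

def bowley(xs):
    """Bowley (quartile) skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1); NaN when Q3 == Q1."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    if q3 == q1:
        return math.nan  # 0/0: the interquartile range is zero
    return (q3 + q1 - 2 * q2) / (q3 - q1)

print("moment:", round(moment_skewness(data), 3))
print("Pearson 2nd:", round(pearson_second(data), 3))
print("Bowley:", bowley(data))
```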
