
I am given a set of $n$ pairs $(x_i, y_i)$, where the $x$-coordinates can be interpreted as the measured values of a random variable $X$ and the $y$-coordinates can be interpreted as some "scaled" probability corresponding to the $x$-value. Plotting these pairs in $\mathbb{R}^2$ gives the following picture:

[Figure: the $(x_i, y_i)$ pairs with the scaled normal density overlaid]

It should now be clear what I mean by "scaled" probability: the values can't form a probability density, since they aren't normalized. However, I would like to think of them as probabilities and hence find a fitting distribution. Since the points look normally distributed (and, as far as I know, a normal distribution makes sense for this measurement), I computed the (weighted) mean and standard deviation and then plotted the corresponding normal density. I then (rather arbitrarily) multiplied the density function by a constant to obtain a better fit to my specific set of data. This scaled function can be seen in the picture above.
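The procedure just described (weighted mean and standard deviation, then a scaled normal density) can be made less arbitrary by choosing the constant by least squares instead of by eye. A minimal sketch with NumPy, assuming the $y_i$ are non-negative and are used as the weights; the function name `weighted_normal_fit` is hypothetical, and the least-squares choice of the constant is a swapped-in suggestion, not the method used above.

```python
import numpy as np

def weighted_normal_fit(x, y):
    """Treat y as unnormalized weights on x; return (mean, sd, c).

    c is the least-squares constant in y ~ c * normal_pdf(x; mean, sd).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = y / y.sum()                            # normalize the "scaled probabilities"
    mu = np.sum(w * x)                         # weighted mean
    sd = np.sqrt(np.sum(w * (x - mu) ** 2))    # weighted (population) standard deviation
    phi = np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    # minimize sum (y - c*phi)^2 over c  =>  c = <y, phi> / <phi, phi>
    c = np.dot(y, phi) / np.dot(phi, phi)
    return mu, sd, c
```

For data that really are a multiple of a normal density, the three returned numbers recover the mean, the standard deviation and the multiplier.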

From the data, it is also possible to see that the distribution is slightly left-skewed. I computed the skewness with a formula I found on Wikipedia and indeed got a negative number.
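Which skewness formula was used is not stated. One common choice is the moment coefficient $g_1 = m_3 / m_2^{3/2}$ with $m_k = \sum_i w_i (x_i - \bar{x})^k$ and $w_i = y_i / \sum_j y_j$. A sketch of this weighted version, assuming (as above) that the $y_i$ act as weights; the function name is hypothetical and no small-sample bias correction is applied.

```python
import numpy as np

def weighted_skewness(x, y):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using y as weights."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(y, dtype=float)
    w = w / w.sum()                   # weights sum to one
    mu = np.sum(w * x)                # weighted mean
    m2 = np.sum(w * (x - mu) ** 2)    # weighted second central moment
    m3 = np.sum(w * (x - mu) ** 3)    # weighted third central moment
    return m3 / m2 ** 1.5
```

For example, `weighted_skewness([0, 1, 2, 3], [1, 1, 1, 5])` is negative: most of the weight sits on the right with a tail to the left.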

Question: How can I account for this skewness? What should I change about the density function to get a skewed density function that still fits my data?

kjetil b halvorsen
Huy
  • Accounting for skewness would seem to entail subject-matter considerations. As none are sketched here, it is hard to see what you expect. Is the variable bounded above, e.g. by 100, or could it take any value in principle? – Nick Cox Nov 23 '15 at 14:34
  • @NickCox: It is unbounded in principle. – Huy Nov 23 '15 at 14:41
  • How were the dots and "+" symbols in your plot obtained? – Glen_b Nov 24 '15 at 05:30
  • @Glen_b: The dots are exactly what I get when plotting the pair of data from my measurement. Where do you see "+" symbols? – Huy Nov 25 '15 at 15:00
  • See [here](http://i.imgur.com/oHpbeie.png) – Glen_b Nov 25 '15 at 16:04
  • @Glen_b: I only plotted pairs of points using Mathematica. No idea what those "+" signs are supposed to indicate. EDIT: Maybe all points are plotted as "+" signs and the things that actually look like points are accumulated "+" signs? – Huy Nov 25 '15 at 16:05

1 Answer


You could look into the skew-normal distribution (see Wikipedia, in particular the section on estimation for the skew-normal) and use it in the same way you used the normal distribution.
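The estimation section on Wikipedia describes a method-of-moments approach: match the mean, standard deviation and skewness $\gamma_1$ to the shape $\alpha$, location $\xi$ and scale $\omega$. A minimal sketch with SciPy, assuming the weighted moments from the question are plugged in as if they were ordinary sample moments; the function name is hypothetical. The inversion only exists for $|\gamma_1| < 0.9953$, the largest skewness a skew-normal can reach.

```python
import numpy as np
from scipy.stats import skewnorm

def skewnorm_from_moments(mean, sd, skew):
    """Method-of-moments skew-normal parameters (alpha, xi, omega)."""
    if abs(skew) >= 0.99527:
        raise ValueError("skewness outside the range of the skew-normal family")
    g = abs(skew) ** (2.0 / 3.0)
    k = ((4 - np.pi) / 2) ** (2.0 / 3.0)
    delta = np.sign(skew) * np.sqrt((np.pi / 2) * g / (g + k))
    alpha = delta / np.sqrt(1 - delta ** 2)            # shape
    omega = sd / np.sqrt(1 - 2 * delta ** 2 / np.pi)   # scale
    xi = mean - omega * delta * np.sqrt(2 / np.pi)     # location
    return alpha, xi, omega
```

The fitted curve would then be `c * skewnorm.pdf(x, alpha, loc=xi, scale=omega)`, with the constant `c` chosen as for the normal curve. SciPy's `skewnorm` uses the same $(\alpha, \xi, \omega)$ parametrization via its `a`, `loc` and `scale` arguments.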

But, lacking any knowledge of how the $(x_i, y_i)$ pairs were obtained, there is no principled statistical way of estimating the parameters. It doesn't look like you have IID data! So this is probably more a problem of function approximation (numerical analysis rather than statistics), unless you tell us some context.
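In the function-approximation view, one option is to fit a scaled skew-normal curve directly to the pairs by nonlinear least squares. A sketch using SciPy's `curve_fit` and `skewnorm`, assuming a reasonable starting guess `p0` is available (for instance from the moment estimates); the helper names are hypothetical, and least squares is just one choice of "closeness", so the covariance matrix `curve_fit` also returns should not be read as statistical standard errors here.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import skewnorm

def scaled_skewnorm(x, c, alpha, xi, omega):
    """c times a skew-normal density: a curve to fit, not a probability model."""
    return c * skewnorm.pdf(x, alpha, loc=xi, scale=omega)

def fit_scaled_skewnorm(x, y, p0):
    """Least-squares fit of (c, alpha, xi, omega), starting from p0."""
    # bounds keep the multiplier c and the scale omega positive
    lower = [0.0, -np.inf, -np.inf, 1e-12]
    upper = [np.inf, np.inf, np.inf, np.inf]
    params, _cov = curve_fit(scaled_skewnorm, x, y, p0=p0, bounds=(lower, upper))
    return params
```

With bounds given, `curve_fit` uses the trust-region reflective solver, so `p0` must lie strictly inside them.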

kjetil b halvorsen
  • Exactly! There is not enough information on the randomness at play to model this data. – Xi'an Oct 08 '18 at 16:39