2

An object is tracked in an experiment I ran. The object's velocities over a 100 time steps are recorded. A model for this object says the velocities should be Rayleigh distributed.

Question 1: For the given data, how can one estimate the parameters of the Rayleigh distribution.

Question 2: After estimating the parameters, how can we calculate the maximum percentage difference between the observed data and estimated model?

The data:

4.1278
3
4.3681
3.4993
3.5099
3.4569
3.7245
3.1744
2.7477
4.1866
3.1257
4.6310
4.4171
3.8297
3.9558
4.4083
4.2123
4.4593
4.3426
3.7140
3.2105
2.5694
2.0113
1.9838
2.7958
2.4362
2.4558
1.0497
0.5789
0.4487
1.4811
1.5465
1.2266
1.628
1.9165
1.3497
2.4598
1.706
2.3716
2.3808
3.9231
2.2745
1.5667
1.7846
2.5792
1.2635
1.7394
1.9063
2.1777
2.8
1.8492
3.7035
2.7328
3.2436
2.7707
2.4044
2.5395
2.7072
2.0354
3.6457
2.8088
2.5456
2.4235
1.6
2.4023
2.9129
4.6793
3.7958
2.8100
1.9567
1.649
1.3688
1.1426
1.2340
0.37162
0.81565
1.5374
1.1495
1.6362
2.0497
1.6581
0.32883
0.4272
0.89617
1.4784
1.2486
1.783
3.52
4.486
3.1934
3.2815
1.2387
1.5428
1.7167
1.4661
1.6412
3.4185
2.3761
3.2665
3.6833

A hack that I tried was to find the mean and standard deviation assuming our data was normal. Knowing scale is not the same thing as standard deviation squared, I used the standard deviation squared as "scale".

Then, I ran the K-S test with two samples: (1) observed data, and (2) the expected values of a Rayleigh distribution with mean and scale (incorrectly as standard deviation) to find the D-max. However, while the D-max is acceptable, the p-values is low. So, I hope that you all can help me find a statistically robust method to find the scale.

Now, it is likely that my data will result in a good fit. However, even if it's just out of academic interest, can we try and solve this problem anyway?

Not critical, but if any of you have an approach to answer the aforementioned questions in Python, that would be of interest to me as well.

P.S.: Just to be sure I wasn't violating any rule, I went through the overview of the site , and it seems like my question is within the rules of the site. However, if there is something wrong, please assist me by making appropriate suggestions.

  • For Question 2, is the QQ-plot a valid approach? – troisquatre Jun 30 '18 at 10:02
  • 3
    1. The question begins in a rather text-book-exercise-like fashion. Is this for a class? An assignment, perhaps? 2. If you want to add to your question, edit your question, but you should explain both what kind of QQ plot you're referring to and how you plan to use it to answer the question. 3. Note that there are many possible ways to estimate parameters. Is there some kind of estimator you have in mind? 4. The Rayleigh distribution probably won't be a very good fit to these data. – Glen_b Jul 01 '18 at 08:09
  • @Glen_b : Thanks for your reply. A2: For data that's not normally distributed, there aren't different kinds of QQ-plots, right ( [ref1](https://data.library.virginia.edu/understanding-q-q-plots/) , [ref2](https://stats.stackexchange.com/questions/101274/how-to-interpret-a-qq-plot) ) ? So, if we want to examine how close our data comes to a Rayleigh distribution we would simply use the [QQ-plots](https://data.library.virginia.edu/understanding-q-q-plots/), correct? – troisquatre Jul 01 '18 at 11:03
  • 1. It wasn't clear that you were plotting against Rayleigh scores; for all I know you could have been plotting against normal scores (we've had questions where people have done things like that), or you might have been doing something else, like a QQ plot of a pair of samples; it's best to make sure exactly what you did and how. 2. Note that the tour doesn't cover what's [on topic](https://stats.stackexchange.com/help/on-topic), for example. 3. Note that the KS test isn't suitable if you have estimated parameters. 4. Would you please address the remaining questions above? – Glen_b Jul 01 '18 at 11:40
  • @Glen_b , thank you for your input about the KS test and the site. I should have addressed everything you requested for, but if something remains, please let me know. To clarify: the data comes from an experiment I conducted where I track a moving object. – troisquatre Jul 01 '18 at 12:13
  • 1 Thanks; I also asked if you had a particular kind of estimator in mind (there are many reasonable estimators; though I'd probably go with MLE unless there's likely to be some contamination). 2. To further clarify on the QQ plots thing -- your links were mostly to normal QQ plots, but that's not suitable for assessing whether you have Rayleigh data. You'd want a QQ plot specifically designed for data from a Rayleigh distribution. (The answer shows one way to do this though) – Glen_b Jul 01 '18 at 13:18
  • @Glen_b, I wasn't able to confidently choose an appropriate estimator for a $\chi_k$ distribution. For instance, why is MLE the best option? Please note, I added a "meta style" comment, feel free to delete it if needed. – troisquatre Jul 01 '18 at 13:36
  • A Rayleigh distribution looks like a poor fit--there are too many small values in it. Perhaps a better model would be that the observations are made with a certain amount of error, whence their distribution would reflect a sum of a Rayleigh variable and the error. – whuber Jul 02 '18 at 14:08

1 Answers1

4

You can use maximum likelihood estimation to estimate the scale parameter $\sigma$ of the Rayleigh distribution.

The maximum likelihood estimator of $\sigma$ is: $$ \hat{\sigma}=\sqrt{\frac{1}{2n}\sum_{i=1}^{n}x_{i}^2} $$

This estimator is biased, however (see Siddiqui 1964). An unbiased estimator is given by:

$$ \hat{\sigma}=\sqrt{\frac{1}{2n}\sum_{i=1}^{n}x_{i}^2}\cdot\frac{\Gamma(n)\sqrt{n}}{\Gamma(n + 1/2)} $$

where $\Gamma$ denotes the gamma function.

Here is how to estimate the scale parameter in R using maximum likelihood:

library(fitdistrplus)
library(extraDistr) # implements the Rayleigh distribution

# The data

x <- c(4.1278, 3, 4.3681, 3.4994, 3.5099, 3.4568, 3.7245, 3.1743, 
2.7477, 4.1865, 3.1257, 4.6311, 4.4171, 3.8296, 3.9558, 4.4084, 
4.2123, 4.4592, 4.3426, 3.7141, 3.2105, 2.5693, 2.0113, 1.9839, 
2.7958, 2.4361, 2.4558, 1.0498, 0.57897, 0.4486, 1.4811, 1.5466, 
1.2266, 1.627, 1.9165, 1.3498, 2.4598, 1.706, 2.3715, 2.3808, 
3.9232, 2.2745, 1.5666, 1.7846, 2.5793, 1.2635, 1.7393, 1.9063, 
2.1778, 2.8, 1.8491, 3.7035, 2.7329, 3.2436, 2.7708, 2.4044, 
2.5396, 2.7072, 2.0353, 3.6457, 2.8089, 2.5456, 2.4234, 1.6, 
2.4024, 2.9129, 4.6792, 3.7958, 2.8101, 1.9567, 1.649, 1.3687, 
1.1426, 1.2341, 0.37162, 0.81565, 1.5374, 1.1496, 1.6362, 2.0498, 
1.6581, 0.32883, 0.4273, 0.89617, 1.4783, 1.2486, 1.783, 3.52, 
4.486, 3.1935, 3.2815, 1.2388, 1.5428, 1.7168, 1.4661, 1.6411, 
3.4185, 2.3762, 3.2665, 3.6832)

fit <- fitdist(x, "rayleigh", start = list(sigma = 1))

summary(fit) # print estimated parameters

Fitting of the distribution ' rayleigh ' by maximum likelihood 
Parameters : 
      estimate Std. Error
sigma 1.919764 0.09598814
Loglikelihood:  -152.7544   AIC:  307.5087   BIC:  310.1139

plot(fit) # inspect fit

Rayleighfit

The estimated scale parameter is $\hat{\sigma}=1.918$. Inspecting the Q-Q plot of the fit, I'd say that the Rayleigh distribution is probably a suboptimal fit to these data (there are deviations at the upper and lower end of the Q-Q plot).

The unbiased maximum likelihood estimator is 1.922.

Siddiqui, M. M. (1964) "Statistical inference for Rayleigh distributions", The Journal of Research of the National Bureau of Standards, Sec. D: Radio Science, Vol. 68D, No. 9, p. 1007

COOLSerdash
  • 25,317
  • 8
  • 73
  • 123
  • @COOLSerdash thanks a ton for your help. Providing a reference is particularly useful. I need time to digest and understand your proposed approach, after which I will appropriately vote and provide comments if necessary. – troisquatre Jul 01 '18 at 12:53
  • Feel free to delete if this comment ought to be in the Meta site. I wasn't able to find resources to my problem that I could understand. Even the provided Siddiqui reference is not easy to understand as I don't have a background in stats. There are plenty of straightforward resources for normally distributed data, but I wasn't able to find much for Rayleigh. Further, my "hack" approach was clearly wrong. I believe this is a question that could benefit not just me. Lastly, framing a question such that it clearly expresses the objectives and satisfies the not-homework criteria, isn't trivial. – troisquatre Jul 01 '18 at 13:21
  • 1
    @COOLSerdash you have both $N$ and $n$ in your formula for the unbiased version of the estimator. You probably want to make them all $n$'s. – Glen_b Jul 01 '18 at 13:26