How to find the KS test statistic in R using given maximum likelihood estimates and sorted data?

Question

The random variable Y is said to have a two-parameter APE distribution, denoted APE(α, λ), with shape parameter α > 0 and scale parameter λ > 0, if the PDF of Y for y > 0 is

$$
f(y; \alpha, \lambda) =
\begin{cases}
\dfrac{\log \alpha}{\alpha - 1}\, \lambda e^{-\lambda y}\, \alpha^{1 - e^{-\lambda y}}, & \alpha \neq 1, \\
\lambda e^{-\lambda y}, & \alpha = 1,
\end{cases}
$$

and f(y; α, λ) = 0 otherwise.

The CDF of Y for y > 0 becomes

$$
F(y; \alpha, \lambda) =
\begin{cases}
\dfrac{\alpha^{1 - e^{-\lambda y}} - 1}{\alpha - 1}, & \alpha \neq 1, \\
1 - e^{-\lambda y}, & \alpha = 1.
\end{cases}
$$

y=1 4 4 7 11 13 15 15 17 18 19 19 20 20 22 23 28 29 31 32 36 37 47 48 49 50 54 54 55 59 59 61 61
66 72 72 75 78 78 81 93 96 99 108 113 114 120 120 120 123 124 129 131 137 145 151 156 171
176 182 188 189 195 203 208 215 217 217 217 224 228 233 255 271 275 275 275 286 291 312
312 312 315 326 326 329 330 336 338 345 348 354 361 364 369 378 390 457 467 498 517 566
644 745 871 1312 1357 1613 1630.

n=109

alpha estimated =0.00366583  
lambda estimated =0.0009550325

When I tried coding this in R, I first defined the CDF:

# CDF of APE

cdf <- function(alpha,lambda){

  if(alpha!=1){

    apecdf<-((alpha^(1-exp(-lambda*y)))-1)/ (alpha-1)}else if(alpha==1){

     apecdf<- 1-(exp(-lambda*y))}

  return(apecdf)

}
#k-s test statistic and p-Value for APE

t <- ks.test(y,cdf(0.00366583,0.0009550325),shape=0.00366583,scale=0.0009550325)

t

RESULT:

    Two-sample Kolmogorov-Smirnov test

data:  y and cdf(0.00366583, 0.0009550325)

D = 1, p-value < 2.2e-16  
alternative hypothesis: two-sided

What am I doing wrong here?

I need to get a KS statistic of about 0.0742 and a p-value of about 0.5852 (or something close to these).

score 1 · Accepted Answer · answered Jul 15 '20 at 12:18

In my opinion you need a one-sample Kolmogorov-Smirnov test. You have a sample of the random variable Y and want to check whether Y follows the two-parameter APE distribution. In your call, `cdf(0.00366583, 0.0009550325)` is evaluated first and returns a numeric vector, so `ks.test` treats it as a second sample and runs a two-sample test; every CDF value is below 1 while every observation is at least 1, hence D = 1. I've therefore updated your cdf so that it takes the sample values as a further argument, y, and passed it to `ks.test` as the reference distribution.

However, there are two problems with applying the KS test here. First, the sample y has ties (repeated values). Since the KS test is designed for continuous distributions, y should not contain repeated values. Second, you are using parameter estimates that were computed from y itself.

In the attached reprex I've performed a one-sample KS test, but I cannot reproduce the desired test statistic D and p-value. I've added some comments.

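# CDF of APE(alpha, lambda), evaluated at the sample values y
cdf <- function(y, alpha, lambda) {
  if (alpha != 1) {
    apecdf <- (alpha^(1 - exp(-lambda * y)) - 1) / (alpha - 1)
  } else {
    # alpha == 1 reduces to the exponential CDF
    apecdf <- 1 - exp(-lambda * y)
  }
  return(apecdf)
}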

# given: 
y <- c(1, 4, 4, 7, 11, 13, 15, 15, 17, 18, 19, 19, 20, 20, 22, 23, 28, 29, 31, 32, 
       36, 37, 47, 48, 49, 50, 54, 54, 55, 59, 59, 61, 61, 66, 72, 72, 75, 78, 78, 
       81, 93, 96, 99, 108, 113, 114, 120, 120, 120, 123, 124, 129, 131, 137, 145, 
       151, 156, 171, 176, 182, 188, 189, 195, 203, 208, 215, 217, 217, 217, 224, 228, 
       233, 255, 271, 275, 275, 275, 286, 291, 312, 312, 312, 315, 326, 326, 329, 330, 
       336, 338, 345, 348, 354, 361, 364, 369, 378, 390, 457, 467, 498, 517, 566, 644, 
       745, 871, 1312, 1357, 1613, 1630)
alpha <- 0.00366583
lambda <- 0.0009550325

# Case 1: Perform a one-sample KS-Test: 
t1 <- ks.test(x = y,y = "cdf", alpha, lambda)
#> Warning in ks.test(x = y, y = "cdf", alpha, lambda): ties should not be present
#> for the Kolmogorov-Smirnov test
t1
#> 
#>  One-sample Kolmogorov-Smirnov test
#> 
#> data:  y
#> D = 0.061673, p-value = 0.8014
#> alternative hypothesis: two-sided

## Problems: 
### 1) KS-Test is for continuous distributions and hence your y should not contain the repeated values (ties)!
### 2) Parameters should not be estimated from data 
###    (specified in ?ks.test... "If a single-sample test is used, the parameters 
###    specified in ... must be pre-specified and not estimated from the data. 
###    There is some more refined distribution theory for the KS test with estimated 
###    parameters (see Durbin, 1973), but that is not implemented in ks.test.")

# Case 2: Perform a one-sample KS-Test adding some variation in y: 
y_var <- y + rnorm(length(y), sd = 0.005) #  because of the ties problem! 
colMeans(cbind(y, y_var)) 
#>        y    y_var 
#> 233.3211 233.3211
apply(cbind(y, y_var), 2, sd)   
#>        y    y_var 
#> 296.4344 296.4344
# pretty similar!

t2 <- ks.test(x = y_var, y = "cdf", alpha, lambda)
t2
#> 
#>  One-sample Kolmogorov-Smirnov test
#> 
#> data:  y_var
#> D = 0.061669, p-value = 0.8015
#> alternative hypothesis: two-sided

## Problem: 
### The second problem from Case 1 (parameters estimated from the data) still applies!

Created on 2020-07-15 by the reprex package (v0.3.0)
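
To see where D comes from, the one-sample statistic can also be computed directly from the sorted sample. This is a minimal sketch, assuming the same data and parameter values as in the reprex; `ape_cdf` and `ks_stat` are illustrative names and do not come from any package.

```r
# APE(alpha, lambda) CDF, vectorised in y (hypothetical helper name)
ape_cdf <- function(y, alpha, lambda) {
  if (alpha == 1) return(1 - exp(-lambda * y))
  (alpha^(1 - exp(-lambda * y)) - 1) / (alpha - 1)
}

# Two-sided one-sample KS statistic (hypothetical helper name):
# D = max over i of  i/n - F(y_(i))  and  F(y_(i)) - (i - 1)/n
ks_stat <- function(y, cdf, ...) {
  n <- length(y)
  Fy <- cdf(sort(y), ...)   # fitted CDF at the order statistics
  i <- seq_len(n)
  max(i / n - Fy, Fy - (i - 1) / n)
}

y <- c(1, 4, 4, 7, 11, 13, 15, 15, 17, 18, 19, 19, 20, 20, 22, 23, 28, 29, 31, 32,
       36, 37, 47, 48, 49, 50, 54, 54, 55, 59, 59, 61, 61, 66, 72, 72, 75, 78, 78,
       81, 93, 96, 99, 108, 113, 114, 120, 120, 120, 123, 124, 129, 131, 137, 145,
       151, 156, 171, 176, 182, 188, 189, 195, 203, 208, 215, 217, 217, 217, 224, 228,
       233, 255, 271, 275, 275, 275, 286, 291, 312, 312, 312, 315, 326, 326, 329, 330,
       336, 338, 345, 348, 354, 361, 364, 369, 378, 390, 457, 467, 498, 517, 566, 644,
       745, 871, 1312, 1357, 1613, 1630)

D <- ks_stat(y, ape_cdf, alpha = 0.00366583, lambda = 0.0009550325)
D
```

This is the same quantity that `ks.test` reports as D in the one-sample case, so it should agree with `t1` above. It says nothing about the p-value, which is where the estimated parameters matter.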

Hope it helps!

answered Jul 15 '20 at 12:18

Tim-TU


OH MY GOD! That's so awesome. Thank you, I've been trying to get it right for such a long time. Hey, can you please answer my other question as well? I'd really appreciate it! – Felix Jul 15 '20 at 12:38
Unfortunately, this answer is wrong: you need the Lilliefors variant of the KS test in order to compute the p-value correctly. This is because `ks.test` expressly assumes the reference distribution you supply was determined independently of the data rather than estimated from the data. The estimation process makes it (far) more likely that the data will look close to the distribution. – whuber Jul 15 '20 at 12:40
So why is it wrong? I pointed out that two problems occur when performing the KS test with the given data and parameters estimated from this sample. – Tim-TU Jul 15 '20 at 12:44
@whuber Why aren't you helping us out then, if you know better? – Felix Jul 15 '20 at 12:47
I have, matrika: you now have suitable keywords for searching our site for the answer, as well as a caution against committing the mistakes in this answer. A moderator's job is not to answer every question that comes along on this site; we have to rely on you to do some of your own research. ;-) – whuber Jul 15 '20 at 12:53
@whuber Oooooo! Thanks for your help. – Felix Jul 15 '20 at 13:54
So first, I really don't get what is meant by "mistakes in the answer". The answer pointed out that one cannot use parameters estimated from this sample. The part of the answer where I said that one should do a one-sample test addressed the question of how to check whether the data follow the given theoretical distribution. It would be really nice if you could read the full answer first and ask for clarification if something is unclear. Furthermore, I've shown how matrika could use the function `ks.test` with a self-written cdf for a one-sample test. – Tim-TU Jul 15 '20 at 16:46
Actually, the paper I'm working on gives a different set of estimates, for which your method of finding the KS test statistic and p-value is apt. No worries, @whuber must be talking about things beyond the help I need. – Felix Jul 15 '20 at 17:18
I see you have buried some statements about "problems" in comments in the code. Many readers will overlook that or not fully understand that those statements mean your code does not solve the problem. The second part of the code involving "adding some variation" is so *ad hoc* that the burden is on you to demonstrate its correctness. – whuber Jul 15 '20 at 18:01
It is somewhat ad hoc to say that the explanations are buried in the code, since this was pointed out in the reply. The second section of the code gives further insight into how the function `ks.test` works by demonstrating why and how the warning "ties should not be present for the Kolmogorov-Smirnov test" was returned. However, – Tim-TU Jul 16 '20 at 04:16
Anyway, I think there are two ways to proceed now. First, the question could be edited to make clear to everyone that a different parameter set is used, one that has nothing to do with the given sample and was not estimated from it (matrika mentioned this in the comments). Then my answer can be accepted as it is, although I would change "problems" to "concerns". Second, the question could be edited to state that the given parameters are the maximum likelihood estimates from the sample; then my answer is completely wrong and I will delete it immediately. – Tim-TU Jul 16 '20 at 04:23
That's reasonable. A third way would be to supply an effective solution, as described at https://stats.stackexchange.com/questions/237779 and https://stats.stackexchange.com/a/2660/919. – whuber Jul 16 '20 at 13:53
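
One way to get a p-value that accounts for the estimation step, in the spirit of the Lilliefors variant mentioned in the comments, is a parametric bootstrap: simulate from the fitted distribution, refit the parameters on each simulated sample, and compare the observed D with the simulated ones. The following is a rough sketch under assumptions, not code taken from the linked answers. All helper names (`ape_cdf`, `ape_rng`, `ape_nll`, `ape_fit`, `ks_stat`, `ks_boot`), the use of `optim` with its default Nelder-Mead settings, the starting values, and B = 200 are illustrative choices.

```r
# APE(alpha, lambda) CDF, vectorised in y
ape_cdf <- function(y, alpha, lambda) {
  if (alpha == 1) return(1 - exp(-lambda * y))
  (alpha^(1 - exp(-lambda * y)) - 1) / (alpha - 1)
}

# Simulate from APE(alpha, lambda) by inverting the CDF above
ape_rng <- function(n, alpha, lambda) {
  u <- runif(n)
  if (alpha == 1) return(-log(1 - u) / lambda)
  -log(1 - log(1 + u * (alpha - 1)) / log(alpha)) / lambda
}

# Negative log-likelihood in p = (log alpha, log lambda), so both stay positive
ape_nll <- function(p, y) {
  la <- p[1]
  lambda <- exp(p[2])
  # log(log(alpha) / (alpha - 1)); tends to 0 as alpha -> 1
  const <- if (abs(la) < 1e-8) 0 else log(la / expm1(la))
  -sum(const + log(lambda) - lambda * y + (1 - exp(-lambda * y)) * la)
}

# Maximum likelihood fit; starting values are an assumption, not tuned
ape_fit <- function(y) {
  fit <- optim(c(-1, -log(mean(y))), ape_nll, y = y)
  exp(fit$par)  # c(alpha, lambda)
}

# Two-sided one-sample KS statistic against the fitted CDF
ks_stat <- function(y, alpha, lambda) {
  n <- length(y)
  Fy <- ape_cdf(sort(y), alpha, lambda)
  i <- seq_len(n)
  max(i / n - Fy, Fy - (i - 1) / n)
}

# Parametric bootstrap: parameters are re-estimated on every simulated sample
ks_boot <- function(y, B = 200) {
  est <- ape_fit(y)
  d_obs <- ks_stat(y, est[1], est[2])
  d_sim <- replicate(B, {
    y_sim <- ape_rng(length(y), est[1], est[2])
    est_sim <- ape_fit(y_sim)
    ks_stat(y_sim, est_sim[1], est_sim[2])
  })
  list(estimate = est, statistic = d_obs,
       p.value = (1 + sum(d_sim >= d_obs)) / (B + 1))
}

y <- c(1, 4, 4, 7, 11, 13, 15, 15, 17, 18, 19, 19, 20, 20, 22, 23, 28, 29, 31, 32,
       36, 37, 47, 48, 49, 50, 54, 54, 55, 59, 59, 61, 61, 66, 72, 72, 75, 78, 78,
       81, 93, 96, 99, 108, 113, 114, 120, 120, 120, 123, 124, 129, 131, 137, 145,
       151, 156, 171, 176, 182, 188, 189, 195, 203, 208, 215, 217, 217, 217, 224, 228,
       233, 255, 271, 275, 275, 275, 286, 291, 312, 312, 312, 315, 326, 326, 329, 330,
       336, 338, 345, 348, 354, 361, 364, 369, 378, 390, 457, 467, 498, 517, 566, 644,
       745, 871, 1312, 1357, 1613, 1630)

set.seed(1)
res <- ks_boot(y, B = 200)
res
```

The estimates returned by `optim` here need not coincide with the values quoted in the question, and the bootstrap p-value varies with the seed and with B. The point of the sketch is only that the reference distribution of D is built with the parameters refitted each time, which `ks.test` does not do.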
