1

I am trying to obtain the probability mass function for various order statistics of a log series distribution for a given $n$. To do so, I tried modifying the code given in this question: Simulating order statistics (as well as using the dbinom function).

My modified code is given below

f_order_statistic <- function(p,n,k,x) {
  Output <- lfactorial(n) - lfactorial(k-1) - 
            lfactorial(n-k) + (k-1)*extraDistr::plgser(x,p, log =TRUE) + 
            (n-k)*extraDistr::plgser(x,p, log=TRUE, lower.tail=FALSE) + 
            extraDistr::dlgser(x,p, log=TRUE) 
  return(exp(Output))
}

However, I am not sure if my answers are correct. For example, for $n = 100$ (sample size) and $k = 100$th (order statistic), I believe the pmf should be concentrated around $x = 2$.

I am getting the following values f(1) = 0.042, f(2) = 0.289 and f(3) onwards are very small (lower than 10 raised to -5). Essentially it is not adding up to 1. Am I making a mistake in my code somewhere or is my understanding that the summation of f(x) (from 1 to infinity) should add up to 1 incorrect.

To clarify the points raised by wolfies - I am using R to generate the code.

The problem I am trying to solve with my code is that I want to find the distribution of a particular order statistic (eg: kth order statistic) for a sample of n iid observations drawn from a population that has a logarithmic series distribution. The code seeks to generate the pmf for the order statistic for a given value of x. Therefore, if I I generate it for all values of x, it should add up to 1. The code seeks to implement the standard order statistic formula of $$ f(x) = n! / ((k - 1)! * (n - k)!) * f(x) * (F(x))^(k - 1) * (1 - F(x))^(n - k) $$ For example I believe that for $n = 200, k = 100$, the cumulative function (obtained by summing up the pmfs for different values of x) to reach close to 1 by around $x = 4$ or 5. But that is not seen to happen.

I am not sure whether the code is incorrect or I am mistaken in my understanding of how the distribution of order statistics behaves

Thanks everyone for guidance, I have realised what the problem is. The result for order statistics that I used for the code is applicable only for continuous distributions. For a discrete distribution like logarithmic series, this is not appplicable as there is a significant non zero probability of two observations being equal. Have found the correct approach in the below paper which I have implemented.

http://www.math.wm.edu/~leemis/2005informsjoc.pdf

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
Srijith
  • 11
  • 2
  • 1
    I think your question would benefit by stating more of it in conventional mathematical notation (such as the pmf) ... before veering into code (it would also help by stating in the questing what software you are using) – wolfies Oct 31 '18 at 13:51
  • 2
    Could you explain what a "log series distribution" is? – whuber Oct 31 '18 at 17:57
  • http://mathworld.wolfram.com/Log-SeriesDistribution.html – Glen_b Nov 01 '18 at 01:17

0 Answers0