I am trying to obtain the probability mass function for various order statistics of a log series distribution for a given $n$. To do so, I tried modifying the code given in this question: Simulating order statistics (as well as using the dbinom
function).
My modified code is given below
f_order_statistic <- function(p,n,k,x) {
Output <- lfactorial(n) - lfactorial(k-1) -
lfactorial(n-k) + (k-1)*extraDistr::plgser(x,p, log =TRUE) +
(n-k)*extraDistr::plgser(x,p, log=TRUE, lower.tail=FALSE) +
extraDistr::dlgser(x,p, log=TRUE)
return(exp(Output))
}
However, I am not sure if my answers are correct. For example, for $n = 100$ (sample size) and $k = 100$th (order statistic), I believe the pmf should be concentrated around $x = 2$.
I am getting the following values f(1) = 0.042, f(2) = 0.289 and f(3) onwards are very small (lower than 10 raised to -5). Essentially it is not adding up to 1. Am I making a mistake in my code somewhere or is my understanding that the summation of f(x) (from 1 to infinity) should add up to 1 incorrect.
To clarify the points raised by wolfies - I am using R to generate the code.
The problem I am trying to solve with my code is that I want to find the distribution of a particular order statistic (eg: kth order statistic) for a sample of n iid observations drawn from a population that has a logarithmic series distribution. The code seeks to generate the pmf for the order statistic for a given value of x. Therefore, if I I generate it for all values of x, it should add up to 1. The code seeks to implement the standard order statistic formula of $$ f(x) = n! / ((k - 1)! * (n - k)!) * f(x) * (F(x))^(k - 1) * (1 - F(x))^(n - k) $$ For example I believe that for $n = 200, k = 100$, the cumulative function (obtained by summing up the pmfs for different values of x) to reach close to 1 by around $x = 4$ or 5. But that is not seen to happen.
I am not sure whether the code is incorrect or I am mistaken in my understanding of how the distribution of order statistics behaves
Thanks everyone for guidance, I have realised what the problem is. The result for order statistics that I used for the code is applicable only for continuous distributions. For a discrete distribution like logarithmic series, this is not appplicable as there is a significant non zero probability of two observations being equal. Have found the correct approach in the below paper which I have implemented.