
I have the following problem: $X \sim N(\mu,\sigma^2)$ with unknown parameters. A sample of size $m$ is generated from $X$ but filtered by $X < T$, i.e., any draw larger than $T$ is discarded and sampling continues until we have $m$ values.

How can I estimate $\mu$ and $\sigma$?

I suppose we could use the E-M algorithm, since this is similar to the case of missing data?

This is what I've come up with so far:

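$L(\mu,\sigma^2;x_1,\ldots,x_m,x_{m+1},\ldots,x_n) = \prod_{i=1}^m f(x_i) \prod_{i=m+1}^n \left(1-\Phi\left(\frac{T-\mu}{\sigma}\right)\right)$

$\log L(\mu,\sigma^2;x_1,\ldots,x_m,x_{m+1},\ldots,x_n) = \sum_{i=1}^m \log f(x_i) + \sum_{i=m+1}^n \log\left(1-\Phi\left(\frac{T-\mu}{\sigma}\right)\right)$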

How do I take the derivative of the log-likelihood with respect to $\mu$ and $\sigma^2$? Also, $n$ is unknown; does it need to be estimated here?
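
For the case where the discarded draws are not counted, one way to write the likelihood is to treat each kept value as a draw from a normal distribution truncated above at $T$. A sketch, assuming $\phi$ and $\Phi$ denote the standard normal density and CDF (notation introduced here for illustration):

$$\ell(\mu,\sigma) = \sum_{i=1}^m \log \phi\left(\frac{x_i-\mu}{\sigma}\right) - m\log\sigma - m\log\Phi\left(\frac{T-\mu}{\sigma}\right)$$

Only $m$ appears in this expression, so under that assumption $n$ would not need to be estimated.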

Demo
  • Can you write out the likelihood function for the observed data? That's a start... – jbowman Mar 16 '18 at 04:04
  • You can use a maximum likelihood approach, using the probability density function given [here](https://en.wikipedia.org/wiki/Truncated_normal_distribution) – matteo Mar 16 '18 at 08:08
  • @MatteoLisi Then use a numerical method to find the MLE? – Demo Mar 16 '18 at 14:57
  • @jbowman I tried, but I'm still stuck here. – Demo Mar 16 '18 at 15:05
  • Use the method from: https://stats.stackexchange.com/questions/133347/ml-estimate-of-exponential-distribution-with-censored-data/133360#133360 – kjetil b halvorsen Mar 16 '18 at 15:17
  • @kjetilbhalvorsen But here the little $n$ is unknown; with censored data we have all the data, it's just that some are censored... – Demo Mar 16 '18 at 15:21
  • How can the little $n$ be unknown after you have the data? – kjetil b halvorsen Mar 16 '18 at 15:23
  • @kjetilbhalvorsen Good point... I fooled myself. I will update the answer myself. – Demo Mar 16 '18 at 15:25
  • If you know $n$ then you have *censored* data, not truncated data, and methods for estimating the parameters have appeared in many threads here: search on "censored" or, separately, "survival." – whuber Mar 16 '18 at 15:33
  • I am confused.. how can you know $n$ if all you have is $m$ samples? My understanding of the question was that in addition to the samples you know only the upper boundary T of the distribution. – matteo Mar 16 '18 at 17:36
  • @MatteoLisi This could fall into two cases, depending on whether you count the discarded samples during your data generation process. If this number is recorded, this is like censored data, and typical survival analysis should work; otherwise this is a missing data problem, and the E-M algorithm should work. This is my understanding now. – Demo Mar 17 '18 at 13:38
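
The second case in the last comment (discarded draws not recorded, E-M algorithm) could be sketched roughly as below. This is not a method given by anyone in the thread; it is a minimal standard-library illustration that assumes the discarded draws are treated as missing data, so each E-M step mixes the observed moments with the model's moments for the unseen tail $X > T$. The function names `sample_truncated`, `truncated_loglik` and `em_truncated_normal`, and the simulated values $\mu = 2$, $\sigma = 1.5$, $T = 4$, are hypothetical choices made for the example.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def sample_truncated(mu, sigma, T, m, rng):
    """Draw from N(mu, sigma^2), discard values >= T, stop at m kept values."""
    kept = []
    while len(kept) < m:
        x = rng.gauss(mu, sigma)
        if x < T:
            kept.append(x)
    return kept


def truncated_loglik(xs, T, mu, sigma):
    """Log-likelihood of the kept values under a normal truncated above at T."""
    log_keep = math.log(STD.cdf((T - mu) / sigma))  # log P(X < T)
    total = 0.0
    for x in xs:
        z = (x - mu) / sigma
        total += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(sigma)
    return total - len(xs) * log_keep


def em_truncated_normal(xs, T, max_iter=10000, tol=1e-10):
    """E-M sketch (assumed formulation): discarded draws are missing data."""
    m = len(xs)
    obs1 = sum(xs) / m                 # observed first moment
    obs2 = sum(x * x for x in xs) / m  # observed second moment
    mu, sigma = obs1, math.sqrt(obs2 - obs1 ** 2)  # naive starting values
    for _ in range(max_iter):
        beta = (T - mu) / sigma
        p_keep = STD.cdf(beta)   # P(X < T) under the current parameters
        p_drop = 1.0 - p_keep    # P(X > T), expected share of discarded draws
        if p_drop <= 0.0:
            break                # truncation is numerically negligible
        lam = STD.pdf(beta) / p_drop
        # E-step: first and second moments of the unseen tail X | X > T
        tail1 = mu + sigma * lam
        tail2 = mu * mu + sigma * sigma + sigma * lam * (T + mu)
        # M-step: complete-data moments are a mixture of observed and tail
        new1 = p_keep * obs1 + p_drop * tail1
        new2 = p_keep * obs2 + p_drop * tail2
        new_mu, new_sigma = new1, math.sqrt(new2 - new1 ** 2)
        converged = abs(new_mu - mu) + abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    xs = sample_truncated(mu=2.0, sigma=1.5, T=4.0, m=5000, rng=rng)
    mu_hat, sigma_hat = em_truncated_normal(xs, T=4.0)
    print(mu_hat, sigma_hat, truncated_loglik(xs, 4.0, mu_hat, sigma_hat))
```

The iteration starts from the plain sample mean and standard deviation, which are biased downward because the upper tail is missing. Convergence can be slow when $T$ cuts off a large share of the distribution, and for some samples the truncated-normal MLE may not exist, so the output is best checked against `truncated_loglik` or a general-purpose numerical optimizer.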

0 Answers