4

I have an exercise that asks us to find the maximum likelihood estimator of the location parameter of the Cauchy distribution, given a data set $X=\{x_1,x_2\}$ with $|x_1-x_2|<2$.

So far I have worked from this approach: Maximum likelihood estimator of location parameter of Cauchy distribution

but I can't grasp how "given a set $X=\{x_1,x_2\}$, $|x_1-x_2|<2$" changes the problem.

Thank you

  • 1
    The meaning isn't at all clear. Are you quoting the exercise *exactly* and *in its entirety?* – whuber Jan 31 '22 at 18:50
  • The exercise isn't in English so I translated it, but yeah, that's all it says. It also gives the formula of the Cauchy distribution that is in the other link. The extra part says: "find the MLE based on a data set of X = {x1,x2}, |x1-x2| < 2". – Giannhs Meh Jan 31 '22 at 18:53
  • 1
    It is crucially important to explain that the $x_i$ are the dataset. There still remain several different ways to interpret the constraint $|x_1-x_2|\lt 2.$ – whuber Jan 31 '22 at 18:58
  • I wish I understood, but that's all it says... I think what it really means is that the sum doesn't go from 1 to infinity but only from 1 to 2, hence the X = {x1,x2}. This way it can be solved normally by setting the derivative equal to 0, without using the Newton-Raphson method. – Giannhs Meh Jan 31 '22 at 19:02

3 Answers

10

When you have only two samples $x_1, x_2$, the likelihood is maximized by finding the maximum of

$$\mathcal{L}(\lambda ; x_1,x_2) \propto\frac{1}{(\gamma^{2}+(x_1-\lambda)^2)}\frac{1}{(\gamma^{2}+(x_2-\lambda)^2)}$$

which is equivalent to finding the minimum of the polynomial

$$(\gamma^{2}+(x_1-\lambda)^2)(\gamma^{2}+(x_2-\lambda)^2)$$

We can rephrase this in terms of $\bar{x} = \frac{x_1+x_2}{2}$ and $d= \frac{x_1-x_2}{2}$

$$(\gamma^{2}+(\bar{x}-d-\lambda)^2)(\gamma^{2}+(\bar{x}+d-\lambda)^2)$$

Without loss of generality we can set $\bar{x}=0$ (the solution is just shifted when $\bar{x}\neq0$):

$$(\gamma^{2}+(d-\lambda)^2)(\gamma^{2}+(d+\lambda)^2) = \gamma^4 + 2 \gamma^2 (d^2+\lambda^2) + (d^2+\lambda^2)^2 - 4 d^2\lambda^2$$

whose derivative with respect to $\lambda$ equals $0$ at the minimum:

$$4(\gamma^2 - d^2+\lambda^2) \lambda = 0$$
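As a quick sanity check, here is a minimal sympy sketch (my addition, assuming nothing beyond the polynomial above) that confirms this factorisation:

```python
# Symbolic sanity check of the expansion and its derivative (sympy).
import sympy as sp

lam, d, gamma = sp.symbols('lambda d gamma', real=True)

# The polynomial to be minimised, with x-bar set to 0.
poly = (gamma**2 + (d - lam)**2) * (gamma**2 + (d + lam)**2)

# The derivative should factor as 4*lambda*(gamma**2 - d**2 + lambda**2).
print(sp.factor(sp.diff(poly, lam)))
```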

In the case $d^2 < \gamma^2$ this is zero at only a single point, $\lambda = 0$ (or, in the general case, $\lambda = \bar{x}$).

Your problem probably uses $\gamma = 1$, so the condition is $|d| < 1$, or equivalently $|x_1-x_2| < 2$. Under this condition the polynomial is minimised, and hence the likelihood maximised, at the single point $\lambda = \frac{x_1+x_2}{2}$. In other cases the polynomial has multiple minima.

> but I can't grasp how "given a set $X=\{x_1,x_2\}$, $|x_1-x_2|<2$" changes the problem.

So what changes with the condition $|x_1-x_2| < 2$ is that the likelihood function has a single maximum.
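Here is a quick numerical illustration (a sketch assuming $\gamma = 1$ and two arbitrary data pairs; the grid search is only for demonstration):

```python
# Locate the maxima of the two-sample Cauchy log-likelihood on a grid (gamma = 1).
import numpy as np

def loglik(lam, x1, x2):
    # Log-likelihood of the location lam, up to an additive constant.
    return -np.log(1 + (x1 - lam)**2) - np.log(1 + (x2 - lam)**2)

lam = np.linspace(-5, 5, 100001)
for x1, x2 in [(-0.9, 0.9), (-1.5, 1.5)]:          # |x1-x2| < 2, then |x1-x2| > 2
    ll = loglik(lam, x1, x2)
    is_peak = (ll[1:-1] > ll[:-2]) & (ll[1:-1] > ll[2:])   # strict local maxima
    print(f"|x1-x2| = {abs(x1 - x2)}: maxima near", np.round(lam[1:-1][is_peak], 3))
# First case: a single maximum at 0 (the midpoint).
# Second case: two maxima near -1.118 and 1.118, i.e. at +/- sqrt(d^2 - 1).
```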

Sextus Empiricus
  • Why doesn't $x_2$ appear in your expression? Why does $\lambda$ appear on both sides? How does this interpret and handle the condition in the question? – whuber Jan 31 '22 at 19:39
  • 1
    @whuber I am doing this answer on my phone and had to create a 'save' of my partial answer. Re-typing equations is annoying. – Sextus Empiricus Jan 31 '22 at 19:46
  • Very nice interpretation and solution! +1. – whuber Jan 31 '22 at 22:19
  • @Sextus Great approach! What I thought might be possible in order to get to a solution with my stupid mind is: when you take the general approach of using the ln of the function and calculating its derivative, as you do for any MLE problem, and get a sum from 1 to n, to take the sum from 1 to 2 since you have two x's ($x_1, x_2$) and find for what $\theta$ it is equal to zero. Do you think this would still be a valid solution? – Giannhs Meh Jan 31 '22 at 23:42
  • @GiannhsMeh if you maximize the logarithm of the likelihood function then you should arrive at the same situation, just via different algebraic steps. $$\frac{d\ln L}{d\mu}=\frac{2(x_1-\mu)}{1+(x_1-\mu)^2}+\frac{2(x_2-\mu)}{1+(x_2-\mu)^2} = 0$$ After bringing the terms to a common denominator, you get the same situation as in my answer. – Sextus Empiricus Feb 01 '22 at 00:01
  • @SextusEmpiricus That's great, thank you so much – Giannhs Meh Feb 01 '22 at 00:06
  • actually I am not so sure, you would have to work it out. But you can use the same trick of converting to $x_1 = \bar{x} + d$ and $x_2 = \bar{x} - d$ and setting $\bar{x} = 0$. Or alternatively you can set $x_1 = 0$ – Sextus Empiricus Feb 01 '22 at 00:07
  • @SextusEmpiricus Yeah the math is a pain in the ass to work with in this one but I guess your approach is way more elegant! – Giannhs Meh Feb 01 '22 at 00:09
  • @SextusEmpiricus Sorry, but I just noticed that the Cauchy distribution has a $1/\pi$ term multiplying the $L$ function you wrote, at least in my problem. Does this modify the solution in any way? – Giannhs Meh Feb 01 '22 at 00:30
  • @GiannhsMeh the maximizer of the likelihood function is unaffected by multiplicative constants. I eliminated the $1/\pi$ factor from the likelihood function and used the symbol $\propto$ to indicate that the relationship is a proportionality. – Sextus Empiricus Feb 01 '22 at 00:32
  • how would you explain this constraint $|x_1-x_2|<2$ intuitively? – Aksakal Feb 01 '22 at 19:42
  • 2
    @aksakal For $|x_1-x_2|>2$ you get that the location parameter is more likely to be near either $x_1$ or $x_2$ than in the middle. This happens when the log-likelihood gained by moving towards one observation exceeds the log-likelihood lost by moving away from the other. – Sextus Empiricus Feb 01 '22 at 19:57
  • @SextusEmpiricus why do you think it's intuitive? – Aksakal Feb 01 '22 at 19:58
  • @Aksakal It is in this shape https://www.wolframalpha.com/input?i=log%281%2F%281%2Bx%5E2%29%29 and the sum of two of those https://www.wolframalpha.com/input?i=log%281%2F%281%2Bx%5E2%29%29+%2B+log%281%2F%281%2B%28x-3%29%5E2%29%29 – Sextus Empiricus Feb 01 '22 at 20:04
  • @SextusEmpiricus, I'd call this a *spontaneous symmetry breaking*: when the distance between the points is large enough, it's more likely that the center is near one of the points than midway between them, and this must happen for heavy-tailed shapes. – Aksakal Feb 01 '22 at 20:12
  • If the log-likelihood function $f(x)$ has a peak at $x=0$ but were concave, then it would not be possible for a sum $f(x)+f(x-a)$ to have a global maximum different from $a/2$. The log-likelihood function of the Cauchy distribution is not concave. – Sextus Empiricus Feb 01 '22 at 20:20
1

By the symmetry principle, when there is a unique solution, the location estimate can only be $\frac{x_1+x_2}{2}$.

This holds regardless of the estimation approach. For instance, a particular estimator, such as the MLE, may not even exist in every case. However, if it does exist and is unique, then symmetry dictates what it must be.

Non-MLE estimation

If we don't constrain ourselves to MLE, then we could get the parameters differently. Obviously, the peak (center) is still halfway between $x_1$ and $x_2$. So we only need to figure out the other parameter.

In physics we used to characterize the Breit-Wigner function by its half-width at half-height, which equals the shape parameter $\gamma$ in the statistical definition of the Cauchy distribution: the PDF reaches half its height when you step left or right from the peak by the half-width. So, naturally, the "width" is $2\gamma$, and intuitively it should be estimated as $2\hat\gamma=|x_1-x_2|$.

The "height" of PDF is at its peak $x_0$:$$H=\frac 1 {\pi\gamma}$$ Solve the PDF for $x$ to equal $H/2$: $$\frac H 2=\frac 1 {2\pi\gamma}=\frac 1 {\pi\gamma\left[1+\left(\frac{x-x_0} \gamma\right)^2\right]}$$ to get $$\frac{x-x_0}\gamma=1$$ So when the variable is $x_0\pm\gamma$ the PDF reaches its half-height

Aksakal
  • 1
    @Aksakal, that's right for *an* estimate. But it is not the right argument for the maximum likelihood estimator. The maximum likelihood estimator is not unique for $|x_1-x_2|>2$, so the symmetry principle should state that the estimator can only be $\frac{x_1+x_2}{2}$ *if* a unique estimator exists. – Sextus Empiricus Feb 01 '22 at 06:26
  • Symmetry directly implies only that $(x_1+x_2)/2$ is a critical point of the likelihood: more is needed to demonstrate it is a global maximum (or even a local maximum). – whuber Feb 01 '22 at 14:29
  • @SextusEmpiricus, agreed, updated the answer – Aksakal Feb 01 '22 at 14:46
0

To come up with an expression for the conditional likelihood, express the joint likelihood $L(x_1, x_2)$ as $L(x_2|x_1)L(x_1)$. The unconstrained $x_1$ can be anything, of course, so it has the usual density

$$f(x_1) = \frac{1}{\pi (1+(x_1-\mu)^2)}.$$

The conditional density of $x_2$ can be expressed by assigning likelihood 0 to any value more than 2 units away from $x_1$; the density is then "normalized" by the probability mass within 2 units of $x_1$, i.e.

$$ f(x_2|x_1) = \dfrac{f(x_2)}{F(x_1+2) - F(x_1 - 2)} \times \mathcal{I}(|x_1-x_2|<2)$$

I am not sure whether maximizing $L(x_1, x_2) = f(x_2|x_1) f(x_1)$ has an analytic solution, or whether it is really necessary to obtain one. Numeric solvers should give you a solution here, and some simulations should show whether the estimator has any nice asymptotic properties.
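For what it's worth, here is a rough numeric sketch of that maximization (the example data and the grid search are my own choices; scipy's `cauchy` supplies $f$ and $F$):

```python
# Grid search for the mu maximising L(x1,x2) = f(x2|x1) f(x1) when |x1-x2| < 2.
import numpy as np
from scipy.stats import cauchy

def loglik(mu, x1, x2):
    # log f(x1) + log f(x2) - log(F(x1+2) - F(x1-2)); the indicator is 1 here.
    return (cauchy.logpdf(x1, loc=mu) + cauchy.logpdf(x2, loc=mu)
            - np.log(cauchy.cdf(x1 + 2, loc=mu) - cauchy.cdf(x1 - 2, loc=mu)))

x1, x2 = 0.3, 1.1                        # example data with |x1 - x2| < 2
mu = np.linspace(-10, 10, 200001)        # crude grid over candidate locations
print(mu[np.argmax(loglik(mu, x1, x2))])
```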

AdamO
  • 1
    I cannot make sense of this, because your notation doesn't treat $\mu$ as the variable, while that is *essential* for MLE. – whuber Jan 31 '22 at 19:40