I am learning about Maximum Likelihood Estimation (MLE). What I grasped about MLE is that, given some data, we try to find the distribution that is most likely to produce values similar or identical to our original data.
I learn better by coding these concepts as programs. Using this answer, I tried to code a simple Gaussian MLE. The data for this Gaussian MLE came from a known Gaussian distribution with a known mean and standard deviation.
from scipy import stats
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
np.random.seed(1)
## x-axis for the plot
x_data = np.arange(-10, 10, 1)
print(x_data.shape)
## y-axis as the gaussian
y_data = stats.norm.pdf(x_data, 0, 3)
## plot data
plt.plot(x_data, y_data)
plt.show()
plt.close()
def gaussian(params):
    x0 = params[0]
    sd = params[1]
    yPred = np.random.normal(x0, sd, size=20)
    # Calculate negative log likelihood
    LL = -np.sum(stats.norm.logpdf(y_data, loc=yPred, scale=sd))
    return LL
initParams = [1, 1]
results = minimize(gaussian, initParams, method='Nelder-Mead')
print (results.x)
estParms = results.x
yOut = yPred = np.random.normal(estParms[0],estParms[1],size=20)
plt.clf()
plt.plot(x_data,y_data, 'go')
plt.plot(x_data, yOut)
plt.show()
The output parameters for the mean and standard deviation are [1.02109375 1.02968749]
respectively, which are wrong. I believe these two lines in the gaussian()
function are wrongly implemented by me:
yPred = np.random.normal(x0,sd,size=20)
# Calculate negative log likelihood
LL = -np.sum( stats.norm.logpdf(y_data, loc=yPred, scale=sd ) )
How do we implement a maximum likelihood fitting for this simple gaussian data?
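For reference, here is a minimal sketch of what I think a correct fit might look like. The key assumptions (which differ from my code above): the likelihood should be evaluated on a fixed sample of observations drawn from the distribution, not on pdf values (`y_data`), and the objective must be deterministic, so no `np.random.normal` call inside it. The variable names here are my own.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

np.random.seed(1)

# Draw observations from the known distribution (mean 0, sd 3),
# instead of evaluating its pdf on a grid as in my code above.
sample = np.random.normal(loc=0, scale=3, size=1000)

def neg_log_likelihood(params):
    mu, sd = params
    # Guard against the optimizer proposing a non-positive sd
    if sd <= 0:
        return np.inf
    # Negative log likelihood of the fixed sample under N(mu, sd)
    return -np.sum(stats.norm.logpdf(sample, loc=mu, scale=sd))

result = minimize(neg_log_likelihood, x0=[1, 1], method='Nelder-Mead')
print(result.x)  # estimates should be close to [0, 3]
```

With this setup the minimizer recovers parameters near the true mean 0 and standard deviation 3, because the objective now measures how likely the observed sample is under each candidate parameter pair.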