What are the pros and cons of both methods?

- I am not looking for just the definition of these two methods, which I already have from a Google search. I am trying to understand which method is preferred in which case, e.g. for big data, will one work better than the other? I couldn't find any good material that talks about the practical aspects. – GeorgeOfTheRF Nov 27 '15 at 14:42
- How is a raven like a writing desk? – whuber Nov 27 '15 at 15:51
- @ML_Pro GD does not relate in any way to statistical modeling; it is an algorithm. You could probably start with an introductory statistics handbook to get a better understanding of statistical inference before going into learning the *tools* (like GD) for solving statistical problems. – Tim Nov 27 '15 at 15:53
- Did you mean to ask the difference between *Gradient Descent* and *Expectation Maximization* (which is typically used to solve the optimization problem in MLE)? – Sobi Dec 13 '15 at 00:50
2 Answers
Maximum likelihood estimation is a general approach to estimating parameters in statistical models by maximizing the likelihood function, defined as
$$ L(\theta|X) = f(X|\theta) $$
that is, the probability of obtaining the data $X$ given some value of the parameter $\theta$. Knowing the likelihood function for a given problem, you can look for the $\theta$ that maximizes the probability of obtaining the data you have. Sometimes we have known estimators, e.g. the arithmetic mean is the MLE of the $\mu$ parameter of the normal distribution, but in other cases you need different methods, including optimization algorithms. The ML approach does not tell you how to find the optimal value of $\theta$ -- you can simply take guesses and use the likelihood to compare them -- it only tells you how to judge whether one value of $\theta$ is "more likely" than another.
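To make this concrete, here is a toy sketch in Python (the sample, the true parameter values, and the candidate guesses are all made up for illustration): it evaluates the normal log-likelihood at two guesses of $\mu$ and compares them, then checks against the closed-form MLE.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)  # hypothetical i.i.d. sample

def log_likelihood(mu, sigma, data):
    # log L(mu, sigma | data): sum of log densities, assuming i.i.d. data
    return norm.logpdf(data, loc=mu, scale=sigma).sum()

print(log_likelihood(4.0, 2.0, x))  # one guess for mu
print(log_likelihood(5.0, 2.0, x))  # another guess; the higher value is "more likely"
print(x.mean())                     # the closed-form MLE of mu, for comparison
```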
Gradient descent is an optimization algorithm. You can use it to find the minimum (or the maximum, in which case it is called gradient ascent) of many different functions. The algorithm does not really care what function it minimizes; it just does what it is asked to do. So when using an optimization algorithm, you have to be able to tell somehow whether one value of the parameter of interest is "better" than another: you provide the algorithm a function to minimize, and it deals with finding the minimum.
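A bare-bones sketch of the update rule $\theta \leftarrow \theta - \eta \, \nabla f(\theta)$ (the learning rate, step count, and example function below are arbitrary choices for illustration):

```python
def gradient_descent(grad, theta0, lr=0.1, n_steps=100):
    # repeatedly step against the gradient of whatever objective we were given
    theta = theta0
    for _ in range(n_steps):
        theta = theta - lr * grad(theta)
    return theta

# example: minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
print(gradient_descent(lambda t: 2 * (t - 3), theta0=0.0))  # converges near 3
```

Note that the algorithm never sees $f$ itself here, only its gradient; it has no idea whether the objective is a likelihood or anything else.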
You can obtain maximum likelihood estimates using different methods, and using an optimization algorithm is one of them. On the other hand, gradient descent can also be used to maximize functions other than the likelihood function.
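Putting the two together, here is a sketch (again with simulated data, so the numbers are illustrative) of gradient *ascent* on the normal log-likelihood in $\mu$ with $\sigma$ treated as known; it recovers the same estimate as the closed-form MLE, the sample mean:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=100)
sigma = 2.0  # treated as known

# d logL / d mu = sum(x - mu) / sigma**2
mu = 0.0
for _ in range(200):
    mu += 0.01 * np.sum(x - mu) / sigma**2  # step *up* the log-likelihood
print(mu, x.mean())  # the two agree: gradient ascent finds the MLE
```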

- Can you please explain what the likelihood is and the math behind it? – GeorgeOfTheRF Nov 27 '15 at 14:31
- @ML_Pro I provided two links where you can find detailed information; I don't think there is a need to duplicate those answers. – Tim Nov 27 '15 at 14:34
- @ML_Pro As I wrote in my answer, they are **different things** and you cannot compare them... – Tim Nov 27 '15 at 14:45
- Yes, but MLE is a general approach and GD is just an algorithm you can use to minimize a number of different functions. It is as if you compared algebra to a pocket calculator... – Tim Nov 27 '15 at 14:56
- @ML_Pro But this is a *different* question than the one you initially asked... The one you asked was very general, while this one is very specific. I provided an answer to the initial question -- also see my edit; I hope it makes things clearer. – Tim Nov 27 '15 at 15:33
- MLE specifies the objective function (the likelihood function); GD finds the optimal solution to a problem once the objective function is specified. You can use GD (or other optimization algorithms) to solve a maximum likelihood problem, and the result will be the maximum likelihood estimator. – jbowman Nov 27 '15 at 15:58
- @Tim Can you also please explain how the likelihood function is obtained? Why is it a product of pdfs? – GeorgeOfTheRF Dec 06 '16 at 17:33
- @ML_Pro This is described in the links I provided in my answer. In short: yes, it is a product of pdfs -- a product because we assume the data are i.i.d., and it is defined in terms of pdfs because we are talking about a probability model. – Tim Dec 06 '16 at 17:46
Usually, when we have the likelihood function $l(\theta)$, we solve the equation
$$\frac{dl}{d\theta} = 0$$
to get the value of $\theta$ that gives the maximum (or minimum) value of $l$, and we are done!
But the likelihood function of logistic regression has no closed-form solution this way, so we have to use another method, such as gradient descent.
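As a minimal sketch of what that looks like (the simulated data, learning rate, and iteration count are arbitrary illustrative choices), we can fit logistic regression by gradient ascent on its log-likelihood, whose gradient with respect to the weights $w$ is $X^\top\big(y - \mathrm{sigmoid}(Xw)\big)$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_steps=1000):
    # gradient ascent on the logistic log-likelihood; no closed form exists
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w += lr * X.T @ (y - sigmoid(X @ w)) / len(y)
    return w

# simulated data: 200 points, an intercept column plus 2 features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_w = np.array([0.5, 1.0, -2.0])
y = (rng.random(200) < sigmoid(X @ true_w)).astype(float)
print(fit_logistic(X, y))  # lands near true_w, up to sampling noise
```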

- @Tim, you can see something about this here: https://courses.cs.washington.edu/courses/cse446/13sp/slides/logistic-regression-gradient.pdf – Belter Dec 01 '16 at 13:10
- "The regression coefficients are usually estimated using maximum likelihood estimation" (https://en.wikipedia.org/wiki/Logistic_regression) – Tim Dec 01 '16 at 13:20
- Maximum likelihood estimation is indeed a method for estimating the regression coefficients, but we have several ways to find the solution of the MLE. So using the `likelihood function` + `gradient descent` (to get the solution of the likelihood function) is still a way to do MLE. – Belter Dec 01 '16 at 13:56
- You can also see this sentence, `Unlike linear regression, we can no longer write down the MLE in closed form. Instead, we need to use an optimization algorithm to compute it. For this, we need to derive the gradient and Hessian.`, from Machine Learning: a Probabilistic Perspective by Kevin Murphy. – Belter Dec 01 '16 at 13:58
- ...then the wording of your answer is confusing, as it sounds like you are saying that for logistic regression we do not use ML and instead use GD. – Tim Dec 01 '16 at 13:59
- In my opinion, GD is narrower in scope than MLE: **MLE = likelihood function + GD**, where GD is only a way to calculate the `arg min/max` of the likelihood function. – Belter Dec 01 '16 at 14:03
- Fine, and I agree, but this does not follow from your answer. If you edit your answer to make it clearer, I'd be happy to remove my (-1) vote. – Tim Dec 01 '16 at 14:04
- I think you can simply add what you've written in the comments, expanding it a little bit ;) – Tim Dec 01 '16 at 14:09
- Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/49435/discussion-between-belter-and-tim). – Belter Dec 01 '16 at 14:15