I am watching the lecture here and the author says that in the machine learning setting where data is assumed to be generated by a model with a few parameters:
Maximum-likelihood parameters are NP-hard to find in general (the objective is non-convex).
In practice, heuristics such as EM are used.
These claims seem to be stated everywhere as a matter of fact, but the explanations are missing.
Could you explain how a model can have an intractable maximum likelihood, and how one would prove it? I have seen problems such as SAT proven NP-hard, but in this context I have no intuition for how to approach it. Do we first need to show non-convexity before proving NP-hardness?
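To make the non-convexity part concrete for myself, I tried the following toy sketch (the model, data, and initializations are my own assumptions, not from the lecture): EM on a two-component Gaussian mixture reaches different stationary points from different starts, with different likelihood values, which could not happen if the log-likelihood were concave.

```python
import numpy as np

# Toy data: two well-separated clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 100), rng.normal(5, 1, 100)])

def em_gmm(x, mu, n_iter=50):
    """EM for a 1-D mixture of two unit-variance Gaussians with equal
    weights; only the means are estimated. Returns (means, log-likelihood)."""
    mu = np.array(mu, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibilities (posterior component probabilities)
        d = -0.5 * (x[:, None] - mu[None, :]) ** 2
        r = np.exp(d - d.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    # Log-likelihood at the final means
    d = -0.5 * (x[:, None] - mu[None, :]) ** 2
    ll = np.log(0.5 / np.sqrt(2 * np.pi) * np.exp(d).sum(axis=1)).sum()
    return mu, ll

mu_good, ll_good = em_gmm(x, [-1.0, 1.0])  # finds the two clusters
mu_bad, ll_bad = em_gmm(x, [0.0, 0.0])     # symmetric start: a bad fixed point
print(mu_good, ll_good)  # means near -5 and 5
print(mu_bad, ll_bad)    # both means stuck at the overall mean, much lower ll
```

The symmetric initialization is a genuine stationary point of EM (by symmetry, both components always get responsibility 1/2, so both means stay equal), yet its likelihood is far below the good solution's. So I can see *empirically* that the objective is non-convex, but that still does not tell me how one would prove NP-hardness.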