What concepts in optimization do I need for machine learning?

Question

I am a Math/CS dual major. As part of my math major, I have the option of taking optimization and mathematical programming classes. I am also interested in machine learning. I know that a lot of machine learning algorithms are theoretically grounded in optimization techniques. I have to decide which optimization classes to take. What optimization concepts should I make sure I cover? Some examples of topics in the classes: gradient descent, conjugate gradient descent, BFGS, KKT, simplex method, ellipsoid method, golden section, knapsack problems, SDP, SOCP, Barrier methods, Mehrotra Predictor-Corrector.

What topics do I need to know?

Could someone answer? I need to choose classes relatively soon.

For sure: gradient descent. Worthwhile: simplex method, KKT. The Knapsack problem belongs to combinatorial optimization. I would not immediately associate combinatorial optimization with machine learning. (EDIT: the internet disagrees with me). It is a very interesting subject. I recommend looking into it. — workingonit, Apr 15 '17 at 21:13
Convex optimization is useful (many of the topics you mention are special cases). Also, integer optimization starts being used in statistics so maybe also in machine learning ... — kjetil b halvorsen, Apr 15 '17 at 21:56
I wouldn't take the whole class on optimization unless that's what you're interested in. Otherwise conceptually it's just finding the minimum of some function, there's nothing particularly interesting in it for statistics and ML per se. — Aksakal, Apr 16 '17 at 01:28
I am a CS Math Dual Major. I have to take a certain number of math classes. I might as well take one's that relate to CS — Halbort, Apr 16 '17 at 03:10
One approach is to take some more advanced book covering ML/Stats topics **you are interested in**, and see what topics are covered in its "remedial/summary/pre-req" material (usually this is in an [intro chapter](http://www.deeplearningbook.org/contents/numerical.html) and/or an [appendix](http://szeliski.org/Book/drafts/SzeliskiBook_20100903_draft.pdf)). — GeoMatt22, Apr 17 '17 at 22:44
@GeoMatt22 I tried that. However, the preliminaries chapters mainly go over multivarible calculus, linear algebra and basic optimization methods. Any more advanced optimization methods (of which a lot are used) are explained in the middle of intermediate chapters. I could try looking through many books cover to cover and list out what optimization techniques are used, but I do not know which ones to prioritize as I am just starting. — Halbort, Apr 17 '17 at 22:54
As you are at an institution, probably the Math and/or CS dept. has an "undergrad advisor" person. An in person conversation with one or both of these is probably a better bet than asking here (also in the context of your graduation requirements & learning goals ... i.e. solving the real-world "knapsack problem" facing you!) — GeoMatt22, Apr 17 '17 at 23:05

score 5 · Accepted Answer · edited Mar 11 '20 at 15:38

5

Conceptually, the only thing you need to know to understand machine learning algorithms is "there is an optimum, and we can find it". Practically, it's always useful to have some idea how optimization is happening "under the hood". At very least, it will give you some insight into how the the performance and storage requirements of your ML algorithms are likely to scale with data size and dimension, and under what circumstances you are likely to run into problems. Of course, optimization is a rich and interesting area of its own, which will exercise your brain and round out your CS education even if you never do machine learning.

As requested in a comment, here is a list of important topics in optimization:

Continuous optimization
- Toy algorithms: gradient descent, simplex method
- Powell's method
- BFGS
- Model trust methods
Stochastic optimization
- Simulated annealing
- Genetic algorithms and swarm methods
- Stochastic gradient descent
Constrained optimization
- Barrier methods
- Linear programming (with interior-point methods)
- Integer programming
Dynamic programming

edited Mar 11 '20 at 15:38

dourouc05

107
5

answered Apr 17 '17 at 23:30

David Wright

2,181
12
12

I am trying to decide which optimization classes to take based on their syllabi. Could you provide a top 10 list of specific optimization topics that I should know? – Halbort Apr 17 '17 at 23:36
5

All deep learning is nowadays done with gradient descent. Saying that it's not "practically useful" is preposterous. – amoeba Apr 19 '17 at 11:41
1

I'd like to chime in and underscore what @amoeba said. There was definitely a period where interior point methods and quasi-Newton methods were the standard, but nowadays, with the size of data, simple gradient descent and variants is back in focus. – Mustafa Eisa Apr 22 '17 at 05:23
1

@MustafaSEisa Yes. I downvoted this answer now, because I think it's very importantly misleading statement. – amoeba Apr 22 '17 at 08:47
1

Re: gradient descent & machine learning, the more appropriate Wikipedia article would be [this](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). (See also [here](https://stats.stackexchange.com/questions/235862/is-it-possible-to-train-a-neural-network-without-backpropagation).) – GeoMatt22 Apr 22 '17 at 15:01
@amoeba: I removed the paragraph to which you objected; it's not really central to the question. I do still disagree with your assertion. The larger the data set, the more expensive the evaluation of the objective function, and therefore the larger the advantage of sophisticated methods over gradient descent. What might drive one to return to gradient descent is large dimensionality, since sophisticated methods are typically d^2 in storage. – David Wright Apr 22 '17 at 20:40
Note @amoeba said "deep learning" rather than "big data" ... the dimensionality tends to be big in deep learning (many weights, i.e. edges in the network topology) – GeoMatt22 Apr 22 '17 at 20:48
1

Okay, I removed my downvote, but you are still calling gradient descent a "toy algorithm" in your list. Seriously, there is nothing "to disagree" about here; you are probably unfamiliar with modern deep learning. Look it up. Neural network that famously won over best human in Go last year (https://en.wikipedia.org/wiki/AlphaGo) was trained with [stochastic] gradient descent (http://web.iitd.ac.in/~sumeet/Silver16.pdf), like pretty much all other deep networks nowadays. This is just absolutely *not* a toy algorithm. (cc to @GeoMatt22) – amoeba Apr 22 '17 at 21:39

What concepts in optimization do I need for machine learning?

1 Answers1