The function learned by a deep neural network is essentially a composition of simpler functions. For example, in a CNN the first function is a convolution (a linear function), followed by max-pooling (a convex function), then a non-linearity (sigmoid, which as I understand it is convex), and so on. If the basic functions are convex, then how come their composition is non-convex?
I read in convex optimization that a composition of convex functions is convex, so why do we always say that DNNs have a non-convex energy?
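
To make the question concrete, here is a minimal sketch of how the two premises above could be probed numerically. It assumes NumPy, and the helper `is_midpoint_convex` and the toy functions `square`, `negate`, and `sigmoid` are hypothetical names chosen for illustration, not part of any library. The check tests the midpoint inequality f((a+b)/2) <= (f(a)+f(b))/2 on a grid, so it can refute convexity on that grid but cannot prove it.

```python
import numpy as np

def is_midpoint_convex(f, lo=-3.0, hi=3.0, n=61, tol=1e-9):
    """Return False if any grid pair (a, b) in [lo, hi] violates
    f((a + b) / 2) <= (f(a) + f(b)) / 2; True if none does."""
    xs = np.linspace(lo, hi, n)
    a, b = np.meshgrid(xs, xs)  # all pairs of grid points
    return bool(np.all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + tol))

# Toy scalar functions (illustrative assumptions, not a real network layer).
square = lambda x: x ** 2                    # convex
negate = lambda x: -x                        # linear, so convex (and concave)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

checks = {
    "square": is_midpoint_convex(square),
    "negate": is_midpoint_convex(negate),
    # Composition of two convex functions: negate(square(x)) = -x**2.
    "negate(square(x))": is_midpoint_convex(lambda x: negate(square(x))),
    "sigmoid": is_midpoint_convex(sigmoid),
}

for name, passed in checks.items():
    print(f"{name}: passes midpoint check = {passed}")
```

On this grid the composition `negate(square(x))` fails the check even though both pieces pass it, and `sigmoid` fails on its own because it bends downward for positive inputs. So both premises of the question look doubtful: a composition of convex functions is not guaranteed to be convex without extra conditions (such as the outer function being non-decreasing), and sigmoid is not convex on the whole real line.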