I have a softmax layer, i.e., a weight matrix (with no hidden layers before it) applied directly to the input, whose output is then passed through softmax. I'm wondering whether this alone, trained with gradient descent on the cross-entropy error, can learn arbitrary classification problems, or whether an MLP hidden layer between the input and the softmax layer is essentially a requirement. When I try it on the pima-indians-diabetes dataset, it converges to predicting only one of the two classes (the majority class 0, about 65% of the data).
Also, my implementation is unable to learn the iris dataset.
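For reference, a softmax layer with no hidden layers is just multinomial logistic regression, so it can only learn linearly separable structure. Here's a minimal NumPy sketch of that setup (toy data and made-up hyperparameters, not your actual datasets) that does learn a separable two-class problem; if something like this fails on your data, the bug is likely in the gradient or the targets rather than the model class:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-feature, 2-class, linearly separable data
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
Y = np.eye(2)[y]  # one-hot targets

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

W = np.zeros((2, 2))  # (n_features, n_classes) -- no hidden layer
b = np.zeros(2)
lr = 0.1

for _ in range(500):
    P = softmax(X @ W + b)
    # gradient of mean cross-entropy w.r.t. the logits is (P - Y) / N
    G = (P - Y) / len(X)
    W -= lr * X.T @ G
    b -= lr * G.sum(axis=0)

acc = (softmax(X @ W + b).argmax(axis=1) == y).mean()
print(acc)
```

On well-separated toy data like this, accuracy should end up near 1.0; on the diabetes data, anything stuck exactly at the majority-class rate usually means the gradients aren't flowing at all.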
I'd also like to learn how to gradient-check the cross-entropy gradient with a numerical approximation.
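The standard recipe for gradient checking is central differences: perturb each weight by a small epsilon in both directions, take the difference of the losses over 2*epsilon, and compare against your analytic gradient. A sketch for a single example through a bare softmax layer (where the analytic gradient of the cross-entropy w.r.t. W is the outer product of (p - onehot(y)) with x):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def loss(W, x, y_idx):
    # cross-entropy of one example for logits W @ x, true class y_idx
    return -np.log(softmax(W @ x)[y_idx])

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y_idx = 2

# analytic gradient: dL/dW = (p - onehot(y)) outer x
p = softmax(W @ x)
p[y_idx] -= 1.0
analytic = np.outer(p, x)

# numerical gradient via central differences
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (loss(Wp, x, y_idx) - loss(Wm, x, y_idx)) / (2 * eps)

rel_err = np.abs(analytic - numeric).max() / (np.abs(analytic).max() + 1e-12)
print(rel_err)
```

If the relative error isn't tiny (roughly 1e-6 or smaller), the analytic gradient is wrong; the same loop works for any other weight matrix by swapping which parameter you perturb.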
This is more information than necessary, but the softmax layer is connected to the output of a reservoir (reservoir computing). I'm also having trouble working out the gradient of the cross-entropy with respect to the hidden layer's input weight matrix.
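In standard reservoir computing the input/reservoir weights are usually left fixed and only the readout is trained, but if you do want that gradient, it's just one more chain-rule step. Assuming a single tanh hidden layer (a simplification of your reservoir: h = tanh(W_in @ x), logits = W_out @ h), a sketch of the backward pass:

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid, n_out = 4, 5, 3

W_in = rng.normal(size=(n_hid, n_in))    # hidden layer's input weight matrix
W_out = rng.normal(size=(n_out, n_hid))  # softmax readout weights
x = rng.normal(size=n_in)
y = np.eye(n_out)[1]                     # one-hot target, class 1 as an example

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# forward
h = np.tanh(W_in @ x)
p = softmax(W_out @ h)

# backward (chain rule)
d_logits = p - y               # dL/d(logits) for softmax + cross-entropy
d_h = W_out.T @ d_logits       # back through the readout
d_pre = d_h * (1 - h ** 2)     # tanh'(a) = 1 - tanh(a)^2
dW_in = np.outer(d_pre, x)     # dL/dW_in -- the gradient you're after
dW_out = np.outer(d_logits, h) # dL/dW_out, for completeness
```

Replace the tanh derivative with whatever nonlinearity your reservoir actually uses, and note that a recurrent reservoir would additionally need backpropagation through time rather than this single-step version.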
Believe me, I've searched around the internet a lot and can't find an answer to this particular set of questions. Any and all help is appreciated!