
This is an earlier discussion about logistic regression producing well-calibrated models:

Some people consider neural-net-based prediction models (even deep NNs, or deep+sparse NNs) to be equivalent to logistic regression. We train them with AdaGrad (or some other method), but we always update the weights by optimizing the cost function on a limited batch.

Q1: Is it true that neural nets have the properties of logistic regression?

Q2: When we train with batches, the weights computed on the first batch may produce 'well-calibrated' output, but they will probably be updated quite differently when processing subsequent batches. Intuitively, I think that after a few updates this 'well-calibrated' property would no longer hold (especially for sparse neural nets where an embedding table is used). Is this correct?
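To make Q2 concrete, here is a minimal sketch (my own, not part of the original question) contrasting a fully converged logistic regression, for which the in-sample sum of predicted probabilities equals the number of positives, with a model updated on mini-batches via scikit-learn's `SGDClassifier`. The dataset, learning rate, and batch layout are arbitrary illustrative choices, and `loss="log_loss"` is the spelling used in recent scikit-learn versions (≥ 1.1).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# Full-batch (near-)maximum likelihood: the sum of fitted probabilities
# matches the number of positives almost exactly.
lr = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
print(lr.predict_proba(X)[:, 1].sum(), y.sum())

# Incremental updates on mini-batches: the same identity only holds
# approximately and drifts as later batches update the weights.
sgd = SGDClassifier(loss="log_loss", learning_rate="constant", eta0=0.01,
                    random_state=0)
classes = np.unique(y)
rng = np.random.default_rng(0)
for _ in range(20):                       # a few passes over shuffled batches
    idx = rng.permutation(len(y))
    for batch in np.array_split(idx, 50):
        sgd.partial_fit(X[batch], y[batch], classes=classes)
print(sgd.predict_proba(X)[:, 1].sum(), y.sum())
```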

viggen
  • A neural network with sigmoid activation and without hidden layers *is* a logistic regression (see the sketch after these comments). However, a simple counterexample shows that the logic used in the answer you linked does not apply to neural networks: training a model on data with a low incidence rate without supplying weights often causes the neural network to predict the same class for all observations. In such a case the sum of predicted class probabilities obviously does not equal the sum of the outcomes. – Frans Rodenburg May 03 '19 at 23:19
  • @FransRodenburg is it though? What about _probit_ regression, or _cloglog_? All of those link functions also have the _shape_ of a sigmoid (as do most CDFs), but why specifically logistic? Or was it because the OP asked about logistic? – runr May 22 '19 at 08:54
  • @Nutle In neural networks, sigmoid almost always refers to the logistic function $\frac{1}{1 + e^{-x}}$, as opposed to other (possibly S-shaped) activation functions like softmax, tanh, ReLU, etc. – Frans Rodenburg May 22 '19 at 11:54
  • @FransRodenburg Thanks. You're right, my mistake. It's interesting how in neural networks a specific case of a specific sigmoid (the logistic function) is referred to by the name of the whole family of curves, but that's off topic. – runr May 22 '19 at 12:29
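To illustrate the equivalence mentioned in the first comment, here is a minimal sketch (my own, on arbitrary synthetic data; it assumes PyTorch and scikit-learn are available): a "network" consisting of a single linear layer with a sigmoid/log-loss output, trained to convergence on the full batch, recovers essentially the same coefficients as an (almost) unregularized logistic regression.

```python
import numpy as np
import torch
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
Xt = torch.tensor(X, dtype=torch.float32)
yt = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

# "Neural network" with no hidden layers: one linear layer whose output
# is passed through a sigmoid via the binary cross-entropy loss.
model = torch.nn.Linear(5, 1)
opt = torch.optim.LBFGS(model.parameters(), max_iter=500,
                        line_search_fn="strong_wolfe")
loss_fn = torch.nn.BCEWithLogitsLoss()

def closure():
    opt.zero_grad()
    loss = loss_fn(model(Xt), yt)
    loss.backward()
    return loss

opt.step(closure)  # full-batch optimization to (near) convergence

# A (nearly) unregularized logistic regression finds the same weights.
lr = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
print(model.weight.detach().numpy().ravel())
print(lr.coef_.ravel())
```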

1 Answer


I thought this may answer your question:

We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration.
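For reference, "calibrated" in that quote can be quantified with the binned expected calibration error (ECE) used in the paper; below is a minimal NumPy sketch of a binary-probability version (my own simplification, run on synthetic data).

```python
import numpy as np

def expected_calibration_error(p, y, n_bins=15):
    """p: predicted P(y=1), y: 0/1 labels. Binned ECE estimate."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.digitize(p, bins[1:-1])            # bin index per prediction
    ece = 0.0
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            gap = abs(p[mask].mean() - y[mask].mean())
            ece += mask.mean() * gap              # weight by bin frequency
    return ece

# Example: well-calibrated vs. overconfident predictions of the same labels.
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 10000)
y = (rng.uniform(0, 1, 10000) < p).astype(int)    # labels drawn from p
print(expected_calibration_error(p, y))                        # close to 0
print(expected_calibration_error(np.clip(p * 1.5, 0, 1), y))   # much larger
```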

However, they also found that temperature scaling can effectively calibrate the predictions, and in a newer study some researchers found ways to calibrate the results of reinforcement learning. Since modern neural nets are generally trained in batches, both of the aforementioned papers deal with batch-based training.
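Temperature scaling itself is simple enough to sketch: learn a single scalar T > 0 on held-out logits by minimizing the negative log-likelihood, then divide all logits by T before the softmax, which softens confidences without changing the argmax. The sketch below is my own implementation on synthetic overconfident logits, using SciPy for the one-dimensional optimization.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax

def fit_temperature(val_logits, val_labels):
    """Return the T > 0 minimizing the NLL of softmax(val_logits / T)."""
    def nll(t):
        logp = log_softmax(val_logits / t, axis=1)
        return -logp[np.arange(len(val_labels)), val_labels].mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

# Synthetic example: logits that are ~95% confident in a class that is
# only right ~80% of the time, i.e. overconfident, so the fitted T > 1.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=1000)
noisy = np.where(rng.uniform(size=1000) < 0.7,
                 labels, rng.integers(0, 3, size=1000))
logits = rng.normal(size=(1000, 3)) + 4.0 * np.eye(3)[noisy]

T = fit_temperature(logits, labels)
calibrated = np.exp(log_softmax(logits / T, axis=1))  # same argmax, softer
print(T)
```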

Reference: Guo et al., "On Calibration of Modern Neural Networks", ICML 2017

Lerner Zhang