
Repost of a Mathematics StackExchange question.

I have several related questions on this theme:

  • In MLE, we try to find the PDF parameters ($\theta$) that maximise the likelihood of the observed data, $L(\theta \mid \text{data})$. To get the likelihood of a given data point under $\theta = \theta_1$, we simply evaluate the PDF at that point (written out in the first display after this list). But we know that the probability of any single point under a continuous PDF is $0$. What is the correct reasoning behind evaluating the PDF at $x = x_1$ to get its likelihood?

  • Clearly, the sigmoid function is not a PDF. But in the MLE estimates of logistic regression, the sigmoid is used as if it were a PDF (see the second display after this list). Is my understanding correct? If not, how should I see it? If yes, what is the reason behind it?

  • This is related to the previous question. I have seen in multiple places that people read the sigmoid output as a probability. However, no constraint is imposed to ensure that all those probabilities sum to $1$ (see the third display after this list). What is the correct explanation for this?
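For concreteness, here is the likelihood evaluation the first bullet refers to (the standard i.i.d. setup, with $f$ denoting the PDF):

$$L(\theta_1 \mid x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta_1),$$

even though $P(X = x_i) = 0$ for a continuous random variable $X$.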
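The usage the second bullet refers to is, as far as I can tell, the Bernoulli likelihood of logistic regression, where the sigmoid sits in the place a PDF/PMF normally would (writing $\sigma(z) = 1/(1 + e^{-z})$ and assuming labels $y \in \{0, 1\}$):

$$P(y \mid x; \theta) = \sigma(\theta^\top x)^{y} \bigl(1 - \sigma(\theta^\top x)\bigr)^{1 - y}.$$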
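And the third bullet is about the fact that the per-example sigmoid outputs are not constrained to sum to one across the dataset:

$$\sum_{i=1}^{n} \sigma(\theta^\top x_i) \ne 1 \quad \text{in general}.$$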

Aroonalok
  • How is the sigmoid being used as a PDF? Where do you see people using sigmoid to infer probability? – Dave Jul 06 '21 at 18:24
  • @Dave: https://arxiv.org/pdf/1402.3722.pdf View page 3. – Aroonalok Jul 06 '21 at 18:31
  • @Dave : https://www.dropbox.com/s/qiq2c85cle9ydb6/Chapter3.pdf?dl=0 See page 7, second-to-last paragraph. – Aroonalok Jul 06 '21 at 18:39
  • These seem to be three very different questions, thematically linked together by the very broad question "Where does logistic regression come from?" I think the duplicates address this question. – Sycorax Jul 06 '21 at 19:13

0 Answers