Let $x_0, x_1, x_2, \ldots, x_n$ be our features, and let $y$ be the target variable.
With linear regression, our hypothesis is: $$h_{\theta}(x) = \sum_{i=0}^{n} \theta_{i} x_{i}$$ where $x_0 = 1$.
Now, with logistic regression, the hypothesis is: $$h_{\theta}(x) = \frac{1}{1 + e^{-\theta^{T}x}}$$ I have a few questions:
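To make the question concrete, here is a small sketch of the two hypotheses side by side, with made-up values for $\theta$ and $x$ (the numbers are purely illustrative):

```python
import math

# Hypothetical values, purely to make the question concrete.
theta = [0.5, -1.0, 2.0]   # theta_0, theta_1, theta_2
x = [1.0, 3.0, 0.5]        # x_0 = 1 (intercept term), then the features

# Linear regression hypothesis: theta^T x
z = sum(t_i * x_i for t_i, x_i in zip(theta, x))

# Logistic regression hypothesis: the same linear combination,
# pushed through the sigmoid
h = 1.0 / (1.0 + math.exp(-z))

print(z)  # -1.5
print(h)  # ~0.1824
```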
Are we simply taking the hypothesis from linear regression, plugging it into the sigmoid function, and treating the result as our new hypothesis for logistic regression? So we're still assuming the result is a linear combination of the features (before substituting into the sigmoid function)?
Where does probability come into this? I've seen that $$P(y = 1 | x; \theta) = h_{\theta}(x)$$ Where does this come from? Of course, it's plausible, since the sigmoid's output lies in $(0, 1)$, but I don't understand why this output can be interpreted as the probability that the target variable belongs to class 1.
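To show what I mean about the range: no matter what $\theta^T x$ is, the sigmoid's output stays strictly between 0 and 1, so reading $h_\theta(x)$ as $P(y=1|x;\theta)$ and $1 - h_\theta(x)$ as $P(y=0|x;\theta)$ is at least numerically sensible (the inputs below are arbitrary examples):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Whatever value theta^T x takes, the output lies strictly in (0, 1),
# which makes a probability reading plausible, though not yet justified.
for z in [-10.0, -1.5, 0.0, 1.5, 10.0]:
    p = sigmoid(z)
    assert 0.0 < p < 1.0
    print(z, p, 1.0 - p)  # candidate P(y=1|x) and P(y=0|x)
```

But that the output merely *lands* in $(0,1)$ is exactly what I don't find convincing on its own, hence my question.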