I could not get the Maximum likelihood cost function right

Question

I am very new to machine learning and come from a different background.

I am trying to visualize a two-class classification problem to better understand the mechanism behind machine learning.

Take the following simple example:

Dataset of (input, output) pairs, with one feature only: {{1, 0}, {2, 0}, {3, 0}, {5, 1}, {7, 1}, {8, 1}, {20, 1}, {25, 1}}
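
Before fitting anything, note that this dataset is perfectly separated: every class-0 input (1, 2, 3) is smaller than every class-1 input (5 and above), a property the comments below discuss. A minimal Python check follows; the helper name `is_perfectly_separated` is a hypothetical illustration, and it only tests the "class 0 below class 1" ordering that applies here.

```python
# The question's dataset as (feature, label) pairs.
DATA = [(1, 0), (2, 0), (3, 0), (5, 1), (7, 1), (8, 1), (20, 1), (25, 1)]


def is_perfectly_separated(data):
    """Return True if every class-0 feature value lies strictly below
    every class-1 feature value, i.e. one threshold splits the classes."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return max(xs0) < min(xs1)


print(is_perfectly_separated(DATA))
```

Here the largest class-0 input is 3 and the smallest class-1 input is 5, so any threshold strictly between 3 and 5 classifies every point correctly.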

When I used the sigmoid function \begin{align} h(X)= \frac{1}{1+ e^{-W^TX}} \end{align}

and the ordinary squared-error (norm) cost function

\begin{align} J(W)=\frac{1}{2m} \| h(X)-Y \|^2 \end{align}

and then minimized the cost, I got a W that fits the data well, as shown in the following picture:

Now I am trying to understand the cost function given by:

\begin{align} J(W)= -\frac{1}{m}\sum_{i=1}^m \left( y_{i}\log(h(x_{i})) + (1-y_{i})(1-\log(h(x_{i}))) \right) \end{align}
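
For comparison, the usual binary cross-entropy (the average negative Bernoulli log-likelihood) has $\log(1 - h(x_i))$ in its second term, where the formula above has $1 - \log(h(x_i))$. The Python sketch below evaluates both versions on the question's data. The function names and the two weight vectors are illustrative assumptions, not values from the question. To avoid taking the log of zero, it uses the identity $\log(1 - \sigma(z)) = \log \sigma(-z)$.

```python
import math

# The question's dataset as (feature, label) pairs.
DATA = [(1, 0), (2, 0), (3, 0), (5, 1), (7, 1), (8, 1), (20, 1), (25, 1)]


def log_sigmoid(z):
    """log(sigmoid(z)) without overflow; log(1 - sigmoid(z)) equals log_sigmoid(-z)."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def standard_cost(w0, w1, data=DATA):
    """-1/m * sum(y*log(h) + (1-y)*log(1-h)), with h = sigmoid(w0 + w1*x)."""
    s = sum(y * log_sigmoid(w0 + w1 * x) + (1 - y) * log_sigmoid(-(w0 + w1 * x))
            for x, y in data)
    return -s / len(data)


def as_written_cost(w0, w1, data=DATA):
    """-1/m * sum(y*log(h) + (1-y)*(1-log(h))), the formula exactly as typed above."""
    s = sum(y * log_sigmoid(w0 + w1 * x) + (1 - y) * (1 - log_sigmoid(w0 + w1 * x))
            for x, y in data)
    return -s / len(data)


# Two hypothetical weight vectors (w0, w1), chosen for illustration only:
GOOD = (-4.0, 1.0)   # decision boundary at x = 4: separates the classes
BAD = (-14.0, 1.0)   # decision boundary at x = 14: misclassifies x = 5, 7, 8
for name, w in (("good", GOOD), ("bad", BAD)):
    print(name, standard_cost(*w), as_written_cost(*w))
```

With these two weight vectors, the as-written cost is lower at the misclassifying W than at the separating one, while the standard cross-entropy ranks them the other way round. For a class-0 sample the as-written term equals $\log h - 1$, which has no lower bound as $h$ goes to 0, whereas every term of the standard cost is nonnegative. For class-1 samples the two versions coincide, since both reduce to $-\log h$.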

When I plot this function to check whether it is convex, I get an odd shape that is not convex. Moreover, when I minimize it to obtain W, the resulting values give a bad fit to the data. Also, every time I minimize the loss starting from different initial values of W, I get a different result for W.

Can you please tell me what I did wrong to get these results?

If possible, I would like to see a 3D plot of the cost function to check its convexity.
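
A 3D plot only needs the cost evaluated on a grid of (w0, w1) values. The Python sketch below does this for the standard cross-entropy using only the standard library. The grid ranges and function names are arbitrary assumptions. It also spot-checks convexity with the midpoint inequality J((p + q)/2) <= (J(p) + J(q))/2 over all pairs of grid points. Passing that check is evidence, not a proof. The underlying fact is that -log sigmoid(z) is convex in z and z = w0 + w1 x is linear in W, so each term, and hence their average, is convex.

```python
import itertools
import math

# The question's dataset as (feature, label) pairs.
DATA = [(1, 0), (2, 0), (3, 0), (5, 1), (7, 1), (8, 1), (20, 1), (25, 1)]


def log_sigmoid(z):
    """log(sigmoid(z)) without overflow; log(1 - sigmoid(z)) equals log_sigmoid(-z)."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def cross_entropy(w0, w1, data=DATA):
    """Standard binary cross-entropy for h(x) = sigmoid(w0 + w1*x)."""
    s = 0.0
    for x, y in data:
        z = w0 + w1 * x
        s += y * log_sigmoid(z) + (1 - y) * log_sigmoid(-z)
    return -s / len(data)


def cost_grid(w0_values, w1_values):
    """List of (w0, w1, J) triples, e.g. for a surface plot."""
    return [(a, b, cross_entropy(a, b)) for a in w0_values for b in w1_values]


def midpoint_convex_on(points, tol=1e-9):
    """Spot-check J((p+q)/2) <= (J(p)+J(q))/2 for every pair of grid points."""
    for (a0, a1, ja), (b0, b1, jb) in itertools.combinations(points, 2):
        mid = cross_entropy((a0 + b0) / 2, (a1 + b1) / 2)
        if mid > (ja + jb) / 2 + tol:
            return False
    return True


# Hypothetical grid ranges; widen or refine them for a smoother plot.
w0_values = [-10 + 2 * i for i in range(11)]   # -10, -8, ..., 10
w1_values = [-2 + 0.5 * j for j in range(9)]   # -2.0, -1.5, ..., 2.0
grid = cost_grid(w0_values, w1_values)
print(len(grid), midpoint_convex_on(grid))
```

The `grid` triples can be passed to any surface-plotting tool. Because the data are perfectly separated, the surface keeps sloping down toward 0 along separating directions instead of bottoming out at a finite W.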

Update

This is what I did: I prepared the data and defined the h, Sig and Sigh functions as follows:

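(* each row is {x0, x, y}; x0 = 1 is the constant (bias) input *)
XY = {{1, 1, 0}, {1, 2, 0}, {1, 3, 0}, {1, 5, 1}, {1, 7, 1}, {1, 8, 1},
      {1, 20, 1}, {1, 25, 1}};
m = Length[XY];

ClearAll[h, Sig, Sigh]
h[x0 : 1, x_] = {x0, x}.{w0, w1}   (* linear score w0 x0 + w1 x *)
Sig[z_] = 1/(1 + E^-z)             (* logistic sigmoid *)
Sigh[x0 : 1, x_] = Sig[h[1, x]]    (* model output h(x) = Sig(w0 + w1 x) *)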

Then I defined the cost function as follows:

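ClearAll[Costfun]
(* per-sample cost, following the formula above term by term *)
Costfun[x0 : 1, x_, y_] = -y Log[Sigh[x0, x]] - (1 - y) (1 - Log[Sigh[x0, x]])
(* average cost over the m rows of XY *)
J[w0_, w1_] = 1/m Total[Costfun @@@ XY]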

This is what I got (the plot is also included):

When I minimize the cost function, I cannot get correct values of W; this is the resulting fit:

The cost function in your last equation would be correct if $h(x_i)$ provided the estimated probability for case $i$. A quick look at your formula for $h(X)$ suggests that it doesn't and that the numerator should be other than 1. If that fixes your problem, please write that up as a solution to guide others who might come to this site with a similar question. — EdM, Feb 21 '20 at 17:02
@Tim, I updated the question. The code is in Wolfram language. — Basheer Algohi, Feb 21 '20 at 17:27
Logistic regression is not convex when perfect separation is present. https://stats.stackexchange.com/questions/326350/what-is-happening-here-when-i-use-squared-loss-in-logistic-regression-setting — Sycorax, Feb 21 '20 at 18:31
@SycoraxsaysReinstateMonica, how should I modify my example to get a convex cost function? Thanks for the help. — Basheer Algohi, Feb 21 '20 at 19:31
The separation tag has a number of threads about this. I recommend sorting by votes. — Sycorax, Feb 21 '20 at 19:35
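
The separation point raised in these comments can be checked numerically. The Python sketch below assumes a decision boundary at x = 4 (any value strictly between 3 and 5 would do). It scales the weight vector W = c·(-4, 1) by increasing factors c and evaluates the standard cross-entropy at each scale. The boundary choice and the scale factors are illustrative assumptions.

```python
import math

# The question's dataset as (feature, label) pairs.
DATA = [(1, 0), (2, 0), (3, 0), (5, 1), (7, 1), (8, 1), (20, 1), (25, 1)]


def log_sigmoid(z):
    """log(sigmoid(z)) without overflow; log(1 - sigmoid(z)) equals log_sigmoid(-z)."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def cross_entropy(w0, w1, data=DATA):
    """Standard binary cross-entropy for h(x) = sigmoid(w0 + w1*x)."""
    s = 0.0
    for x, y in data:
        z = w0 + w1 * x
        s += y * log_sigmoid(z) + (1 - y) * log_sigmoid(-z)
    return -s / len(data)


# Assumed separating direction: boundary at x = 4, so w0 + w1*x = c*(x - 4).
SCALES = (1, 2, 4, 8, 16)
costs = [cross_entropy(-4.0 * c, float(c)) for c in SCALES]
for c, j in zip(SCALES, costs):
    print(f"c = {c:2d}  J = {j:.3e}")
```

Along this direction the cost keeps falling as c grows, so no finite W minimizes it. Any minimizer therefore has to stop on a tolerance or iteration limit, which is one way different starting values can end at different W.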
