
I have a data set of 8 rows; seven of them are labeled and I need to predict the label of the 8th row. I cannot see where my logic goes wrong: even for the rows whose labels are already present, the model predicts the wrong labels.

# Assumes rows, cols, x (feature matrix), w (weights), eta (learning rate),
# stop (convergence threshold), lDict (row -> label, None for the row to
# predict), sigmoid and dotproduct are defined earlier.
import math

flag = 0
while flag == 0:
    cost = 0
    delf = [0] * cols          # reset the gradient each pass, or it accumulates
    deviation = [0] * rows
    for i in range(rows):
        if lDict.get(i) is not None:
            dp = dotproduct(w, x[i])
            sig = sigmoid(dp)
            deviation[i] = lDict.get(i) - sig
            # cross-entropy cost needs the log of the predicted probabilities
            step1 = lDict.get(i) * math.log(sig)
            step2 = (1 - lDict.get(i)) * math.log(1 - sig)
            cost = cost + (-step1 - step2)
    for i in range(cols):
        for j in range(rows):
            if lDict.get(j) is not None:
                # gradient of the loss is (sig - y) * x, i.e. minus the deviation
                delf[i] = delf[i] - (deviation[j] * x[j][i])
    for i in range(cols):
        w[i] = w[i] - (eta * delf[i])
    print("updated w is", w)

    # recompute the cost with the updated weights and test for convergence;
    # this block must sit inside the while loop, or flag is never set
    newcost = 0
    for i in range(rows):
        if lDict.get(i) is not None:
            sig = sigmoid(dotproduct(w, x[i]))
            step1 = lDict.get(i) * math.log(sig)
            step2 = (1 - lDict.get(i)) * math.log(1 - sig)
            newcost = newcost + (-step1 - step2)
    print("new cost is", newcost)
    print("old cost", cost)
    if abs(newcost - cost) < stop:
        flag = 1
        print("flag", flag)


for i in range(rows):
    if lDict.get(i) is not None:
        sig = sigmoid(dotproduct(w, x[i]))   # use the value just computed
        print("sig", sig)
        print("weights", w)
        if sig < 0.5:
            print(0, "", i)
        else:
            print(1, "", i)
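For comparison, here is a minimal self-contained sketch of the same batch-gradient-descent logistic regression. The data, learning rate, and stopping threshold are illustrative assumptions, not the asker's actual values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dotproduct(w, xi):
    return sum(wk * xk for wk, xk in zip(w, xi))

# Toy data: each row starts with a 1 for the bias term (assumed setup).
x = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 0, 1]
w = [0.0, 0.0, 0.0]
eta, stop = 0.5, 1e-6

cost = float("inf")
while True:
    # Gradient of the cross-entropy loss: sum over rows of (sigmoid(w.x) - y) * x
    delf = [0.0] * len(w)
    for xi, yi in zip(x, y):
        err = sigmoid(dotproduct(w, xi)) - yi
        for k in range(len(w)):
            delf[k] += err * xi[k]
    w = [wk - eta * gk for wk, gk in zip(w, delf)]

    # Cross-entropy cost with the updated weights
    newcost = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(dotproduct(w, xi))
        newcost += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    if abs(cost - newcost) < stop:   # convergence check lives inside the loop
        break
    cost = newcost

predictions = [1 if sigmoid(dotproduct(w, xi)) >= 0.5 else 0 for xi in x]
```

Note the two details that differ from the question's code: the gradient is reset to zero on every pass, and the convergence check runs inside the loop so the flag can actually fire.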
Matthew Drury

1 Answer


This isn't completely out of the ordinary. Observe:

x = -2:3
y = c(1,1,0,1,1,1)

model = glm(y ~ x, family = binomial())

predict(model, type = 'response')
>>> 0.7551606 0.7919336 0.8244674 0.8528592 0.8773420 0.8982372  

So if my decision rule is that I predict 1 whenever the predicted probability is larger than 0.5, then I would predict 1 for all of these observations. This doesn't mean the model is wrong, it just means that is what the model learns from the data.
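The same behaviour can be reproduced in Python; here is a rough sketch with plain gradient descent standing in for glm's fitting routine (the learning rate and iteration count are arbitrary choices):

```python
import math

x = [-2, -1, 0, 1, 2, 3]
y = [1, 1, 0, 1, 1, 1]

# Fit intercept + slope by gradient descent on the logistic log-loss.
b0, b1 = 0.0, 0.0
eta = 0.05
for _ in range(20000):
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        g0 += p - yi
        g1 += (p - yi) * xi
    b0 -= eta * g0
    b1 -= eta * g1

probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
# Every fitted probability exceeds 0.5, so the 0.5 rule predicts 1 everywhere,
# even for the observation whose true label is 0.
```

Because five of the six labels are 1, the maximum-likelihood fit places all six probabilities above 0.5, exactly as in the R output above.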

Demetri Pananos
  • Hi, I agree with that, but it should not change the labels for the previously trained data set. Suppose I have a data set [[0,0],[0,1],[1,0],[11,11]] – Dibyaranjan Mar 03 '20 at 04:56
  • That would imply the model would have a 0% error rate on training data, and logistic regression is not known to do that. – Demetri Pananos Mar 03 '20 at 05:01
  • In other terms, I need to reach an optimal cost where my weights come to a standstill, and with those weights, the data matrix, and the sigmoid I need to predict for new data. But every time my sigmoid comes out above 0.5, so it classifies all the data as class 1. – Dibyaranjan Mar 03 '20 at 05:19