
I am new to binomial regression, so this question may seem very basic. My data is suited to binomial regression (each row has n successes and m failures). Since I am more familiar with logistic regression, I thought I would expand my matrix so that each row is either 1 for a success or 0 for a failure. My code is below.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Expand (successes, failures) counts into one 0/1 row per trial,
# so the result can be passed to logistic regression.
def expand_mat(onehot_X, onehot_Y):
  counts = onehot_Y.astype(int)   # column 0: successes, column 1: failures
  totals = counts.sum(axis=1)     # number of trials for each original row

  # Repeat each feature row once per trial (rows with zero trials are dropped)
  oh_X = np.repeat(onehot_X, totals, axis=0).astype(float)

  # For each original row: its successes as 1s, followed by its failures as 0s
  labels = np.tile([1, 0], len(counts))
  oh_y = np.repeat(labels, counts.ravel()).tolist()
  return oh_X, oh_y

# Simulate data
np.random.seed(0)
fake_X = np.round(np.random.uniform(0,1,(50,3)))  # 3 features
fake_y = np.round(np.random.uniform(10,90,(50,2)))    # 2 columns, no. of successes and failures
x, y = expand_mat(fake_X, fake_y)
lr = LogisticRegression(random_state=0).fit(x, y)
lr.coef_

>>>array([[ 0.08026219, -0.07601091,  0.05658203]])

However, if I change the number of successes and failures while keeping the ratio the same, the coefficients and the model change, and I am not sure why. I would expect the model to stay the same as long as the ratio stays the same. What am I missing? Thanks.

fake_y /= 2
fake_y = np.round(fake_y)
x, y = expand_mat(fake_X, fake_y)
lr = LogisticRegression(random_state=0).fit(x, y)
lr.coef_

>>>array([[ 0.08292701, -0.08024205,  0.05879753]])
  • See https://stats.stackexchange.com/a/493749/919 for one explanation and https://stats.stackexchange.com/a/31571/919 for another. – whuber Mar 20 '21 at 16:25
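Two things in the setup above may be worth checking, offered here as a sketch rather than a definitive diagnosis. First, rounding after halving the counts does not keep each row's success/failure ratio exactly the same. Second, scikit-learn's `LogisticRegression` applies an L2 penalty by default (`C=1.0`), and the relative strength of that penalty depends on how many rows are fitted. The snippet below separates the two effects by passing the counts as `sample_weight` instead of expanding the matrix, so the counts can be halved exactly with no rounding. The helper `fit_weighted`, the simulated data, and the value `C=1e6` used to approximate an unpenalized fit are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_weighted(X, counts, C):
    # Hypothetical helper: two rows per observation, one labelled 1 and
    # weighted by the number of successes, one labelled 0 and weighted
    # by the number of failures. This mirrors expanding the matrix.
    X2 = np.vstack([X, X])
    y2 = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
    w2 = np.concatenate([counts[:, 0], counts[:, 1]])
    model = LogisticRegression(C=C, tol=1e-10, max_iter=10000)
    return model.fit(X2, y2, sample_weight=w2).coef_.ravel()

# Simulated data in the same spirit as the question (not the same draws)
rng = np.random.default_rng(0)
X = np.round(rng.uniform(0, 1, (50, 3)))
counts = np.round(rng.uniform(10, 90, (50, 2)))

diffs = {}
for C in (1.0, 1e6):  # default penalty vs. a very weak penalty
    full = fit_weighted(X, counts, C)
    half = fit_weighted(X, counts / 2, C)  # exact halving, no rounding
    diffs[C] = np.abs(full - half).max()
    print(f"C={C:g}: max |coef difference| = {diffs[C]:.2e}")
```

If the gap at `C=1e6` is much smaller than the gap at `C=1.0`, that would suggest the penalty accounts for the part of the change that remains once the ratios are held exactly fixed; any further difference in the original code would then come from the rounding step.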
