I am new to binomial regression, so this question may seem very basic. I have data suited to binomial regression (each row has `n` successes and `m` failures). Since I am more familiar with logistic regression, I thought I would expand my matrix so that each row is either `1` for success or `0` for failure. My code is below:
import numpy as np
from sklearn.linear_model import LogisticRegression

#Function to construct matrix compatible for logistic regression. Could be cleaner
def expand_mat(onehot_X, onehot_Y):
    count_row = 0
    for i in range(onehot_X.shape[0]):
        count_row += int(sum(onehot_Y[i,:]))
    oh_X = np.zeros((count_row,onehot_X.shape[1]))
    oh_y = []
    last = -1
    for i in range(onehot_X.shape[0]):
        if int(sum(onehot_Y[i,:]))<1:continue
        new_X = np.array([onehot_X[i,:],]*int(sum(onehot_Y[i,:])))
        new_y = [1]*int(onehot_Y[i,0]) + [0] * int(onehot_Y[i,1])
        #Add to the overall X and y
        oh_X[last+1:last+1+int(sum(onehot_Y[i,:])),:]=new_X
        oh_y += new_y
        last+=int(sum(onehot_Y[i,:]))
    return oh_X, oh_y

# Simulate data
np.random.seed(0)
fake_X = np.round(np.random.uniform(0,1,(50,3))) # 3 features
fake_y = np.round(np.random.uniform(10,90,(50,2))) # 2 columns, no. of successes and failures
x, y = expand_mat(fake_X, fake_y)
lr = LogisticRegression(random_state=0).fit(x, y)
lr.coef_
>>>array([[ 0.08026219, -0.07601091, 0.05658203]])
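As an aside, the expansion step can probably be avoided. `LogisticRegression.fit` accepts a `sample_weight` argument, so each original row can be entered twice (once labelled 1, once labelled 0) and weighted by its success and failure counts. The sketch below assumes that setup; `fit_weighted`, `X_demo` and `counts_demo` are hypothetical names used only for illustration, and the demo data is freshly simulated rather than the data above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_weighted(X, counts, **lr_kwargs):
    """Fit LogisticRegression on (successes, failures) counts via sample_weight.

    Hypothetical helper: each row of X is used twice, once with label 1
    weighted by its success count and once with label 0 weighted by its
    failure count.
    """
    X = np.asarray(X, dtype=float)
    counts = np.asarray(counts, dtype=float)
    X2 = np.vstack([X, X])
    y2 = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
    w2 = np.concatenate([counts[:, 0], counts[:, 1]])
    keep = w2 > 0  # drop zero-weight rows so they cannot affect the fit
    model = LogisticRegression(**lr_kwargs)
    return model.fit(X2[keep], y2[keep], sample_weight=w2[keep])

# Simulated demo data with the same shape as in the question (assumption)
rng = np.random.default_rng(0)
X_demo = np.round(rng.uniform(0, 1, (50, 3)))
counts_demo = np.round(rng.uniform(10, 90, (50, 2)))

model = fit_weighted(X_demo, counts_demo)
print(model.coef_)
```

With integer counts, this should minimise the same objective as fitting on the expanded matrix, while keeping the design matrix at two rows per original observation.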
However, if I change the number of successes and failures while keeping the ratio the same, the coefficients and the model change, and I am not sure why. I would expect the model to stay the same as long as the ratio stays the same. What am I missing? Thanks.
fake_y /= 2
fake_y = np.round(fake_y)
x, y = expand_mat(fake_X, fake_y)
lr = LogisticRegression(random_state=0).fit(x, y)
lr.coef_
>>>array([[ 0.08292701, -0.08024205, 0.05879753]])
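Two things in this setup can break the "same ratio, same model" expectation. First, `LogisticRegression` applies an L2 penalty by default (`C=1.0`), and that penalty does not grow with the number of rows, so its relative weight changes when the counts are scaled. Second, rounding after halving slightly alters the ratio for rows with odd counts. The sketch below is one way to check the first point in isolation, under stated assumptions: it doubles the counts instead of halving them (so no rounding is involved) and compares the default `C` with a very large `C` that makes the penalty negligible. The helper names `expand` and `coefs` are hypothetical, and the data is freshly simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def expand(X, counts):
    """Repeat each row once per trial; label successes 1 and failures 0."""
    counts = np.asarray(counts).astype(int)
    X_rep = np.repeat(X, counts.sum(axis=1), axis=0)
    y_rep = np.concatenate([np.r_[np.ones(s), np.zeros(f)] for s, f in counts])
    return X_rep, y_rep

def coefs(X, counts, C):
    """Coefficients of a logistic fit on the expanded data for a given C."""
    X_rep, y_rep = expand(X, counts)
    model = LogisticRegression(C=C, tol=1e-8, max_iter=1000).fit(X_rep, y_rep)
    return model.coef_.ravel()

rng = np.random.default_rng(0)
X = np.round(rng.uniform(0, 1, (50, 3)))
counts = np.round(rng.uniform(10, 90, (50, 2)))

# Doubling preserves every success/failure ratio exactly (no rounding).
gaps = {}
for C in (1.0, 1e6):
    gaps[C] = np.abs(coefs(X, counts, C) - coefs(X, 2 * counts, C)).max()
    print(f"C={C:g}: max coefficient difference = {gaps[C]:.2e}")
```

If the gap shrinks towards zero for the large `C`, the remaining difference at the default `C` comes from the penalty rather than from the data. An unpenalized binomial GLM would be the more direct tool if exact invariance to the scale of the counts is wanted.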