Thanks for looking at this, I've been tearing my hair out for a day or so now.

I have fit a logistic regression with multiple predictors in R and obtained my coefficients. I am able to make predictions for the training data in R without problem. But now I would like to build the prediction model in Ruby (that was the original point of doing the regression), and I'm having some trouble.

Basically, my equation is:

predicted_logit = K + v1*c1 + v2*c2 + ... + vn*cn
odds_ratio = e^predicted_logit/(1+e^predicted_logit)

But it always seems to either give 1.0 or 0.0! The output of predict() in R is generally something nice and soft like 0.5578460!

I realize not everyone knows Ruby, but I'll include my code here for reference:

# These are the coefficients that R gives me from my logistic regression:
intercept = 0.2700309

coefficients = {
  high: 1.0136028, 
  low: 1.0016712, 
  germ_mean: 1.0233327,
  gdds: 0.9990283,
  early_gdds: 0.9986464,
  mid_gdds: 1.0002979,
  late_gdds: 0
}

# And this is what R predicts for one datum:
#
#   outcome high low germ_mean gdds early_gdds mid_gdds late_gdds p_success
# 1       1   73  28        40  119          0       91        28 0.5578460
# ...

# So to get my own p_success, first I multiply each coefficient by its input data
period = {:high=>73, :low=>28, :germ_mean=>40, :gdds=>119, :early_gdds=>0, :mid_gdds=>91, :late_gdds=>28}
products = coefficients.map {|name,value| period[name]*value }

# Then I add those together and add that to the intercept
predicted_logit = intercept + products.sum

# Then my probability should be e^predicted_logit over 1 + e^predicted_logit:
odds_ratio = Math.exp(predicted_logit) / (1 + Math.exp(predicted_logit))

# But the odds ratio comes out as 1.0, not 0.5578460 like R predicts.

Edit: Thanks to everyone who helped out! It turns out I had used exp(coef(period_logit)) to get the coefficients instead of coef(period_logit)! It's good to understand what you type before you hit enter.
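
For reference, here is a minimal Ruby sketch of the fix described in the edit. It assumes the numbers posted above are exactly the output of exp(coef(period_logit)), so taking Math.log of each one recovers the log-odds coefficients that coef() would have printed. The constant and method names (EXP_COEFFICIENTS, log_scale, linear_predictor, logistic) are made up for this illustration, and late_gdds is left out because R reported it as NA.

```ruby
# Values as posted in the question, assumed to be exp(coef(period_logit)).
EXP_INTERCEPT = 0.2700309
EXP_COEFFICIENTS = {
  high: 1.0136028,
  low: 1.0016712,
  germ_mean: 1.0233327,
  gdds: 0.9990283,
  early_gdds: 0.9986464,
  mid_gdds: 1.0002979
  # late_gdds was NA in R, so the term is omitted here.
}

# The single datum from the question.
PERIOD = { high: 73, low: 28, germ_mean: 40, gdds: 119,
           early_gdds: 0, mid_gdds: 91, late_gdds: 28 }

# Undo the exp() to get back to the log-odds scale.
def log_scale(exp_coefficients)
  exp_coefficients.map { |name, value| [name, Math.log(value)] }.to_h
end

# intercept + sum of coefficient * input over the fitted terms.
def linear_predictor(intercept, coefficients, inputs)
  coefficients.reduce(intercept) do |sum, (name, beta)|
    sum + beta * inputs.fetch(name)
  end
end

# Logistic function, written as 1 / (1 + e^-t).
def logistic(t)
  1.0 / (1.0 + Math.exp(-t))
end

LOGIT = linear_predictor(Math.log(EXP_INTERCEPT),
                         log_scale(EXP_COEFFICIENTS), PERIOD)
PROBABILITY = logistic(LOGIT)
puts format("logit = %.4f, probability = %.4f", LOGIT, PROBABILITY)
```

With these inputs the linear predictor comes out at roughly 0.5578, which matches the p_success column above. That would be consistent with predict() having been called with its default type = "link", which returns the logit rather than the probability; the original R call is not shown, so this is only a guess. Passing type = "response" to predict() is the way to get probabilities directly in R.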

  • Perhaps R is normalizing the features you're using as inputs. Typically the mean is subtracted and the result is then divided by the standard deviation. Check the variable you imputed for late_gdds. The code looks alright though. I'm not sure how Ruby treats integer-float operations. Usually logistic regression is written as $\frac{1}{1 + e^{-t}}$. – Jessica Collins Jan 17 '14 at 00:55
  • The late_gdds value was just "NA" in R. I assumed that meant there wasn't enough significance for that term? – Erik Pukinskis Jan 17 '14 at 02:40
  • @ErikPukinskis: I'm willing to bet the data that you used to train the logistic model is not on the same scale as the example you've posted. Perhaps you've done some sort of normalisation or applied some sort of transformation (e.g. z-score transformation, log-transformation, etc.) that you are not applying in Ruby. The process you have is correct. Perhaps you can post the output of summary() from R. –  Jan 17 '14 at 04:15
  • Just realised something, are you sure all coeffs have a positive sign? –  Jan 17 '14 at 04:31
  • Ah, thanks to all who commented. I had done exp(coef(period_logit)) to get the coefficients instead of just coef(period_logit)! – Erik Pukinskis Jan 17 '14 at 07:35
  • @ErikPukinskis The math checks out. That function is correct and should return 1. The coefficients though make me think that your variables are normalized first. – Jessica Collins Jan 18 '14 at 20:49
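
To illustrate the last comment, here is a small Ruby sketch of why the posted code returns exactly 1.0. With the exponentiated coefficients, every term is roughly the raw input value, so the sum lands in the hundreds and the logistic function saturates in floating point. The names naive_logistic, stable_logistic and SATURATED_LOGIT are hypothetical; the second method is one common way to write the $\frac{1}{1 + e^{-t}}$ form so that very large inputs do not overflow, not something taken from the thread.

```ruby
# The form used in the question: e^t / (1 + e^t).
def naive_logistic(t)
  Math.exp(t) / (1 + Math.exp(t))
end

# Same function, but never exponentiates a large positive number.
def stable_logistic(t)
  if t >= 0
    1.0 / (1.0 + Math.exp(-t))
  else
    e = Math.exp(t)
    e / (1.0 + e)
  end
end

# The sum the question's code computes with the exponentiated values.
SATURATED_LOGIT = 0.2700309 + 73 * 1.0136028 + 28 * 1.0016712 +
                  40 * 1.0233327 + 119 * 0.9990283 + 0 * 0.9986464 +
                  91 * 1.0002979 + 28 * 0

puts SATURATED_LOGIT                  # a few hundred, far from 0.56
puts naive_logistic(SATURATED_LOGIT)  # saturates to 1.0
puts naive_logistic(1000).nan?        # Infinity / Infinity is NaN
puts stable_logistic(1000)            # still a clean 1.0
```

A logit of this size means the formula is behaving as designed; the inputs to it are what need checking, which is where the exp(coef(...)) mix-up showed up.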
