
I performed a logistic regression on my dataset, which has 6 variables, and got the following output from R:

[Image: R output of the selected logistic regression model (not transcribed)]

I used the step() function in R to select the best model, and the output above is for that model. The AIC value is so high! Could someone help me explain how well my model fits, based on the information in the figure? If additional information is needed, what do I need to calculate?

BTW, how do I calculate $R^2$ for my model?

Thanks,

SecretAgentMan
James Teng
    Have you considered cross validation (no pun intended) or a confusion matrix? Cross validation: https://www.youtube.com/watch?v=sFO2ff-gTh0, https://www.youtube.com/watch?v=TIgfjmp-4BA. Confusion matrix: https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/. Other validation resources: https://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1533433/pdf/envhper00541-0275.pdf – SecretAgentMan Jul 12 '18 at 21:32
  • Please copy the text of your output. Our visually impaired users cannot read your question. – Matthew Drury Jul 12 '18 at 22:17

2 Answers


The absolute AIC value is meaningless. Different software computes AIC only up to an additive constant. Thus, AIC can only be compared between models fitted to the same data by the same piece of software.
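
To illustrate why only differences matter: $\mathrm{AIC} = 2k - 2\log\hat{L}$, with $k$ the number of estimated parameters and $\hat{L}$ the maximized likelihood. The Python sketch below is only an illustration (the outcomes and "fitted" probabilities are made up, not the OP's model). It adds a hypothetical constant to every log-likelihood, as one implementation might include a term that another drops. Both AIC values shift by the same amount, while their difference stays the same.

```python
import math

def bernoulli_loglik(y, p, const=0.0):
    """Log-likelihood of 0/1 outcomes y under probabilities p.

    `const` is a hypothetical additive constant that some implementation
    might include or drop; it is the same for every model.
    """
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    return ll + const

def aic(loglik, k):
    # AIC = 2k - 2 log L, where k is the number of estimated parameters
    return 2 * k - 2 * loglik

y = [1, 0, 1, 1, 0, 1, 0, 1]
p_small = [0.6] * 8                                # intercept-only model, k = 1
p_big = [0.8, 0.3, 0.7, 0.9, 0.2, 0.6, 0.4, 0.8]   # made-up fitted probabilities, k = 3

for const in (0.0, 100.0):
    a_small = aic(bernoulli_loglik(y, p_small, const), k=1)
    a_big = aic(bernoulli_loglik(y, p_big, const), k=3)
    print(f"const={const}: AIC small={a_small:.3f}, big={a_big:.3f}, diff={a_big - a_small:.3f}")
```

With `const=100` both AIC values drop by exactly 200, but the difference between the two models (the only thing AIC is meant for) is unchanged.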

Better: assess the predictive power of your predicted probabilities in a holdout sample. Don't use accuracy; use proper scoring rules (such as the Brier score or the log score) or similar.
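
A minimal Python sketch of scoring holdout predictions with two proper scoring rules, the Brier score and the log loss (for both, lower is better). It assumes the model has already been fitted on a training split and that you have predicted probabilities for the holdout cases (in R, e.g. via `predict(fit, newdata = holdout, type = "response")`, where `holdout` is a hypothetical data frame). The numbers below are hypothetical. A constant prediction equal to the training prevalence is a useful baseline; a prevalence of 0.5 is assumed here.

```python
import math

def brier_score(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def log_loss(y, p, eps=1e-15):
    """Mean negative log-likelihood of the 0/1 outcomes (lower is better)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip so log() never sees 0
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)

# Hypothetical holdout outcomes and predicted probabilities from a model fitted on the training part
y_holdout = [1, 0, 0, 1, 1, 0]
p_model = [0.8, 0.2, 0.4, 0.7, 0.9, 0.1]
p_baseline = [0.5] * len(y_holdout)  # assumed training prevalence of 0.5

for name, p in (("model", p_model), ("baseline", p_baseline)):
    print(f"{name:8s} Brier={brier_score(y_holdout, p):.4f}  log loss={log_loss(y_holdout, p):.4f}")
```

The model is only useful to the extent that it beats the constant baseline on data it was not fitted to.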

Stephan Kolassa

Could someone help me explain how well my model fits, based on the information in the figure?

It depends on how you define a good model fit. If you mean whether the assumptions of the model are valid, that is hard to judge from the figure alone. Various things could be wrong with the model, e.g.:

  1. Is it the right link function?
  2. Are there interactions that you did not include? (You only used main effects, via the formula `~ .`.)
  3. Are the effects linear on the link scale, or are there non-linear effects for some of the continuous predictors?
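
On the link-function point: the link connects the linear predictor $\eta = X\beta$ to the probability $p$. Logistic regression uses the logit, and R's `binomial` family also offers e.g. `binomial(link = "probit")` and `binomial(link = "cloglog")`. This Python sketch just evaluates the three inverse links at a few values of $\eta$ to show how they differ. The complementary log-log link is asymmetric: at $\eta = 0$ it gives $1 - e^{-1} \approx 0.632$ rather than 0.5. Comparing fits with different links on the same data (e.g. by AIC) is one way to probe this point.

```python
import math

def inv_logit(eta):
    # Logistic CDF: p = 1 / (1 + exp(-eta))
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    # Complementary log-log: p = 1 - exp(-exp(eta))
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-3, -1, 0, 1, 3):
    print(f"eta={eta:+d}  logit={inv_logit(eta):.3f}  "
          f"probit={inv_probit(eta):.3f}  cloglog={inv_cloglog(eta):.3f}")
```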

You likely need to look at textbooks if you have not worked much with GLMs before. You can check this question for GLM books.

The AIC value is so high!

If you mean high relative to the other models you have fitted, that would be surprising, since step() minimizes the AIC over the models it compares. You cannot judge the model fit from the AIC, as Stephan Kolassa mentions. You can only use the value to compare with other models fitted on the same data.
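
For intuition about what step() optimizes, here is a simplified, backward-only sketch of greedy AIC-based selection in Python. It is not a reimplementation of R's step() (which by default uses `direction = "both"` and `k = 2`). `aic_of` is a hypothetical callable standing in for refitting the model on the same data with a given set of terms, and the toy AIC function in the usage part is made up.

```python
def backward_step(terms, aic_of):
    """Greedy backward elimination by AIC (simplified sketch).

    terms  : predictor names in the starting (full) model
    aic_of : hypothetical callable that fits the model with the given
             frozenset of terms and returns its AIC
    """
    current = frozenset(terms)
    current_aic = aic_of(current)
    while current:
        # Try dropping each remaining term; keep the candidate with the lowest AIC
        candidates = [(aic_of(current - {t}), t) for t in current]
        best_aic, best_term = min(candidates)
        if best_aic >= current_aic:
            break  # no single drop lowers the AIC, so stop
        current, current_aic = current - {best_term}, best_aic
    return current, current_aic

# Made-up deviance reductions: only x1 and x2 improve the fit
GAINS = {"x1": 30.0, "x2": 10.0}

def toy_aic(term_set):
    deviance = 200.0 - sum(GAINS.get(t, 0.0) for t in term_set)
    return deviance + 2 * (len(term_set) + 1)  # +1 parameter for the intercept

chosen, best = backward_step(["x1", "x2", "x3", "x4", "x5", "x6"], toy_aic)
print(sorted(chosen), best)  # ['x1', 'x2'] 166.0
```

Because every accepted move lowers the AIC, the selected model has a lower AIC than the models it was compared against, yet that number alone says nothing about how well it fits.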

BTW, how do I calculate $R^2$ for my model?

There is no ordinary $R^2$ for GLMs, but various pseudo-$R^2$ alternatives have been proposed. See this wiki page.
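
As a sketch, the most common pseudo-$R^2$ measures can be written in terms of the log-likelihoods of the fitted model ($\ell_1$) and the intercept-only model ($\ell_0$) with $n$ observations: McFadden $1 - \ell_1/\ell_0$, Cox & Snell $1 - \exp\{2(\ell_0 - \ell_1)/n\}$, and Nagelkerke, which divides Cox & Snell by its maximum $1 - \exp(2\ell_0/n)$. In R, $\ell_1$ and $\ell_0$ could come from `logLik(fit)` and `logLik(glm(y ~ 1, family = binomial))`, where `y` stands for your response. The numbers in the usage part are hypothetical. These measures live on different scales and none of them is interchangeable with an OLS $R^2$.

```python
import math

def pseudo_r2(ll_model, ll_null, n):
    """Common pseudo-R^2 measures from log-likelihoods.

    ll_model : log-likelihood of the fitted model
    ll_null  : log-likelihood of the intercept-only model
    n        : number of observations
    """
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox & Snell
    nagelkerke = cox_snell / max_cox_snell
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

n = 100
ll_null = n * math.log(0.5)   # intercept-only model with a 50/50 outcome split
ll_model = -50.0              # hypothetical log-likelihood of the fitted model
print(pseudo_r2(ll_model, ll_null, n))
```

For ungrouped 0/1 data, McFadden's measure equals `1 - fit$deviance / fit$null.deviance` in R, because the saturated model's log-likelihood is 0.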

    +1. [Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?](https://stats.stackexchange.com/q/3559/1352) may be helpful for the OP. As may be [Diagnostics for logistic regression?](https://stats.stackexchange.com/q/45050/1352). – Stephan Kolassa Jul 12 '18 at 21:25