I have fitted this GLM:

y~poly(xa,2)+poly(xb,2)+...

I then found the best-fitting model using AICc. The best-fitting model contains a subset of the x variables.

How do I show that the model fits the data well? With a single y and x it's easy, but now I have y and xa, xb, etc., and these x variables are on different scales. Do I just say in my manuscript that this was the best-performing model, or do I have to present a graph?

Tim
Herman Toothrot
  • How *exactly* was your model chosen? (This is your rationale for choosing this model, so it is important and you want to describe this part.) How do you define "the best" model in this case? How do you define model performance? Answering these questions is crucial to what you are going to describe and how you visualize your results. – Tim Jan 27 '15 at 12:17
  • @Tim I have specified that I have compared models using AICc. Is that what you are asking? I picked the model with the lowest AIC and used a step function. – Herman Toothrot Jan 27 '15 at 16:43
  • Stepwise regression is *not* the best approach: http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/ or http://andrewgelman.com/2014/06/02/hate-stepwise-regression/ – Tim Jan 27 '15 at 16:56
  • @Tim thanks for the feedback. I have read that information, but in biology AICc and stepwise regression seem to be accepted even in recent publications. – Herman Toothrot Jan 27 '15 at 17:52
  • In psychology it is accepted to treat a single 5-point Likert item as continuous and normally distributed... – Tim Jan 27 '15 at 18:18

1 Answer

With stepwise regression you could actually end up with an overfitted model, so I wouldn't consider proving that it fits well to be your main problem. Stepwise regression is known to give potentially biased results, is discouraged by many authors, and is even considered a "statistical sin".

When presenting your model you should explain why you consider it the best one; saying "because AIC said so" is not enough. Automatic model selection can lead you to choose a poor model that "fits well" but is hard to interpret and/or has poor predictive power. So there has to be some rationale behind your model: state why it is the best by your criteria, and what those criteria were. What is the practical significance of your model?

You also seem to be asking which other model diagnostics you should present when describing your model choice. There are at least a few more things that could be done; you can read more on them here.

What you should present is the model diagnostics, the model selection procedure, and the model selection criteria you used. If your main objective is to show that the model fits the data well, then look at the residuals.
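
As a rough illustration of looking at the residuals when there are several predictors on different scales, here is a minimal sketch in base R. It uses simulated data, so `xa`, `xb`, `y`, the sample size, and the Gaussian family are all assumptions made for the example and not your actual model. The point is that both plots use the fitted values on the x-axis, so the different scales of the individual x variables do not matter.

```r
# Simulated data: xa, xb and y are hypothetical stand-ins for the real variables.
set.seed(1)
n  <- 200
xa <- runif(n, 0, 10)       # predictor on one scale
xb <- rnorm(n, 500, 100)    # predictor on a very different scale
y  <- 2 + 0.5 * xa - 0.05 * xa^2 + 0.01 * xb + rnorm(n)

# Gaussian GLM with second-order polynomial terms, as in the question.
# (The family is an assumption; substitute the one you actually used.)
fit <- glm(y ~ poly(xa, 2) + poly(xb, 2), family = gaussian)

fitted_vals <- fitted(fit)
dev_res     <- residuals(fit, type = "deviance")

op <- par(mfrow = c(1, 2))

# Residuals vs fitted values: look for curvature, funnels or outliers.
plot(fitted_vals, dev_res,
     xlab = "Fitted values", ylab = "Deviance residuals")
abline(h = 0, lty = 2)

# Observed vs fitted values: points should scatter around the 1:1 line.
plot(fitted_vals, y,
     xlab = "Fitted values", ylab = "Observed y")
abline(0, 1, lty = 2)

par(op)
```

Plotting the residuals against each predictor in turn (for example `plot(xa, dev_res)`) is a natural extension if you want to check whether the polynomial terms capture the shape of each relationship.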

Tim