I have some data that I want to fit a log-linear model to (using R). The data in this dataset is categorical and y is the frequency. I use the glm function with family=poisson. I firstly fit the saturated model and check the residuals. I obtain
Clearly, cases 42 and 61 are outliers so should I remove them to continue with the analysis.
Next, I apply backwards elimination and obtain model 1. It has g^2=15.136 on 16 d.f. and p-value 0.514682 and Pearson's test statistic = 14.27873 on 16 d.f. with p-value 0.577. This model has an AIC of 386.3297 and BIC of 489.9561. RSS=2.491516
I then start with the empty model and apply forward selection and obtain model 2. It has g^2= 126.38 on 53 d.f. p-value 6.2327e-0.8 and Pearson's test statistic. =123.9027 on 53 d.f. p-value 1.31x10-07. It has an AIC of 423.5739 and BIC of 447.3217. RSS= 9.967527
Model 1 has higher order terms than model 2 (model 1 has a four way interaction term). Model two has two way interaction terms. Both models are nested.
I know that the different directions of stepwise regression can produce different results. I don't know which model to proceed with because the AIC of model 2 is higher than model 1, yet it has lower order interaction terms and very low p-value.
Any suggestions?