I have a model with an ordinal DV and a few IVs that are categorical (nominal and ordinal), as well as one continuous variable. I recoded all the categorical variables with 3 or more categories into dummies to run the collinearity test. One variable (a 5-point Likert scale, ordinal) showed 2 of its 4 dummy categories with VIF > 10. Do you know the right way to proceed? Should I drop the whole variable, or just one category (chosen at random...)? I am using SPSS.
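The OP works in SPSS, but for concreteness here is a rough sketch in Python of the dummy-coding-plus-VIF check being described: a 5-point Likert item expanded into k-1 dummies alongside a continuous predictor. Data, column names, and the continuous variable ("age") are all invented for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Invented data standing in for the OP's variables
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "likert": rng.integers(1, 6, size=200),  # 1 = strongly dislike ... 5 = strongly like
    "age": rng.normal(35, 10, size=200),     # one continuous predictor (hypothetical)
})

# k categories -> k-1 dummies (drop_first avoids the dummy-variable trap)
X = pd.get_dummies(df["likert"].astype("category"), prefix="likert", drop_first=True)
X["age"] = df["age"]
X = X.astype(float)
X.insert(0, "const", 1.0)  # compute VIFs with an intercept present

vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print({k: round(v, 2) for k, v in vifs.items()})
```

With a roughly uniform category distribution the VIFs stay modest; the OP's VIF > 10 values indicate that some dummies are close to linear combinations of the other predictors.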

Irene
3 Answers


Further information given in comments by the OP suggests that the problem here is separation or quasi-separation, since more than 85% of the cells formed by a complete cross-classification are zeroes.

To answer the original question first: the finding of collinearity in the model is not necessarily the red flag it is sometimes treated as. It may not even be an orange alert. It is conveying important information about the data-set which needs to be looked at before further interpretation is made. That task would certainly need to be undertaken by someone who knows the scientific question and the background to the data-set, information which we do not have.

Separation is a topic which has been handled elsewhere on this site, and fortunately there is an excellent answer in this Q&A: How to deal with perfect separation in logistic regression? (In my opinion the highest-voted answer, not the accepted one, is the one to go for if you are short of time to read them all.)
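The SPSS warning quoted later in the thread reports the share of zero-frequency cells. As a rough illustration (Python, invented data and variable names), that symptom can be checked directly by cross-tabulating the DV against a predictor and counting empty cells:

```python
import numpy as np
import pandas as pd

# Invented data: a 7-point ordinal DV and a 5-point Likert predictor
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "buy": rng.integers(1, 8, size=60),     # likelihood to buy, 1-7
    "appeal": rng.integers(1, 6, size=60),  # genre appeal, 1-5
})

# Cross-tabulate, then reindex to the full 7x5 grid so unobserved
# level combinations also show up as zero-frequency cells
tab = pd.crosstab(df["buy"], df["appeal"])
tab = tab.reindex(index=range(1, 8), columns=range(1, 6), fill_value=0)

zero_cells = int((tab.to_numpy() == 0).sum())
total_cells = tab.size
print(f"{zero_cells} of {total_cells} cells are empty "
      f"({100 * zero_cells / total_cells:.1f}%)")
```

A large proportion of empty cells (the OP reports 85.7%) is exactly the situation in which quasi-complete separation and infinite parameter estimates become likely.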

mdewey

You cannot delete categories at random. You need to investigate further which two variables are highly correlated, or are adding duplicate information to your model; then you can proceed by removing the variable that has the highest correlations with all the other variables.

Dimension-reduction methods such as PCA can be another option, but it all depends on the type of data you are dealing with.
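As a sketch of the "investigate first" step (Python, invented data): dummies built from one categorical variable are mutually exclusive, so they are inherently negatively correlated, and inspecting the correlation matrix shows which pair overlaps most before anything is dropped.

```python
import numpy as np
import pandas as pd

# Invented 5-point Likert responses
rng = np.random.default_rng(2)
likert = pd.Series(rng.integers(1, 6, size=200), name="likert")
dummies = pd.get_dummies(likert, prefix="likert").astype(float)  # all 5 dummies

corr = dummies.corr()
# Blank out the diagonal, then find the largest off-diagonal |correlation|
off_diag = corr.where(~np.eye(len(corr), dtype=bool))
pair = off_diag.abs().stack().idxmax()
print(corr.round(2))
print("most strongly correlated pair:", pair)
```

Note that high VIFs among dummies of a single variable often reflect this built-in dependence (especially with an unbalanced category distribution) rather than redundancy between genuinely distinct predictors.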

RomRom

Since these categories come from a single variable, just combine the highly correlated levels into a single one. Say there are levels 'A' and 'B': create a third level 'C' meaning 'A' or 'B', then remove 'A' and 'B' and use 'C' in your model.
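A minimal sketch of this merging step (Python; toy data, with level names 'A', 'B', 'C' following the answer's example):

```python
import pandas as pd

# Invented toy data with levels 'A', 'B', 'D', 'E'
x = pd.Series(["A", "B", "D", "A", "E", "B"])

# Merge the two collinear levels 'A' and 'B' into a new level 'C'
merged = x.replace({"A": "C", "B": "C"})
print(merged.tolist())  # → ['C', 'C', 'D', 'C', 'E', 'C']
```

After merging, the variable has one fewer level, so it produces one fewer dummy, and the near-redundant pair no longer enters the model separately.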

muni
  • I am introducing the ordinal variable (5-point Likert) as a covariate in the ordinal model, because if I enter it as a factor I get a warning of quasi-complete separation and I do not know how to deal with it. Do you think it is better to leave the variable in the model even with the multicollinearity, or maybe change it to 3 categories (dislike, neither, like)? When I introduce it as a factor I do not get the same results as when it is a covariate. – Irene Aug 30 '16 at 11:05
  • Can you please post some results, so that I can suggest something better? As far as quasi-separation is concerned, I think there might be some class(es) which are present in only one of the categories of the target variable. – muni Aug 30 '16 at 12:19
  • I have an independent variable, a 5-point Likert item from strongly dislike to strongly like. The categories 'neither like nor dislike' and 'like' show VIF > 10 when I recode the variable into 4 dummies to study the collinearity. If I ignore it and introduce the variable in the model as continuous, it is not significant, and a few other variables are. If I introduce it as a factor, it is significant, but some others that were significant before are not any more. – Irene Aug 31 '16 at 09:12
  • I have another videogame, i.e. another data file, with the same dependent variable (likelihood to buy the videogame). The results are different as the videogame is also different. I did not have problems of multicollinearity, so I ran the model with the ordinal variables as continuous. The problem is that if I introduce the ordinal variables as factors I get a warning: – Irene Aug 31 '16 at 09:15
  • There are 960 (85,7%) cells (i.e., dependent variable levels by observed combinations of predictor variable values) with zero frequencies. Unexpected singularities in the Fisher Information matrix are encountered. There may be a quasi-complete separation in the data. Some parameter estimates will tend to infinity. The PLUM procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain. – Irene Aug 31 '16 at 09:15
  • I am still not quite sure what you are trying to achieve. If I understand correctly, you have a game X and you want to find the likelihood of a person purchasing that game. Now if the ordinal variable has 5 categories, you can use it as continuous, unless you want to find insights for individual categories, or the event rate does not increase linearly as you go up or down the 1-5 scale. – muni Aug 31 '16 at 13:24
  • I do want to find the likelihood of buying a videogame (7-point Likert item) based on the gender on the box-art cover and its stereotype. I ran a 2x2 ANOVA as I have a 2x2 experimental design, but I also ran an ordinal regression to take the order of the dependent variable into account. When I introduced more variables into the model, one of them, an ordinal 5-point Likert item (which measured the appeal of the videogame's genre), gave me multicollinearity in two of its categories when I recoded it as dummies for the collinearity test. The categories are "neither like nor dislike"=1 (1) – Irene Aug 31 '16 at 13:55
  • and "others"=0, and the other category is "like"=1 and "others"=0. Both show VIF > 10. How should I introduce the ordinal variable in the model when two of its categories show multicollinearity? (2) – Irene Aug 31 '16 at 13:56
  • OK, what is your 'like' variable: 4 and 5 on the scale? And is 'neither like nor dislike' 3 on the scale? – muni Aug 31 '16 at 14:34
  • The ordinal variable was: 1 strongly dislike; 2 dislike; 3 neither like nor dislike; 4 like; 5 strongly like. I tested multicollinearity in linear regression by introducing this variable as 4 dummies. The first one was 1=strongly dislike, 0=the rest. The second one was 1=dislike, 0=rest. The third was 1=neither like nor dislike, 0=the rest. And the last was 1=like, 0=rest. I have multicollinearity with the third and fourth dummy variables. – Irene Aug 31 '16 at 14:43
  • OK, do this: club 1 and 2 into a single category, 3 as another category, and 4 and 5 into another. Now you can make 2 dummy categories out of this. Just check what the VIF is in those 2 cases. – muni Aug 31 '16 at 14:50
  • I did; the VIFs are now less than 10 :). Is there any way to justify this choice? Furthermore, do you know how to explain why I prefer to introduce the ordinal variables in my model as covariates instead of factors? If I use factors I get the quasi-complete separation of the data, so I prefer to treat them as covariates. – Irene Aug 31 '16 at 16:24
  • What is the VIF value that you are getting? The reason for such a choice is to club similar classes into one, as we assume there will not be much real difference between them. Also, have you tried using the ordinal variable as continuous in your data? – muni Sep 01 '16 at 08:03
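The collapsing scheme agreed in the comments above (1 and 2 → "dislike", 3 → "neutral", 4 and 5 → "like", then two dummies) can be sketched as follows; the data are invented, and the VIF check mirrors what the OP did in SPSS:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Invented 5-point Likert responses
rng = np.random.default_rng(3)
likert = pd.Series(rng.integers(1, 6, size=300))

# Collapse 5 levels to 3, matching the scheme from the comments
collapsed = likert.map({1: "dislike", 2: "dislike", 3: "neutral",
                        4: "like", 5: "like"})

# 3 levels -> 2 dummies ("dislike" is the dropped reference level)
X = pd.get_dummies(collapsed, drop_first=True).astype(float)
X.insert(0, "const", 1.0)

vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print({k: round(v, 2) for k, v in vifs.items()})
```

With fewer, better-populated levels the dummies overlap less, which is why the OP's VIFs dropped below 10 after collapsing; the substantive justification is the one muni gives, that adjacent scale points are assumed not to differ much.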