Count data as a dependent variable consisting of a five-level Likert scale

Question

I have a few Likert-type questionnaire items with responses recorded from 1 to 5. I am using these responses as the dependent variable against three independent variables (one continuous and two categorical). I am trying to fit a generalized linear model (GZLM) to these data, using one questionnaire item at a time. I have my responses in one column. I tried applying the cut command to the dependent variable so that it gives me five boundaries (the first is treated as the reference and the remaining four are compared with it), and when I apply the model

glm(y ~ IV1*IV2*IV3, data=young, family=poisson)

I keep getting errors like

**Error: 'family' argument seems not to be a valid family object.**

I would appreciate some direction if anyone knows how to proceed.

score 3 · Answer 1 · edited Mar 16 '15 at 10:08

Why do you think a 5-point Likert-scale item is a count variable? I don't know your data, but a 5-point Likert-scale item does not sound like a count variable with a Poisson distribution, so you don't want to specify family=poisson.

If your DV is a 5-point Likert scale (ordered categorical variable), you can either use linear models (e.g., multiple linear regression) or logistic models (ordered logistic regression, for example by using the polr function of the R package MASS).
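
The polr route can be sketched as follows. The asker's data are not shown, so this simulates a hypothetical data frame named `young` with the variable names from the question (`y`, a continuous `IV1`, and two-level factors `IV2` and `IV3`); the effect sizes and cut points are invented purely for illustration. MASS is a recommended package that ships with standard R installations.

```r
library(MASS)  # provides polr() for proportional-odds (ordered logistic) regression

set.seed(1)
n <- 600
# Hypothetical stand-in for the asker's data frame `young`
young <- data.frame(
  IV1 = rnorm(n),                                        # continuous predictor
  IV2 = factor(sample(c("a", "b"), n, replace = TRUE)),  # categorical predictor
  IV3 = factor(sample(c("c", "d"), n, replace = TRUE))   # categorical predictor
)
# Invented latent response, cut into five ordered categories coded 1..5
latent <- 0.8 * young$IV1 + 0.5 * (young$IV2 == "b") + rlogis(n)
young$y <- cut(latent, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
               labels = 1:5, ordered_result = TRUE)

# Same right-hand side as in the question, but an ordinal model instead of Poisson
fit <- polr(y ~ IV1 * IV2 * IV3, data = young, Hess = TRUE)

# summary.polr() reports t values only; add approximate two-sided Wald p-values
ctable <- coef(summary(fit))
p_values <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
print(cbind(ctable, "p value" = p_values))
```

Each slope is a log odds ratio for answering in a higher category, and the four intercepts (`fit$zeta`) are the cut points between adjacent categories, so the response does not have to be split into dummy variables by hand.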

In my opinion, you should use ordered logistic regression instead of linear regression if there are floor or ceiling effects in your DV, that is, if most people selected the lowest or highest category of the DV. You can check for floor and ceiling effects by looking at a histogram of the DV and at its mean (a mean lower than 2 or higher than 4 could indicate a floor or a ceiling effect, respectively).
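
A quick way to run that check in R, using a made-up vector of answers to one item (the numbers are hypothetical). A bar chart of the counts stands in for the histogram, since the item is discrete, and the 2 and 4 cut-offs are the rule of thumb stated above, not a formal test.

```r
# Hypothetical responses to one 5-point item (1 = strongly disagree ... 5 = strongly agree)
resp <- c(5, 5, 4, 5, 5, 3, 5, 4, 5, 5, 2, 5, 4, 5, 5)

counts <- table(factor(resp, levels = 1:5))  # keep empty categories visible
print(prop.table(counts))                    # share of answers per category
item_mean <- mean(resp)
print(item_mean)

# Rule of thumb: mean < 2 hints at a floor effect, mean > 4 at a ceiling effect
effect <- if (item_mean < 2) {
  "possible floor effect"
} else if (item_mean > 4) {
  "possible ceiling effect"
} else {
  "no obvious floor/ceiling effect"
}
print(effect)

barplot(counts, xlab = "Response", ylab = "Count")  # distribution of the item
```

Here two thirds of the answers are 5 and the mean is about 4.47, so the item would be flagged for a ceiling effect.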

A consequence of floor and ceiling effects is that linear models underestimate the regression coefficients unless the independent variable(s) show the same floor or ceiling effect. That is why equally difficult items of a scale are often more highly correlated than unequally difficult items (McDonald & Ahlawat, 1974).
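
A small simulation can illustrate the ceiling part of this argument. It is not taken from McDonald & Ahlawat; the latent model, sample size and thresholds are assumptions chosen for illustration. A latent continuous response is cut into five categories twice: once with thresholds spread around its centre and once with thresholds that push most answers into the top category. In a simple regression the standardized slope equals the correlation, so the drop in correlation is the attenuation of the standardized coefficient.

```r
set.seed(42)
n <- 10000
x <- rnorm(n)            # independent variable, no ceiling effect
ystar <- x + rnorm(n)    # latent continuous response; cor(x, ystar) is about 0.71

# Five categories with thresholds spread around the centre of ystar
y_balanced <- cut(ystar, breaks = c(-Inf, -1.2, -0.4, 0.4, 1.2, Inf), labels = FALSE)
# Five categories with low thresholds: roughly three quarters of the answers are 5
y_ceiling <- cut(ystar, breaks = c(-Inf, -3, -2.5, -2, -1, Inf), labels = FALSE)

r <- c(latent   = cor(x, ystar),
       balanced = cor(x, y_balanced),
       ceiling  = cor(x, y_ceiling))
print(round(r, 2))
print(mean(y_ceiling))   # well above 4, i.e. a ceiling effect by the rule of thumb
```

Coarsening into five evenly spread categories costs a little of the correlation, while the ceiling version loses considerably more, even though both come from the same latent response.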

edited Mar 16 '15 at 10:08

Nick Cox

answered Mar 16 '15 at 05:45

Michael Grosz

I agree with the advice to look at ordered logit here; note that there are many other models for ordered responses, as modern categorical data analysis texts explain. This is a helpful answer, but I would suggest revising it to be even firmer on the limitations of linear regression here. In my view floor and ceiling effects are not the main reason for avoiding it; you could start with the fact that it treats a Likert scale as if it were a measurement. – Nick Cox Mar 16 '15 at 10:12
@NickCox: In my opinion and experience so far, floor and ceiling effects are the main danger when using a Likert scale with 5 or more answer categories as the DV in a linear model. If the answers on the DV do not show floor or ceiling effects, the differences between linear models and more appropriate models for ordered DVs are minimal. Have your experiences been different? – Michael Grosz Mar 17 '15 at 21:09
I can't really answer your question in your terms. As I view applying standard regression to Likert-scale responses as unsound, I don't do it and can't report on how bad the results are with a technique I don't use. It's a bit like asking about my experience with fire: I believe it to be dangerous and avoid it. – Nick Cox Mar 17 '15 at 21:17

Tim · Answer 2 · 2021-07-20T05:33:24.107

There are two common approaches for this kind of problem: (a) treat the Likert data as normally distributed and use linear regression, or (b) use a logistic regression model, in particular one based on item response theory (IRT).

As @MichaelGrosz noted, Likert data are not Poisson distributed, as your error message suggests, so using a Poisson model here is wrong. It was also noted that linear regression has pitfalls for this kind of problem, even though it is used very often in the social sciences for such data. However, the most up-to-date approach would be to use an IRT-based model (see here and here for examples and further details). The simplest case of such a model is the Rasch model

$$ P(X_{ij} = 1) = \frac{\exp(\theta_i - \beta_j)}{1+\exp(\theta_i - \beta_j)} $$

which is a variant of the logistic regression model. It models the binary response $X_{ij}$ of person $i$ to item $j$ as a function of the person's latent trait $\theta_i$ and the item "difficulty" $\beta_j$, which is the trait level at which a person has a 50% chance of answering 1 (the higher $\beta_j$, the further along the latent continuum a respondent must be to endorse the item). The simple Rasch model is for binary items, but there are also models for polytomous items that follow similar logic, e.g. the graded response model (examples e.g. here). In R there are several packages for estimating IRT models; see the ltm (Rizopoulos, 2006) and mirt (Chalmers, 2012) packages and their documentation for further information.
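
The Rasch formula and its graded-response extension are short enough to compute directly in base R. This is a minimal sketch for illustration, not a replacement for ltm or mirt, which estimate $\theta_i$ and the item parameters from data. The function names `rasch_prob` and `grm_probs` and the item parameters are hypothetical, and the graded-response part assumes Samejima's logistic parameterization with a discrimination `a` and increasing thresholds `b`.

```r
# Rasch model: probability that person i answers 1 on binary item j
rasch_prob <- function(theta, beta) {
  exp(theta - beta) / (1 + exp(theta - beta))  # identical to plogis(theta - beta)
}

# Graded response model for one ordered item with K categories:
# P(X >= k) = plogis(a * (theta - b[k - 1])) for k = 2..K, then take differences
grm_probs <- function(theta, a, b) {
  stopifnot(!is.unsorted(b))                    # thresholds must be increasing
  p_star <- c(1, plogis(a * (theta - b)), 0)    # P(X >= 1), ..., P(X >= K), P(X >= K + 1)
  probs <- head(p_star, -1) - tail(p_star, -1)  # P(X = k) for k = 1..K
  setNames(probs, seq_along(probs))
}

rasch_prob(theta = 1, beta = 0)  # about 0.731: trait one unit above the item difficulty
rasch_prob(theta = 0, beta = 0)  # 0.5: trait equal to the item difficulty

# Hypothetical 5-point item: discrimination 1.5, four thresholds between five categories
p <- grm_probs(theta = 0, a = 1.5, b = c(-2, -0.5, 0.5, 2))
print(round(p, 3))
```

In practice the parameters would be estimated rather than supplied by hand, e.g. with `mirt::mirt(data, 1, itemtype = "graded")` or `ltm::grm(data)`.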

Thanks all, but if I initially want to find a statistically significant difference (p-value) between the responses to an item, can I still use a generalized linear model? – Ebby, Mar 16 '15 at 12:20
@Ebby You want to look for differences in what, exactly? For estimating differences, a $\chi^2$ test will probably be enough. – Tim, Mar 16 '15 at 13:04
Not really, Tim. I just want the significance (p-value) of the responses, comparing response 1 (strongly disagree, which is the reference level) with each of the other four responses for one item. Remember that each of my items targets a different factor of the same issue. So once I get the p-values for these five responses to the first item, I guess I can do the same for the other items. I hope this is clearer, and I hope you can suggest how to go about it. Thanks – Ebby, Mar 16 '15 at 14:53
I am not sure I understand you correctly, so it would probably be better to edit your original question to clarify the problem. However, if I do understand you and it is a *category 1 vs. the rest* problem, then you can simply use logistic regression. – Tim, Mar 16 '15 at 19:25
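
Both of Tim's suggestions can be sketched in a few lines of R. The counts and the data frame `young` are hypothetical (the responses are random, so no real effect is expected), and the goodness-of-fit form of the $\chi^2$ test is only one possible reading of "differences between the responses of an item". Note that the $\chi^2$ test ignores the ordering of the categories.

```r
## (1) Chi-squared goodness of fit: are the five categories of one item equally frequent?
counts <- c("1" = 12, "2" = 18, "3" = 25, "4" = 30, "5" = 15)  # hypothetical counts
gof <- chisq.test(counts)   # H0: each category has probability 1/5
print(gof)                  # statistic 10.9 on df = 4

## (2) Logistic regression: category 1 ("strongly disagree") vs. all other categories
set.seed(7)
n <- 200
young <- data.frame(        # hypothetical stand-in for the asker's data
  y   = sample(1:5, n, replace = TRUE),
  IV1 = rnorm(n),
  IV2 = factor(sample(c("a", "b"), n, replace = TRUE)),
  IV3 = factor(sample(c("c", "d"), n, replace = TRUE))
)
young$disagree <- as.integer(young$y == 1)  # 1 = strongly disagree, 0 = anything else
fit <- glm(disagree ~ IV1 + IV2 + IV3, data = young, family = binomial)
print(summary(fit)$coefficients)            # estimates, Wald z statistics, p-values
```

The binary recode turns the 5-level item into a single yes/no outcome, so the usual `family = binomial` GLM applies and each predictor gets one p-value.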
