
I have the neuropsychiatric questionnaire scores of 300 individuals, of whom 200 are normal and 100 have the disease. The questionnaire is divided into 12 categories (delusion, agitation, etc.). Each category begins with a question along the lines of: does the patient experience delusions? (yes = 1, no = 0, a binary variable). If yes, the person goes on to specify the severity (1: Mild; 2: Moderate; 3: Marked) and frequency (1: Occasionally; 2: Often; 3: Frequently; 4: Very frequently), which are ordinal Likert-scale variables.

To start off, I have some trouble organizing the data.frame. I want the binary variable `status` (0 = normal, 1 = disease) to be the response variable of my ordinal logit model. Below is how I envision part of my data.frame (I'm not sure this is the best way to organize the data):

    status  delusion   severity   frequency  agitation  severity  frequency    ...
    0       0          NA         NA         0          NA        NA           ...
    0       1          2          1          0          NA        NA           ...
    1       0          NA         NA         1          1         4            ...
    0       0          NA         NA         0          NA        NA           ...
    1       0          NA         NA         0          NA        NA           ...
    .       .          .          .          .          .         .             .
    .       .          .          .          .          .         .             .
    .       .          .          .          .          .         .             .
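The repeated `severity` and `frequency` headers in the table above are ambiguous once there are 12 symptoms. A minimal Python sketch of one way to flatten each participant into a single wide row with symptom-prefixed column names follows. It assumes each participant's raw answers arrive as a dict mapping a symptom to a `(present, severity, frequency)` tuple; `SYMPTOMS`, `flatten_record`, and the `<symptom>_severity` naming scheme are hypothetical, and only two of the 12 categories are listed.

```python
# Hypothetical subset of the 12 questionnaire categories; add the other ten.
SYMPTOMS = ["delusion", "agitation"]


def flatten_record(status, answers):
    """Build one wide row: status, then <symptom>, <symptom>_severity and
    <symptom>_frequency for every symptom in SYMPTOMS.

    `answers` maps a symptom to (present, severity, frequency). Severity and
    frequency are None when present == 0, because those items are skipped.
    A symptom missing from `answers` entirely yields None in all three columns.
    """
    row = {"status": status}
    for s in SYMPTOMS:
        present, severity, frequency = answers.get(s, (None, None, None))
        row[s] = present
        row[f"{s}_severity"] = severity
        row[f"{s}_frequency"] = frequency
    return row


# The second and third rows of the table above, rebuilt in this layout.
rows = [
    flatten_record(0, {"delusion": (1, 2, 1), "agitation": (0, None, None)}),
    flatten_record(1, {"delusion": (0, None, None), "agitation": (1, 1, 4)}),
]
for r in rows:
    print(r)
```

Prefixing by symptom keeps every column name unique, so a later step can select "all severity columns" by suffix instead of by position.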

What should I do with the NAs? Should I just code them as 0? Also, most (90%) of the participants do not experience any of the 12 symptoms, so most of my data.frame will consist of NAs. Would a logit model be appropriate here? I'm interested in seeing how each of the 12 symptoms (and at what frequency and severity) contributes to the disease status of the participant.
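A blank severity or frequency in this layout is a structural skip (the symptom is absent), not missing data. One option is to recode it as 0 only when the gate question was answered 0, and to leave it missing otherwise. A minimal Python sketch, assuming each row is a dict keyed by hypothetical column names such as `delusion_severity`, with `None` standing in for NA; `recode_structural_na` is a hypothetical helper.

```python
def recode_structural_na(row, symptoms):
    """Return a copy of `row` in which skipped severity/frequency items are 0.

    Only items skipped because the gate question was answered 0 are recoded.
    If the gate itself is missing (None), the follow-up items stay None, so
    genuinely missing data remains distinguishable from "symptom absent".
    """
    out = dict(row)  # leave the caller's row untouched
    for s in symptoms:
        if out.get(s) == 0:
            for item in (f"{s}_severity", f"{s}_frequency"):
                if out.get(item) is None:
                    out[item] = 0
    return out


row = {"status": 1,
       "delusion": 0, "delusion_severity": None, "delusion_frequency": None,
       "agitation": None, "agitation_severity": None, "agitation_frequency": None}
clean = recode_structural_na(row, ["delusion", "agitation"])
print(clean)
```

Because the severity codes run from 1 to 3 and the frequency codes from 1 to 4, a recoded 0 cannot be confused with an observed rating.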

Glen_b
Adrian
  • You say you are predicting `status` as the response variable, but this is *binary*. You would use an *ordinal* logit model if your response variable is *ordinal*. Are you actually predicting severity? – gregmacfarlane Aug 04 '14 at 20:03
  • @gmacfarlane I see. I still want status to be my response variable. So I guess my new question is: can I treat the ordinal predictors as continuous variables in my logistic regression model? – Adrian Aug 04 '14 at 20:09
  • I can't say what you want to model. But my advice is to reconsider what you are trying to show with this model: are you trying to show that people with more severe or frequent symptoms are likely to receive a diagnosis (`status ~ severity + frequency` in binary logit)? Or that people with a diagnosis show more severe symptoms (`severity ~ status` in ordinal/multinomial logit)? You have to answer that question yourself! – gregmacfarlane Aug 04 '14 at 20:13
  • Your response variable is binary, so ordinary (binary) logistic regression is what applies. Some of your predictor variables are ordinal; just make them factors and use one level, such as zero, as the reference. Here the NAs do not denote missing values but normal (symptom absent). I think it's OK to code them as zero, provided zero is not already used as a valid observed value. But be careful to distinguish normal from truly missing values. – David Z Aug 04 '14 at 20:14
  • You can treat the levels as factors, like @David Z suggested. It is also standard to treat Likert levels as continuous variables, if they are indeed ordinal. – gregmacfarlane Aug 04 '14 at 20:17
  • And if you follow @David Z's advice, you can throw out the binary variables such as 'delusion', which will be redundant. I wouldn't be too quick to treat the ordinal ones as if continuous, though. These are rating variables, not Likert scales. True Likert scales are continuous. http://stats.stackexchange.com/questions/10382/general-advice-on-forming-an-index-of-attitude-to-the-environment-from-a-set-of – rolando2 Aug 05 '14 at 01:38
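The approach suggested in the comments (binary logit with `status` as the response, each ordinal item entered as a factor with 0 as the reference level) can be sketched without a modelling library. The Python below is a minimal illustration under stated assumptions: the data are a small made-up toy set for a single symptom (not the questionnaire data), only severity is used, and the fit is a plain Newton-Raphson maximisation of the logistic log-likelihood. `dummy_code`, `solve`, and `fit_logit` are hypothetical helpers; in practice a standard routine such as R's `glm(..., family = binomial)` would do the fitting. The sketch also checks the redundancy point: with 0 as the reference, the gate indicator equals the sum of the severity dummies. For comparison, it fits severity as a single numeric score, the "treat as continuous" alternative.

```python
import math


def dummy_code(level, levels=(1, 2, 3)):
    """Indicator columns for an ordinal item, with 0 (absent) as the reference."""
    return [1.0 if level == k else 0.0 for k in levels]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logit(X, y, iters=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        prob = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
                for row in X]
        # Score vector X'(y - p) and information matrix X'WX, W = p(1 - p).
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, prob))
                for j in range(k)]
        info = [[sum(pi * (1 - pi) * row[j] * row[l] for row, pi in zip(X, prob))
                 for l in range(k)] for j in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Made-up (severity, status) pairs for one symptom; severity 0 = absent.
data = ([(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 0)] * 2 + [(1, 1)] * 1
        + [(2, 0)] * 1 + [(2, 1)] * 2 + [(3, 0)] * 1 + [(3, 1)] * 3)
y = [float(status) for _, status in data]

# Factor coding: intercept + one dummy per severity level, 0 as reference.
X = [[1.0] + dummy_code(sev) for sev, _ in data]
beta = fit_logit(X, y)
odds_ratios = [math.exp(b) for b in beta[1:]]  # each level vs. "absent"
print([round(o, 3) for o in odds_ratios])  # → [1.5, 6.0, 9.0]

# The gate item adds nothing once the dummies are in: it is their sum.
gate = [1.0 if sev > 0 else 0.0 for sev, _ in data]
redundant = all(g == sum(row[1:]) for g, row in zip(gate, X))
print("gate equals sum of dummies:", redundant)

# Alternative: severity as one numeric score (a single slope).
beta_linear = fit_logit([[1.0, float(sev)] for sev, _ in data], y)
print("linear slope:", round(beta_linear[1], 4))
```

With one dummy per level the model is saturated for this single item, so the fitted odds ratios are simply each level's observed odds divided by the odds in the "absent" group. The numeric-score version instead imposes a constant change in log-odds per severity step, which is the assumption rolando2 cautions against making too quickly.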
