I have neuropsychiatric questionnaire scores for 300 individuals, of whom 200 are normal and 100 have the disease. The questionnaire is divided into 12 categories (delusion, agitation, etc.). Each category begins with a screening question along the lines of "Does the patient experience delusions?" (yes = 1, no = 0; a binary variable). If the answer is yes, the respondent goes on to rate the severity (1: Mild; 2: Moderate; 3: Marked) and the frequency (1: Occasionally; 2: Often; 3: Frequently; 4: Very frequently) of the symptom. Both ratings are ordinal Likert-scale variables.
To start off, I am having trouble organizing the data.frame. I want the binary variable status (0 = normal, 1 = disease) to be the response variable of my ordinal logit model. Below is how I envision part of my data.frame looking (I am not sure whether this is the best way to organize the data):
status delusion severity frequency agitation severity frequency ...
0 0 NA NA 0 NA NA ...
0 1 2 1 0 NA NA ...
1 0 NA NA 1 1 4 ...
0 0 NA NA 0 NA NA ...
1 0 NA NA 0 NA NA ...
. . . . . . . .
. . . . . . . .
. . . . . . . .
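To make the layout concrete, here is a minimal sketch of how the five example rows might be built in R. The column names (delusion_sev, delusion_freq, and so on) are hypothetical; I prefix each rating with its symptom so that the repeated severity and frequency columns get distinct names. Storing the ratings as ordered factors is an assumption on my part, not something I am sure is the right choice.

```r
# Toy version of the layout above (first five rows, two of the 12 categories).
# All column names are hypothetical placeholders.
npq <- data.frame(
  status         = c(0, 0, 1, 0, 1),
  delusion       = c(0, 1, 0, 0, 0),
  delusion_sev   = c(NA, 2, NA, NA, NA),
  delusion_freq  = c(NA, 1, NA, NA, NA),
  agitation      = c(0, 0, 1, 0, 0),
  agitation_sev  = c(NA, NA, 1, NA, NA),
  agitation_freq = c(NA, NA, 4, NA, NA)
)

# Labels taken from the questionnaire description
sev_levels  <- c("Mild", "Moderate", "Marked")
freq_levels <- c("Occasionally", "Often", "Frequently", "Very frequently")

# Convert each rating to an ordered factor; NA values stay NA
for (s in c("delusion", "agitation")) {
  sev  <- paste0(s, "_sev")
  freq <- paste0(s, "_freq")
  npq[[sev]]  <- factor(npq[[sev]],  levels = 1:3, labels = sev_levels,  ordered = TRUE)
  npq[[freq]] <- factor(npq[[freq]], levels = 1:4, labels = freq_levels, ordered = TRUE)
}

str(npq)
```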
What should I do with the NAs? Should I just code them as 0? Also, most participants (90%) do not experience any of the 12 symptoms, so most of my data.frame will consist of NAs. Would a logit model be appropriate here? I am interested in how each of the 12 symptoms, along with its frequency and severity, contributes to the participant's disease status.
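For reference, this is roughly what I mean by coding the NAs as 0, sketched on toy rows with hypothetical column names. The helper zero_if_absent is my own illustration, and whether this recoding is statistically sensible is exactly what I am unsure about.

```r
# Toy rows only (hypothetical column names, not my real data)
toy <- data.frame(
  status        = c(0, 0, 1, 0, 1),
  delusion      = c(0, 1, 0, 0, 0),
  delusion_sev  = c(NA, 2, NA, NA, NA),
  delusion_freq = c(NA, 1, NA, NA, NA)
)

# Set a rating to 0 only when the screening item is 0, so that a rating
# that is missing even though the symptom is present would remain NA.
zero_if_absent <- function(present, rating) {
  ifelse(present == 0 & is.na(rating), 0, rating)
}

toy$delusion_sev0  <- zero_if_absent(toy$delusion, toy$delusion_sev)
toy$delusion_freq0 <- zero_if_absent(toy$delusion, toy$delusion_freq)
print(toy)

# The kind of model I have in mind for the full data (not run here;
# full_data is a placeholder for the complete data.frame):
# fit <- glm(status ~ delusion_sev0 + delusion_freq0,
#            family = binomial, data = full_data)
```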