I want to run a regression where one of the explanatory variables is a "summary" (details below) of a set of questionnaire questions that are answered on a Likert scale (although there are several NAs).
I know that whether it is correct to treat Likert data as continuous has been debated before. I read these two articles (which were linked in response to a previous question):
https://www.theanalysisfactor.com/likert-scale-items-as-predictor-variables-in-regression/
https://www.statisticssolutions.com/can-an-ordinal-likert-scale-be-a-continuous-variable/
They seem to suggest that transforming the data into a new continuous variable should be OK, but I am unsure which kind of transformation or model I should use.
From what I read, it seems that I could just assign a value from 1 (dislike a lot) to 5 (like a lot) to each answer and sum the results, so that I would have a "score" for every person who took the test. My issue with this approach is that several people did not respond to one or more items. For example, in a test with 10 questions, a person who answered only 2 questions but checked "like a lot" in both would get a score of 10 out of 50, suggesting a low level of satisfaction, which would contradict the answers actually given. On the other hand, I could score only the people who completed the whole test, but then I would lose some information.
Another approach could be to create an index, where I add 0.5 for every "like a little" and 1 for every "like a lot", subtract 0.5 and 1 for "dislike a little" and "dislike a lot" respectively, and then divide by the number of questions answered, giving an index that goes from -1 to +1. In this case, people who have 2 "like a lot" answers and 8 NAs would score the same as people who gave 10 "like a lot" answers, which is also potentially problematic.
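A minimal sketch of this index in Python, again only as an illustration. The weights for the four like/dislike answers are the ones given above; the value 0 for a neutral midpoint is my assumption, and the names `WEIGHTS` and `index_score` are hypothetical.

```python
# Weights as described in the text; the neutral midpoint is an assumption.
WEIGHTS = {
    "dislike a lot": -1.0,
    "dislike a little": -0.5,
    "neutral": 0.0,  # assumed, not stated in the question
    "like a little": 0.5,
    "like a lot": 1.0,
}

def index_score(responses):
    """Return (index, n_answered).

    The index is the mean weight over the answered items only,
    or None if no item was answered. None marks an unanswered item.
    """
    answered = [WEIGHTS[r] for r in responses if r is not None]
    if not answered:
        return None, 0
    return sum(answered) / len(answered), len(answered)

partial = ["like a lot"] * 2 + [None] * 8
full = ["like a lot"] * 10

print(index_score(partial))  # (1.0, 2)
print(index_score(full))     # (1.0, 10)
```

Both respondents get an index of 1.0; only the count of answered items, returned next to the index, shows that the first value rests on 2 answers and the second on 10.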
Lastly, I could do an IRT analysis where the latent variable is satisfaction (in this case it would actually be propensity or aversion toward an activity). As I have not used this kind of analysis before, I am not sure what I should look at to make sure it is appropriate, and above all I don’t want to overcomplicate things if there is a simpler approach that works as well. My understanding is that IRT is useful because it weights the items differently, but how do I know whether that makes sense in this case? Also, would it help with my missing data issue?
Thank you (I know this subject has been treated before, but I’d like someone to give me some feedback on my reasoning and possibly point me towards issues I should consider in my decision).