I searched for an answer to this question but couldn't find one that fits; if one already exists, please post the link.
I am using a national survey in R to study investment in complementary pensions. The original data frame (df) has several variables and around 50k observations from 2002 to 2014; I kept only the families that were interviewed at least twice (panel). A sample of the data:
  year family comp sex study type_degree
1 2002 104002    1   2     2          NA
2 2002 107090    2   1     3           3
3 2002 111052    1   2     1          NA
4 2002 111052    3   2     2          NA
5 2002  11940    2   2     3           1
6 2002  11972    2   2     3           2
7 2002 121040    1   1     1          NA
8 2002 121040    2   2     2          NA
9 2002 136061    1   1     3           1
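To make the example reproducible, the sample rows above can be rebuilt as a data frame with a sketch like this (pip and area do not appear in the excerpt, so they are left out):

```r
# Sample rows copied from the table above.
# pip and area are not shown in the excerpt, so they are omitted here.
df <- data.frame(
  year        = rep(2002L, 9),
  family      = c(104002L, 107090L, 111052L, 111052L, 11940L,
                  11972L, 121040L, 121040L, 136061L),
  comp        = c(1L, 2L, 1L, 3L, 2L, 2L, 1L, 2L, 1L),
  sex         = c(2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L),
  study       = c(2L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 3L),
  type_degree = c(NA, 3L, NA, NA, 1L, 2L, NA, NA, 1L)
)
str(df)
```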
Here comp is the member's role in the family (mother, father, son, ...), study is the education level (1 = low, 2 = medium, 3 = degree), and type_degree is the field of the degree (1 = economics, 2 = maths, 3 = medicine, ...). type_degree is recorded only when study is 3 (the individual has a degree); otherwise it is NA.
I converted study to a factor with factor():
df$study <- factor(df$study)
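The factor() call can be checked on the study values from the sample rows; this sketch also shows optional readable labels based on the coding described above (study_lab is a name made up for illustration):

```r
# study values from the sample rows above
study <- c(2, 3, 1, 2, 3, 3, 1, 2, 3)

study_f <- factor(study)
levels(study_f)   # "1" "2" "3"

# Optional: hypothetical labelled version using the 1/2/3 coding
study_lab <- factor(study, levels = 1:3, labels = c("low", "medium", "degree"))
table(study_lab)  # low 2, medium 3, degree 4
```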
This worked without problems because study has a value for every observation (no NA). For type_degree I created a dummy instead:
df$type_degree <- ifelse(df$type_degree=="1",1,0)
where 1 codes a degree in economics (I want to test whether economics graduates behave differently from other graduates). This variable also has NA values, because not every individual has a degree, so I tried to handle the NAs with the na.action argument of the regression:
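A quick check on the type_degree values from the sample rows shows that ifelse() does not remove the NAs: comparing NA with "1" gives NA, so those rows stay NA in the dummy (econ is an illustrative name):

```r
# type_degree values from the sample rows above
type_degree <- c(NA, 3, NA, NA, 1, 2, NA, NA, 1)

# NA == "1" is NA, and ifelse() passes NA through unchanged
econ <- ifelse(type_degree == "1", 1, 0)
econ              # NA 0 NA NA 1 0 NA NA 1
sum(is.na(econ))  # 5 rows (no degree) are still NA
```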
eq <- lm(pip ~ sex + study + area + type_degree, data=df, na.action=na.exclude)
where pip is the type of complementary pension and area is the area of residence in the country (north, centre, south).
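As far as I understand, na.exclude does not keep the NA rows in the fit: like na.omit, it drops every row with an NA in a model variable, and only pads residuals and fitted values back to the original length. A toy check, assuming made-up pip values since the excerpt does not show them:

```r
# Toy data: type_degree is the economics dummy built from the sample rows;
# pip values are hypothetical because the excerpt does not show them.
toy <- data.frame(
  pip         = c(0, 1, 0, 1, 1, 0, 0, 1, 1),
  type_degree = c(NA, 0, NA, NA, 1, 0, NA, NA, 1)
)

fit_omit    <- lm(pip ~ type_degree, data = toy, na.action = na.omit)
fit_exclude <- lm(pip ~ type_degree, data = toy, na.action = na.exclude)

nobs(fit_omit)              # 4: rows with NA are not used in the fit
nobs(fit_exclude)           # 4: the same rows are dropped
length(resid(fit_omit))     # 4
length(resid(fit_exclude))  # 9: padded with NA to match the original rows
```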
I factorized the variables, but R stops with the error:
contrasts can be applied only to factors with 2 or more levels
I assume this is caused by the NA values in type_degree, but I don't know another way to handle them.
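To test that assumption, here is a minimal reproduction on toy data (pip is made up and area is left out). Once lm() drops the rows where type_degree is NA, only graduates remain, so study is left with a single level:

```r
# Toy data from the sample rows; pip is a hypothetical outcome.
toy <- data.frame(
  pip         = c(0, 1, 0, 1, 1, 0, 0, 1, 1),
  sex         = factor(c(2, 1, 2, 2, 2, 2, 1, 2, 1)),
  study       = factor(c(2, 3, 1, 2, 3, 3, 1, 2, 3)),
  type_degree = c(NA, 0, NA, NA, 1, 0, NA, NA, 1)  # economics dummy
)

# Levels of study among the rows lm() can actually use
kept <- complete.cases(toy)
table(droplevels(toy$study[kept]))  # only level "3" remains

# Fitting the model reproduces the error; capture its message
msg <- tryCatch(
  {
    lm(pip ~ sex + study + type_degree, data = toy, na.action = na.exclude)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```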
Thank you in advance.