In my PhD thesis I am working on spatial modeling of different chemical parameters in groundwater, and for the spatial modeling I am also using a multiple statistical approach. I have a question about multiple regression analysis. (Or would it be better to use polynomial regression?)
The equation for spatial regression modeling is: $Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i + \varepsilon$
As the dependent variable, I have calcium concentrations in groundwater, measured at different sampling points across the entire research area. As independent variables, I chose spatial data that influence the distribution of calcium in groundwater: lithology, vegetation, slope, climatic conditions (temperature, precipitation), soil depth, ...
The problem is that lithology and vegetation are categorical data: lithology has 3 categories coded 1 to 3 (1 = clastic rocks, 2 = carbonate rocks, 3 = metamorphic and igneous rocks), and vegetation has 4 categories (1 = bare rocks, 2 = agricultural land, 3 = grassland, 4 = forests). All other variables are numerical and continuous.
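To make the coding concrete, here is a minimal sketch of one standard option I am aware of, dummy (indicator) coding, in which each category except a reference level gets its own 0/1 column, so the integer codes are not treated as if they were ordered quantities. All data values below are made up for illustration, the helper `dummy_code` is hypothetical, and only slope stands in for the continuous predictors; I am not sure this is the best approach for my data.

```python
import numpy as np

# Made-up illustration data (NOT real measurements).
lithology = np.array([1, 2, 3, 2, 1, 3, 2, 1, 3, 2])   # 1=clastic, 2=carbonate, 3=metamorphic/igneous
vegetation = np.array([1, 2, 3, 4, 4, 1, 3, 2, 4, 1])  # 1=bare rocks, 2=agriculture, 3=grassland, 4=forests
slope = np.array([5.0, 12.0, 8.0, 20.0, 3.0, 15.0, 9.0, 7.0, 11.0, 4.0])
calcium = np.array([40.0, 85.0, 30.0, 90.0, 42.0, 28.0, 80.0, 38.0, 33.0, 78.0])


def dummy_code(codes, levels):
    """Hypothetical helper: 0/1 indicator columns for every level except
    the first, which serves as the reference category."""
    return np.column_stack([(codes == lev).astype(float) for lev in levels[1:]])


# Design matrix: intercept, 2 lithology dummies, 3 vegetation dummies, slope.
X = np.column_stack([
    np.ones(len(calcium)),
    dummy_code(lithology, [1, 2, 3]),
    dummy_code(vegetation, [1, 2, 3, 4]),
    slope,
])

# Ordinary least squares fit of Y = alpha + sum(beta_i * x_i) + epsilon.
coef, *_ = np.linalg.lstsq(X, calcium, rcond=None)
print(dict(zip(
    ["alpha", "lith_2", "lith_3", "veg_2", "veg_3", "veg_4", "slope"],
    np.round(coef, 3),
)))
```

In this setup each dummy coefficient would be read as the shift in expected calcium concentration relative to the reference category (clastic rocks, bare rocks), holding the other variables fixed.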
Do you have any idea how to solve the problem with categorical data in multiple regression analysis? Might it be better to use some other method? Best regards and thank you very much for your help.