I have three non-binary, discrete variables X0.1, X0.12 and X0.15. I want to model X0.1 as a MARS model of X0.12 and X0.15, then use the model to answer queries of the form P(X0.1=x | X0.12=y, X0.15=z). This is what I have tried so far:
> mars1 <- earth (X0.1 ~ X0.15 + X0.12, data=d)
> mars1$coefficients
X0.1
(Intercept) 1.2392880
h(1-X0.12) -0.8291468
h(X0.15-1) -0.3891442
> predict(mars1, data.frame(X0.15=0, X0.12=1), type="response")
X0.1
[1,] 1.239288
This is clearly not a probability. I have also tried converting my variables into factors.
> a <- factor(d$X0.1)
> b <- factor(d$X0.12)
> c <- factor(d$X0.15)
> mars1 <- earth (a~b+c , glm=list(family=poisson), data=d)
> mars1$coefficients
0 1 2
(Intercept) 0.7890499 0.013295241 0.1976548
b1 -0.4273462 0.001813694 0.4255325
b2 -0.3652277 0.091872210 0.2733555
c2 0.1230729 0.105687002 -0.2287599
> predict(mars1, data.frame(b=factor("2"),c=factor("2"), levels=levels(c)), type="response")
0 1 2
[1,] 0.5348205 0.3321178 0.2452333
[2,] 0.5348205 0.3321178 0.2452333
[3,] 0.5348205 0.3321178 0.2452333
The row sums of the prediction are not 1, so these can't be the probabilities either. What exactly do these numbers mean?
Where am I going wrong? Is MARS the wrong tool for this? What I want is something like multinomial logistic regression in which interaction terms are detected automatically (I intend to run this on larger sets of variables).
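To make the target concrete, this is roughly the form of model I have in mind. It is only a sketch in my own notation, not something earth has given me: K is the number of levels of X0.1, and f_k is a hypothetical per-class score function, for example a sum of MARS hinge terms and their products.

```latex
P(X_{0.1} = k \mid X_{0.12} = y,\; X_{0.15} = z)
  = \frac{\exp\bigl(f_k(y, z)\bigr)}{\sum_{j=1}^{K} \exp\bigl(f_j(y, z)\bigr)},
  \qquad k = 1, \dots, K
```

By construction these values sum to 1 over k for any fixed y and z, which is the property the predictions above lack.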