I had asked a general question about conditional inference trees via party a while back and gotten a great reply.

I am revisiting this procedure and trying to make sense of the linear statistic that is being used (Hothorn et al., Unbiased Recursive Partitioning: A Conditional Inference Framework, Research Report Series, 2004, page 4, equation (1)).

[Image: equation (1) from Hothorn et al. (2004), defining the linear statistic]

I am not at all clear how this statistic is calculated. Can anyone help?

It seems to me that if

  • $g(\cdot)$ and $h(\cdot)$ are the identity functions, and
  • there is a single predictor $x$ and a response $y$, both numeric,

then the statistic is simply a scalar. This is not what is shown, so I must be wrong :)

Example data:

x <- c(1, 3, 4, 67, 32, 23, 3, 12, 4)
y <- c(43, 23, 45, 22, 12, 465, 6, 54, 3)
w <- rep(1, 9)

T <- 0
for (i in seq_along(y)) {
  T <- T + w[i] * x[i] * y[i]
}
T  # 13523

I THINK the result needs to be a vector of length 81.

B_Miner

1 Answer

I want to answer my own question for completeness. It seems I was misreading the dimensions p and q, and I now believe my trivial example is correct, i.e., that T does equal 13523.
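For reference, here is the linear statistic as I read equation (1), followed by its reduction to the scalar case. This is a sketch in simplified notation: I drop the covariate index and the second argument of $h(\cdot)$, and $p$ and $q$ denote the dimensions of $g(\cdot)$ and $h(\cdot)$, so please check it against the paper.

$$
T(\mathcal{L}_n, w) = \operatorname{vec}\left( \sum_{i=1}^{n} w_i \, g(X_i) \, h(Y_i)^\top \right) \in \mathbb{R}^{pq}
$$

With identity $g(\cdot)$ and $h(\cdot)$ and one numeric predictor and response, $p = q = 1$, so for the example data with unit weights:

$$
T = \sum_{i=1}^{9} w_i x_i y_i = 43 + 69 + 180 + 1474 + 384 + 10695 + 18 + 648 + 12 = 13523
$$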

I stumbled upon the vignette for the coin package and learned that it contains the function independence_test, which appears to implement the same idea as ctree. The coin function exposes the expectation vector and covariance matrix of the linear statistic (Hothorn et al., Unbiased Recursive Partitioning: A Conditional Inference Framework, Research Report Series, 2004, page 4, equation (1)).

Here is the R code used to verify the calculation of 13523.

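library(coin)   # independence_test(), trafo(), id_trafo()
library(party)  # ctree(), ctree_control(), nodes()
library(Formula)

x <- c(1, 3, 4, 67, 32, 23, 3, 12, 4)
y <- c(43, 23, 45, 22, 12, 465, 6, 54, 3)
w <- rep(1, 9)  # unit case weights
num_test <- data.frame(x, y)

# Independence test with the identity transformation applied to the response
it <- independence_test(y ~ x, data = num_test,
                        ytrafo = function(data) trafo(data, numeric_trafo = id_trafo),
                        teststat = "max")
show(it)

statistic(it, type = "test")  # 0.263028
expectation(it)
covariance(it)
variance(it)

# Linear statistic by hand: sum over i of w[i] * x[i] * y[i]
# (named T_lin because T is R's shorthand for TRUE)
T_lin <- sum(w * x * y)
T_lin  # 13523

# Standardize with the conditional expectation and variance from coin
abs((T_lin - expectation(it)) / sqrt(variance(it)))  # 0.263028 matches above!

# Now use ctree
xy_tree <- ctree(y ~ x, data = num_test,
                 controls = ctree_control(teststat = "max", mincriterion = 0.001, minsplit = 1))
nodes(xy_tree, 1)[[1]][3]  # statistic is 0.263028, matches above!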
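The standardization can also be written out in base R without coin. This is only a sketch for the scalar case with unit weights, using the conditional (permutation) mean and variance formulas as I understand them from the paper. The names n, mu_y, v_y, mu_T, var_T and z are mine, not the packages'.

```r
# Base-R sketch: standardized linear statistic in the scalar case.
# Assumes identity g() and h() and unit case weights.
x <- c(1, 3, 4, 67, 32, 23, 3, 12, 4)
y <- c(43, 23, 45, 22, 12, 465, 6, 54, 3)
w <- rep(1, 9)
n <- sum(w)

# Linear statistic
T_lin <- sum(w * x * y)

# Conditional mean and variance of the response given the observed values
mu_y <- sum(w * y) / n
v_y  <- sum(w * (y - mu_y)^2) / n  # divisor is n, not n - 1

# Conditional mean and variance of the linear statistic under permutation
mu_T  <- sum(w * x) * mu_y
var_T <- n / (n - 1) * v_y * sum(w * x^2) - 1 / (n - 1) * v_y * sum(w * x)^2

# Absolute standardized statistic; compare with the value reported by coin
z <- abs(T_lin - mu_T) / sqrt(var_T)
print(z)
```

If the formulas are right, z should agree with the 0.263028 that independence_test and ctree report for this data.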

I also confirmed with an example using a factor as the independent variable (with dummy coding).
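To illustrate the factor case, here is a minimal base-R sketch. The factor f and its levels are made up for illustration and are not the data I actually used. With dummy (indicator) coding, $g(\cdot)$ maps each observation to a vector with one entry per level, so the linear statistic becomes a vector of per-level weighted sums of y rather than a scalar.

```r
# Hypothetical factor predictor; y and w are the example data from above
y <- c(43, 23, 45, 22, 12, 465, 6, 54, 3)
w <- rep(1, 9)
f <- factor(c("a", "b", "a", "c", "b", "c", "a", "b", "c"))

# Indicator (dummy) coding without an intercept: one column per level
G <- model.matrix(~ f - 1)

# Linear statistic: sum over i of w[i] * g(x[i]) * y[i], one entry per level
T_vec <- as.vector(crossprod(G, w * y))
names(T_vec) <- levels(f)
print(T_vec)
```

Here the statistic has length 3 (p = 3 levels, q = 1), and each entry is just the sum of y within that level.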

B_Miner