What is the test statistic used for a conditional inference regression tree?

Question

In Hothorn et al., the linear test statistic is specified as

$$ T_j(L_n, w) = \mathrm{vec}\left( \sum_{i=1}^n w_i \, g_j(X_{ji}) \, h\big(Y_i, (Y_1, \dots, Y_n)\big)^\top \right) $$
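As a rough illustration of the formula (a sketch, not code from `party`, `partykit` or `coin`), the Python function below computes the weighted sum of outer products of $g_j(X_{ji})$ and $h(Y_i, \cdot)$, then stacks the columns of the resulting matrix (the vec operation). The names `linear_statistic`, `identity` and `dummy` are hypothetical. $h$ receives the whole response sample because in general it may depend on all of $Y_1, \dots, Y_n$ (ranks, for example). In a tree node the $w_i$ are case weights that are zero for observations outside the node.

```python
from typing import Callable, List, Sequence


def linear_statistic(x: Sequence, y: Sequence, w: Sequence[float],
                     g: Callable[..., List[float]],
                     h: Callable[..., List[float]]) -> List[float]:
    """Return vec(sum_i w_i * g(x_i) h(y_i, y)^T) as a flat list (columns stacked)."""
    p = len(g(x[0]))          # dimension of the predictor transformation
    q = len(h(y[0], y))       # dimension of the response transformation
    m = [[0.0] * q for _ in range(p)]
    for xi, yi, wi in zip(x, y, w):
        gx, hy = g(xi), h(yi, y)   # h may use the whole sample y (e.g. ranks)
        for a in range(p):
            for b in range(q):
                m[a][b] += wi * gx[a] * hy[b]   # weighted outer product g h^T
    # vec(): stack the q columns of the p x q matrix on top of each other
    return [m[a][b] for b in range(q) for a in range(p)]


def identity(v, *_):
    """Identity transformation of a numeric value, as a length-1 vector."""
    return [float(v)]


def dummy(levels):
    """Hypothetical helper: one 0/1 indicator per level (no reference level dropped)."""
    return lambda v, *_: [1.0 if v == k else 0.0 for k in levels]
```

For example, `linear_statistic([1, 2, 3], [4, 5, 6], [1, 1, 1], identity, identity)` returns `[32.0]`, the plain sum of products, while two dummy codings turn $T_j$ into the cell counts of the contingency table in column-major order.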

What is the exact form of this test statistic with a continuous response and categorical and numerical predictors?

Achim Zeileis · Accepted Answer · 2015-03-31T12:14:11.503


If both the regressor $X_{ji}$ and the response $Y_i$ are numeric, then both $g(\cdot)$ and $h(\cdot)$ are chosen to be the identity by default. Thus, the linear test statistic $T_j$ is simply the (weighted) sum of products $\sum_i w_i X_{ji} Y_i$. This is essentially the main ingredient of a covariance or correlation, and after the subsequent standardization of $T_j$ (subtracting its conditional expectation and dividing by its conditional standard deviation under the null hypothesis of independence) it becomes a correlation test statistic.
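The standardization can be checked numerically. Assuming unit weights and scalar $g$ and $h$, the conditional expectation of $T_j$ under permutations of the response is $\mu = \sum_i X_{ji} \sum_i Y_i / n$, and the scalar version of the conditional covariance formula in Hothorn et al. reduces to $\sigma^2 = S_{xx} S_{yy} / (n-1)$, where $S_{xx}$ and $S_{yy}$ are centered sums of squares. The Python sketch below (hypothetical function names) computes $\sigma^2$ from that scalar formula and confirms that the standardized statistic equals $\sqrt{n-1}\, r$, with $r$ the Pearson correlation.

```python
import math
from typing import Sequence


def standardized_numeric(x: Sequence[float], y: Sequence[float]) -> float:
    """Standardized linear statistic for numeric x and y (g = h = identity, unit weights)."""
    n = len(x)
    t = sum(xi * yi for xi, yi in zip(x, y))        # T = sum_i x_i * y_i
    mu = sum(x) * sum(y) / n                        # conditional expectation of T
    ybar = sum(y) / n
    v_h = sum((yi - ybar) ** 2 for yi in y) / n     # V(h), divisor n
    s1, s2 = sum(x), sum(xi * xi for xi in x)
    # conditional variance of T (scalar case of the covariance formula)
    var_t = n / (n - 1) * v_h * s2 - v_h * s1 * s1 / (n - 1)
    return (t - mu) / math.sqrt(var_t)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary Pearson correlation, for comparison."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With `x = [1, 2, 3, 4, 5]` and `y = [2, 1, 4, 3, 6]` the standardized statistic is $10/\sqrt{37} \approx 1.644$, which is $2r$ with $r = 10/\sqrt{148}$.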

If one of the variables is categorical, then the corresponding transformation ($g(\cdot)$ or $h(\cdot)$) is given by the dummy variables for all categories (one 0/1 indicator per category, without dropping a reference level). Consequently, the standardized test statistic for two categorical variables corresponds to a $\chi^2$ test statistic. And if one variable is numeric and the other categorical, you obtain an ANOVA-type test. Other transformations are also possible, e.g., for censored survival responses or ordinal responses.
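For the case asked about (numeric response, categorical predictor), $g_j$ is the dummy coding and $h$ the identity, so $T_j$ contains the group sums of $Y$. Its conditional covariance matrix is singular (rank $K-1$ for $K$ categories), so the quadratic-form statistic needs a generalized inverse. Because $T_j - \mu$ lies in the column space of that matrix (its entries sum to zero), the value does not depend on which generalized inverse is used. The sketch below (hypothetical names, unit weights assumed) uses the diagonal generalized inverse $\mathrm{diag}\big((n-1)/(S_{yy} n_k)\big)$, which reduces the statistic to $(n-1)\, SS_{\text{between}} / SS_{\text{total}}$, an ANOVA-type statistic.

```python
from typing import Hashable, List, Sequence


def anova_type_parts(x: Sequence[Hashable], y: Sequence[float]):
    """Centered linear statistic, its conditional covariance, and a generalized
    inverse, for categorical x (g = dummy coding) and numeric y (h = identity)."""
    n = len(y)
    levels = sorted(set(x))
    nk = [sum(1 for xi in x if xi == k) for k in levels]             # group sizes n_k
    ybar = sum(y) / n
    syy = sum((yi - ybar) ** 2 for yi in y)                          # total sum of squares
    t = [sum(yi for xi, yi in zip(x, y) if xi == k) for k in levels]  # T_k = group sums
    d = [tk - n_k * ybar for tk, n_k in zip(t, nk)]                  # T - E(T)
    K = len(levels)
    # Cov(T) = syy / (n - 1) * (diag(n_k) - n_k n_l / n), rank K - 1
    sigma = [[syy / (n - 1) * ((nk[a] if a == b else 0.0) - nk[a] * nk[b] / n)
              for b in range(K)] for a in range(K)]
    # diag((n - 1) / (syy * n_k)) is one generalized inverse of sigma
    ginv = [[(n - 1) / (syy * nk[a]) if a == b else 0.0 for b in range(K)]
            for a in range(K)]
    return d, sigma, ginv


def quadratic_form(d: List[float], m: List[List[float]]) -> float:
    """d^T m d for a vector d and a square matrix m."""
    return sum(d[a] * m[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))
```

For `x = ["a", "a", "b", "b", "c", "c"]` and `y = [1, 2, 3, 5, 4, 6]`, `quadratic_form(*anova_type_parts(x, y)[::2])` gives $5 \cdot 13 / 17.5 \approx 3.714$. Replacing the identity $h$ by a dummy coding of a categorical response, the same construction yields $(n-1)/n$ times Pearson's $X^2$ statistic.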

If you want to carry out the tests "by hand", you can explore the `independence_test()` function from the `coin` package for conditional inference. An introduction is available in Hothorn et al.'s "A Lego System for Conditional Inference" (doi:10.1198/000313006X118430); a preprint version is also available in the package as `vignette("LegoCondInf", package = "coin")`.
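The "conditional" in conditional inference refers to the permutation distribution of the statistic given the observed data: under independence (and exchangeable observations), reshuffling the responses does not change that distribution. `coin` can approximate it by resampling through the `distribution` argument of `independence_test()`. As a language-neutral illustration of the idea only (not `coin`'s algorithm), the Python sketch below approximates a two-sided permutation p-value for the numeric-numeric case; the function names, the resample count and the tie tolerance are assumptions.

```python
import random
from typing import Callable, Sequence


def abs_standardized_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """|sqrt(n - 1) * r|, the absolute standardized linear statistic for numeric x, y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return abs((n - 1) ** 0.5 * sxy / (sxx * syy) ** 0.5)


def permutation_pvalue(x, y, statistic: Callable[..., float],
                       n_resample: int = 9999, seed: int = 1) -> float:
    """Monte Carlo approximation of the conditional (permutation) p-value."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_resample):
        rng.shuffle(y_perm)                          # permute responses, keep x fixed
        if statistic(x, y_perm) >= observed - 1e-12:  # small tolerance for ties
            hits += 1
    return (hits + 1) / (n_resample + 1)             # count the observed arrangement too
```

Since $\mu$ and $\sigma^2$ do not change when $Y$ is permuted, using the standardized statistic or $|r|$ here gives the same p-value.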

edited Mar 31 '15 at 12:14

answered Mar 31 '15 at 07:14

Achim Zeileis


The reference seems very nice (+1) but... Taylor & Francis...! Is there much difference with [the vignette of the `coin` package](http://cran.r-project.org/web/packages/coin/vignettes/LegoCondInf.pdf)? – Elvis Mar 31 '15 at 07:22

Thanks for the pointer! No, the differences are not substantial. I've edited my reply and added a pointer to the vignette. – Achim Zeileis Mar 31 '15 at 12:15
@AchimZeileis, there are data assumptions for different tests, e.g., correlation, chi-square, ANOVA, etc.; are those assumptions examined before a test is applied? – blueskyddd Jul 04 '20 at 16:59
The test holds under fairly general conditions, e.g., exchangeability. This in cross section data you should typically be fine. In any case, no checks for the assumptions are carried out. – Achim Zeileis Jul 04 '20 at 17:03
@AchimZeileis, thank you so much for the quick response! What do you mean by `This in cross section data`? – blueskyddd Jul 04 '20 at 17:14
This -> Thus (auto correct) – Achim Zeileis Jul 04 '20 at 17:32
