Skip to content
François Vieille edited this page Oct 9, 2016 · 1 revision

Step 2 : Estimate a Regression or Classification Tree

As described in the method, we define $Z_i = \hat{P_{1i}} - \hat{P_{0i}}$. It's the difference in term of response of the active treatments compared to the control treatment. The idea is to try to explain this difference by few covariables.

Classification

We define a new variable $Z^{}$, $Z^{}_i=1$ if $Z_i > c$ and $Z^{}_i=0$ otherwise. Classification tree's goal is to explain the value $Z^=1$. $c$ is a threshold given by the user. It's the threshold for which the difference is "interesting". One idea is to use quantiles of the difft distribution.

To compute a classifiction tree, vt.tree("class", ...) is used. Internally, rpart::rpart() is computed. It takes in argument:

  • tree.type : it has to be set to "class"
  • vt.difft : VT.difft object (return of vt.forest() function)
  • sens : c(">", "<"). sens corresponds to the way $Z^{*}$ is defined.
    • ">" (default) : $Z^{}$, $Z^{}_i=1$ if $Z_i &gt; c$ and $Z^{*}_i=0$ otherwise.
    • "<" : $Z^{}$, $Z^{}_i=1$ if $Z_i &lt; c$ and $Z^{*}_i=0$ otherwise.
  • threshold : corresponds to $c$, it can be a vector. $seq(.5, .8, .1)$ by default.
  • screening : NULL is default value. If TRUE only covariables in varimp vt.data 's field is used.

See ?VT.tree for details.

# initialize classification tree
tr.class <- vt.tree("class",
                    vt.difft = vt.f.rf,
                    sens = ">",
                    threshold = quantile(vt.f.rf$difft, seq(.5, .8, .1)),
                    maxdepth = 3,
                    cp = 0,
                    maxcompete = 2) 
# tr.class is a list if threshold is a vectoor
class(tr.class)
## [1] "list"
# acce trees with treeXX
class(tr.class$tree1)
## [1] "VT.tree.class"
## attr(,"package")
## [1] "aVirtualTwins"

Regression

Use regression tree to explain $Z$ by covariables $X$. Then some leafs have predicted $Z_i$ greater than the threshold $c$ (if $sens$ is ">"), and it defines which covariables explain $Z$.

The function to use is vt.tree("reg", ...). It takes same parameters than classification mehod.

# initialize regression tree
tr.reg <- vt.tree("reg",
                  vt.difft = vt.f.rf,
                  sens = ">",
                  threshold = quantile(vt.f.rf$difft, seq(.5, .8, .1)))
# tr.class is a list if threshold is a vectoor
class(tr.reg)
## [1] "list"
# access trees with treeXX
class(tr.reg$tree1)
## [1] "VT.tree.reg"
## attr(,"package")
## [1] "aVirtualTwins"

Subgroups and results

Once trees have been computed, you surely want to see what are the subgroups. This package provides a wrapper function of intern methods of VT.tree class : vt.subgroups().

This function takes in argument :

  • vt.tree : object or list of class VT.tree. Return of the vt.tree() function.
  • only.leaf : logical. Set TRUE (default) to visualize only terminal nodes.
  • only.fav : logical. Set TRUE (default) to visualize only class 1 nodes. ($\hat{A}$)
  • tables : logical. Set FALSE (default) to prevent tables of incidences from being printed.
  • verbose : logical. Set FALSE (default) to prevent detailed stuffs from being printed.
  • compete : logical. Set TRUE to print competitors rules thanks to competitors split. FALSE is default value.

If vt.tree is a list, unique subgroups are printed.

# use tr.class computed previously
vt.sbgrps <- vt.subgroups(tr.class)
# print tables with knitr package
library(knitr)
knitr::kable(vt.sbgrps)
       Subgroup                                       Subgroup size   Treatement event rate   Control event rate   Treatment sample size   Control sample size    RR (resub)   RR (snd)

tree1.11 PRAPACHE< 26.5 & AGE>=51.74 & BLGCS< 11.5 49 0.485 0.375 33 16 1.293 1.043 tree1.3 PRAPACHE>=26.5 157 0.752 0.327 105 52 2.300 1.865 tree2.11 PRAPACHE< 26.5 & BLGCS< 10.5 & AGE>=64.68 14 0.7 0.5 10 4 1.400 1.200 tree2.13 PRAPACHE>=26.5 & AGE< 51.74 & BLLCREAT< 2.3 21 0.375 0.6 16 5 0.625 1.506 tree2.7 PRAPACHE>=26.5 & AGE>=51.74 120 0.897 0.31 78 42 2.894 2.007 tree3.13 PRAPACHE>=26.5 & AGE< 51.74 & BLLBILI>=2.95 13 0.333 0.25 9 4 1.332 1.564 tree4 PRAPACHE>=26.5 & AGE>=51.74 & PRAPACHE>=28.5 89 0.915 0.333 59 30 2.748 2.087

You can plot one tree with package rpart.plot

If you want to see competitors split :

tr.class$tree2$createCompetitors()
head(tr.class$tree2$competitors)
##    count ncat    improve  index      var path          string
## 1    470   -1 119.987009 26.500 PRAPACHE    2 PRAPACHE < 26.5
## 2    470    1  37.018261 10.500    BLGCS    2   BLGCS >= 10.5
## 3    470   -1  30.491058 61.671      AGE    2    AGE < 61.671
## 9    313    1   6.349646 10.500    BLGCS    4   BLGCS >= 10.5
## 10   313   -1   5.075061 61.671      AGE    4    AGE < 61.671
## 11   313   -1   2.027870 19.500 PRAPACHE    4 PRAPACHE < 19.5

If you want to print incidence of a subgroup :

vt.o$getIncidences("PRAPACHE >= 26 & AGE >= 52")
## $table.selected
## $table.selected$table
##            trt
## resp        0     1   sum  
##   0         30    18  48   
##   1         15    72  87   
##   sum       45    90  135  
##   Incidence 0.333 0.8 0.644
## 
## $table.selected$rr
## [1] 2.402402
## 
## 
## $table.not.selected
## $table.not.selected$table
##            trt
## resp        0     1     sum  
##   0         71    170   241  
##   1         37    57    94   
##   sum       108   227   335  
##   Incidence 0.343 0.251 0.281
## 
## $table.not.selected$rr
## [1] 0.7317784
# or
# tr.class$tree2$getIncidences("PRAPACHE >= 26 & AGE >= 52")

If you want to get infos about the tree

tr.class$tree2$getInfos()
## 
## Threshold = 0.0675
## Delta = 0.0671
## Sens : >
## Size of Ahat : 155
# access Ahat
# tr.class$tree2$Ahat

You can re-run rpart computation:

tr.class$tree2$run(maxdepth = 2)

Type ?VT.tree for details.