-
Notifications
You must be signed in to change notification settings - Fork 2
05 Step One
As described earlier, step 1 can be done via differents ways
Following example used sepsis data created in previous part.
To perform simple random forest on VT.object
, randomForest
, caret
and party
package can be used.
Class vt.forest("one", ...)
is used. It takes in arguments :
-
forest.type
: you have to set it to"one"
-
vt.data
: return ofvt.data()
function -
model
: a random forest model -
interactions
: logical,TRUE
is default value -
...
: options torandomForest()
function
with randomForest
# use randomForest::randomForest()
library(randomForest, verbose = F)
# Reproducibility
set.seed(123)
# Fit rf model
# default params
# set interactions to TRUE if using interaction between T and X
model.rf <- randomForest(x = vt.o$getX(interactions = T),
y = vt.o$getY(),
ntree = 500)
# initialize VT.forest.one
vt.f.rf <- vt.forest("one", vt.data = vt.o, model = model.rf, interactions = T)
### or you can use randomForest inside vt.forest()
vt.f.rf <- vt.forest("one", vt.data = vt.o, interactions = T, ntree = 500)
with party
cforest()
can be usefull however computing time is really long. I think there is an issue when giving cforest object in Reference Class parameter. Need to fix it.
# # use randomForest::randomForest()
# library(party, verbose = F)
# # Reproducibility
# set.seed(123)
# # Fit cforest model
# # default params
# # set interactions to TRUE if using interaction between T and X
# model.cf <- cforest(formula = vt.o$getFormula(), data = vt.o$getData(interactions = T))
# # initialize VT.forest.one
# vt.f.cf <- vt.forest("one", vt.data = vt.o, model = model.cf)
with caret
Using caret
can be usefull to deal with parallel computing for example.
NOTE: For caret
levels of outcome can't be 0, so i'll change levels name into "n"/"y"
# Copy new object
vt.o.tr <- vt.o$copy()
# Change levels
tmp <- ifelse(vt.o.tr$data$survival == 1, "y", "n")
vt.o.tr$data$survival <- as.factor(tmp)
rm(tmp)
# Check new data to be sure
formatRCTDataset(vt.o.tr$data, "survival", "THERAPY")
## "y" will be the favorable outcome
# use caret::train()
library(caret, verbose = F)
# Reproducibility
set.seed(123)
# fit train model
fitControl <- trainControl(classProbs = T, method = "none")
model.tr <- train(x = vt.o.tr$getX(interactions = T),
y = vt.o.tr$getY(),
method = "rf",
tuneGrid = data.frame(mtry = 5),
trControl = fitControl)
# initialize VT.forest.one
vt.f.tr <- vt.forest("one", vt.o.tr, model = model.tr)
To perform double random forest on VT.object
, same packages as simple random forest can be used.
Function vt.forest("double", ...)
is used. It takes in arguments :
-
forest.type
: You have to set is to"double"
-
vt.data
: return ofvt.data()
function -
model_trt1
: a random forest model for$T=1$ (this argument has to be specified) -
model_trt0
: a random forest model for$T=0$ (this argument has to be specified)
NOTE: use trt
parameter in VT.object::getX()
or VT.object::getY()
methods to obtain part of data depending on treatment. See following example.
with randomForest
# grow RF for T = 1
model.rf.trt1 <- randomForest(x = vt.o$getX(trt = 1),
y = vt.o$getY(trt = 1))
# grow RF for T = 0
model.rf.trt0 <- randomForest(x = vt.o$getX(trt = 0),
y = vt.o$getY(trt = 0))
# initialize VT.forest.double()
vt.doublef.rf <- vt.forest("double",
vt.data = vt.o,
model_trt1 = model.rf.trt1,
model_trt0 = model.rf.trt0)
### Or you can use randomForest() inside
vt.doublef.rf <- vt.forest("double",
vt.data = vt.o,
ntree = 200)
Follow the same structure for caret
or cforest
models.
This idea is taken from method 3 of Jared Foster paper :
A modification of [previous methods] is to obtain
$\hat{P_{1i}}$ and$\hat{P_{0i}}$ via cross-validation. In this méthod the specific data for subject$i$ is not used to obtain$\hat{P_{1i}}$ and$\hat{P_{0i}}$ . Using k-fold cross-validation, we apply random forest regression approach to$\frac{k-1}{k}$ of the data and use the resulting predictor to obtain estimates of$P_{1i}$ and$P_{0i}$ for the remaining$\frac{1}{k}$ of the observations. This is repeated$k$ times.
To use this approach, use vt.forest("fold", ...)
. This class takes in argument :
-
forest.type
: it has to be set to"fold"
-
vt.data
: return ofvt.data()
function -
fold
: number of fold (e.g.$5$ ) -
ratio
: Control of sampsize balance.ratio
of$2$ means that there 2 times le highest level compared to the other. "Highest" means the level with larger observations. It's in test. -
interactions
: Logical. IfTRUE
, interactions between covariables and treatments are used.FALSE
otherwise. -
...
:randomForest()
function options
NOTE: This function use only randomForest
package.
# initialize k-fold RF
# you can use randomForest options
model.fold <- vt.forest("fold", vt.data = vt.o, fold = 5, ratio = 1, interactions = T, ntree = 200)
Random Forests are not the only models you can use to compute
Anyway, aVirtualTwins package can be used. To do so, you can use VT.difft()
class. It is important to note this the parent class of all "forests" classes. It takes in argument :
-
vt.object
: return ofvt.data()
function -
twin1
: estimate of$P(Y_{i} = 1 | T = T_{i})$ : meaning response probability under the correct treatment. -
twin2
: estimate of$P(Y_{i} = 1 | T = 1-T_{i})$ : meaning response probability under the other treatment. -
method
: absolute (default), relative or logit. See?VT.difft
for details.
# you get twin1 and twin2 by your own method
# here, i'll use random number between 0 and 1 :
twin1_random <- runif(470)
twin2_random <- runif(470)
# then you can initialize VT.difft class :
model.difft <- VT.difft(vt.o, twin1 = twin1_random, twin2 = twin2_random, "absolute")
# compute difference of twins :
model.difft$computeDifft()
# See results
head(model.difft$difft)
## [1] -0.03599908 -0.44271883 -0.25458624 -0.64201822 0.29347148 -0.02843780
# Graph :
# hist(model.difft$difft)
NOTE: Also, you can clone repository, write your own child class of VT.difft()
AND submit it !