Title: | Iteratively Reweighted Boosting for Robust Analysis |
---|---|
Description: | Fit a predictive model using iteratively reweighted boosting (IRBoost) to minimize robust loss functions within the CC-family (concave-convex). This constitutes an application of iteratively reweighted convex optimization (IRCO), where convex optimization is performed using the functional descent boosting algorithm. IRBoost assigns weights to facilitate outlier identification. Applications include robust generalized linear models and robust accelerated failure time models. Wang (2021) <doi:10.48550/arXiv.2101.07718>. |
Authors: | Zhu Wang [aut, cre] |
Maintainer: | Zhu Wang <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1-1.5 |
Built: | 2024-11-18 06:26:09 UTC |
Source: | https://github.com/cran/irboost |
Generate random data for classification as in Long and Servedio (2010).
dataLS(ntr, ntu = ntr, nte, percon)
ntr |
number of training data |
ntu |
number of tuning data, default is the same as ntr |
nte |
number of test data |
percon |
proportion of contamination; must be between 0 and 1 |
a list with elements xtr, xtu, xte (predictors of the disjoint training, tuning and test data) and ytr, ytu, yte (the corresponding -1/1 response variables of the training, tuning and test data).
Zhu Wang
Maintainer: Zhu Wang [email protected]
P. Long and R. Servedio (2010), Random classification noise defeats all convex potential boosters, Machine Learning Journal, 78(3), 287–304.
dat <- dataLS(ntr=100, nte=100, percon=0)
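For illustration, a minimal sketch (not part of the package examples) that inspects the list returned by dataLS, using an arbitrary contamination proportion:

# inspect the list returned by dataLS
dat2 <- dataLS(ntr=100, nte=100, percon=0.1)
str(dat2, max.level=1)   # elements xtr, xtu, xte, ytr, ytu, yte
table(dat2$ytr)          # responses are coded -1/1
dim(dat2$xte)            # predictors of the test data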
Fit a predictive model with iteratively reweighted convex optimization (IRCO) that minimizes robust loss functions in the CC-family (concave-convex). The convex optimization is carried out by the functional descent boosting algorithm in the R package xgboost. The iteratively reweighted boosting (IRBoost) algorithm reduces the weights of observations that lead to large losses; these weights also help identify outliers. Applications include robust generalized linear models and extensions, where the mean is related to the predictors by boosting, and robust accelerated failure time models. irb.train is an advanced interface for training an irboost model; the irboost function is a simpler wrapper for irb.train. See xgboost::xgb.train.
irb.train( params = list(), data, z_init = NULL, cfun = "ccave", s = 1, delta = 0.1, iter = 10, nrounds = 100, del = 1e-10, trace = FALSE, ... )
params |
the list of parameters; see xgboost::xgb.train |
data |
training dataset. |
z_init |
vector of length nobs with initial convex component values; must be non-negative. Defaults to the weights if provided in data, otherwise a vector of 1s |
cfun |
concave component of the CC-family (e.g., "ccave", "acave" or "hcave") |
s |
tuning parameter of cfun |
delta |
a small positive number provided by the user; only used for certain cfun settings |
iter |
number of iterations in the IRCO algorithm |
nrounds |
boosting iterations within each IRCO iteration |
del |
convergence criterion in the IRCO algorithm; not related to delta |
trace |
if TRUE, progress of the IRCO algorithm is printed |
... |
other arguments passed to xgboost::xgb.train |
An object with S3 class xgb.train
with the additional elements:
weight_update_log
a matrix with nobs rows and iter columns containing the observation weights at each iteration of the IRCO algorithm
weight_update
a vector of observation weights in the last IRCO iteration that produces the final model fit
loss_log
sum of the loss values of the composite function in each IRCO iteration. Note: cfun requires the objective loss to be non-negative in some cases, so care must be taken. For instance, with objective="reg:gamma", the loss value is defined as gamma-nloglik - (1 + log(min(y))), where y = label. The second term is introduced so that the loss value is non-negative. In fact, gamma-nloglik = y/ypre + log(ypre) in xgboost::xgb.train, where ypre is the mean prediction value, and this can be negative. For fixed y, the minimum of gamma-nloglik is achieved at ypre = y, with value 1 + log(y). Thus, among all label values, the minimum of gamma-nloglik is 1 + log(min(y)).
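As a sanity check of this claim, a small illustrative R snippet (not part of the package) evaluates gamma-nloglik over a grid of predictions for a fixed label:

# illustrative check: for fixed y, y/ypre + log(ypre) is minimized at ypre = y,
# with minimum value 1 + log(y)
y <- 2.5
ypre <- seq(0.1, 10, by = 0.01)
nloglik <- y/ypre + log(ypre)
ypre[which.min(nloglik)]  # approximately 2.5
min(nloglik)              # approximately 1 + log(2.5)
1 + log(y)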
Zhu Wang
Maintainer: Zhu Wang [email protected]
Wang, Zhu (2021), Unified Robust Boosting, arXiv eprint, https://arxiv.org/abs/2101.07718
# logistic boosting
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
dtrain <- with(agaricus.train, xgboost::xgb.DMatrix(data, label = label))
dtest <- with(agaricus.test, xgboost::xgb.DMatrix(data, label = label))
watchlist <- list(train = dtrain, eval = dtest)
# A simple irb.train example:
param <- list(max_depth = 2, eta = 1, nthread = 2,
              objective = "binary:logitraw", eval_metric = "auc")
bst <- xgboost::xgb.train(params = param, data = dtrain, nrounds = 2,
                          watchlist = watchlist, verbose = 2)
bst <- irb.train(params = param, data = dtrain, nrounds = 2)
summary(bst$weight_update)
# a bug in xgboost::xgb.train
#bst <- irb.train(params=param, data=dtrain, nrounds = 2,
#                 watchlist=watchlist, trace=TRUE, verbose=2)

# time-to-event analysis
X <- matrix(1:5, ncol = 1)
# Associate ranged labels with the data matrix.
# This example shows each kind of censored labels.
#           uncensored  right  left  interval
y_lower <- c(10, 15, -Inf, 30, 100)
y_upper <- c(Inf, Inf, 20, 50, Inf)
dtrain <- xgboost::xgb.DMatrix(data = X, label_lower_bound = y_lower,
                               label_upper_bound = y_upper)
param <- list(objective = "survival:aft", aft_loss_distribution = "normal",
              aft_loss_distribution_scale = 1, max_depth = 3, min_child_weight = 0)
watchlist <- list(train = dtrain)
bst <- xgboost::xgb.train(params = param, data = dtrain, nrounds = 15,
                          watchlist = watchlist)
predict(bst, dtrain)
bst_cc <- irb.train(params = param, data = dtrain, nrounds = 15, cfun = "hcave",
                    s = 1.5, trace = TRUE, verbose = 0)
bst_cc$weight_update
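The weight_update element can then be used for outlier screening; the cutoff below is illustrative, not a package recommendation:

# flag the observations most heavily downweighted by the IRCO algorithm
w <- bst_cc$weight_update
head(order(w), 3)   # indices of the most downweighted observations
which(w < 0.5)      # observations below an arbitrary weight threshold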
Fit an accelerated failure time (AFT) model with iteratively reweighted convex optimization (IRCO) that minimizes robust loss functions in the CC-family (concave-convex). The convex optimization is carried out by the functional descent boosting algorithm in the R package xgboost. The iteratively reweighted boosting (IRBoost) algorithm reduces the weights of observations that lead to large losses; these weights also help identify outliers. For time-to-event data, an AFT model provides an alternative to the commonly used proportional hazards models. Note that function irboost_aft was developed to facilitate the data input format used with function xgb.train for objective=survival:aft in package xgboost. For other objective functions, the input format instead follows function xgboost, at the time of writing.
irb.train_aft( params = list(), data, z_init = NULL, cfun = "ccave", s = 1, delta = 0.1, iter = 10, nrounds = 100, del = 1e-10, trace = FALSE, ... )
params |
the list of parameters used in xgboost::xgb.train |
data |
training dataset. |
z_init |
vector of length nobs with initial convex component values; must be non-negative. Defaults to the weights if provided, otherwise a vector of 1s |
cfun |
concave component of the CC-family (e.g., "ccave", "acave" or "hcave") |
s |
tuning parameter of cfun |
delta |
a small positive number provided by the user; only used for certain cfun settings |
iter |
number of iterations in the IRCO algorithm |
nrounds |
boosting iterations within each IRCO iteration |
del |
convergence criterion in the IRCO algorithm; not related to delta |
trace |
if TRUE, progress of the IRCO algorithm is printed |
... |
other arguments passed to xgboost::xgb.train |
An object of class xgb.Booster
with additional elements:
weight_update_log
a matrix with nobs rows and iter columns containing the observation weights at each iteration of the IRCO algorithm
weight_update
a vector of observation weights in the last IRCO iteration that produces the final model fit
loss_log
sum of the loss values of the composite function cfun(survival_aft_distribution) in each IRCO iteration
Zhu Wang
Maintainer: Zhu Wang [email protected]
Wang, Zhu (2021), Unified Robust Boosting, arXiv eprint, https://arxiv.org/abs/2101.07718
library("xgboost") X <- matrix(1:5, ncol=1) # Associate ranged labels with the data matrix. # This example shows each kind of censored labels. # uncensored right left interval y_lower = c(10, 15, -Inf, 30, 100) y_upper = c(Inf, Inf, 20, 50, Inf) dtrain <- xgb.DMatrix(data=X, label_lower_bound=y_lower, label_upper_bound=y_upper) params = list(objective="survival:aft", aft_loss_distribution="normal", aft_loss_distribution_scale=1, max_depth=3, min_child_weight= 0) watchlist <- list(train = dtrain) bst <- xgb.train(params, data=dtrain, nrounds=15, watchlist=watchlist) predict(bst, dtrain) bst_cc <- irb.train_aft(params, data=dtrain, nrounds=15, watchlist=watchlist, cfun="hcave", s=1.5, trace=TRUE, verbose=0) bst_cc$weight_update predict(bst_cc, dtrain)
library("xgboost") X <- matrix(1:5, ncol=1) # Associate ranged labels with the data matrix. # This example shows each kind of censored labels. # uncensored right left interval y_lower = c(10, 15, -Inf, 30, 100) y_upper = c(Inf, Inf, 20, 50, Inf) dtrain <- xgb.DMatrix(data=X, label_lower_bound=y_lower, label_upper_bound=y_upper) params = list(objective="survival:aft", aft_loss_distribution="normal", aft_loss_distribution_scale=1, max_depth=3, min_child_weight= 0) watchlist <- list(train = dtrain) bst <- xgb.train(params, data=dtrain, nrounds=15, watchlist=watchlist) predict(bst, dtrain) bst_cc <- irb.train_aft(params, data=dtrain, nrounds=15, watchlist=watchlist, cfun="hcave", s=1.5, trace=TRUE, verbose=0) bst_cc$weight_update predict(bst_cc, dtrain)
Fit a predictive model with iteratively reweighted convex optimization (IRCO) that minimizes robust loss functions in the CC-family (concave-convex). The convex optimization is carried out by the functional descent boosting algorithm in the R package xgboost. The iteratively reweighted boosting (IRBoost) algorithm reduces the weights of observations that lead to large losses; these weights also help identify outliers. Applications include robust generalized linear models and extensions, where the mean is related to the predictors by boosting, and robust accelerated failure time models.
irboost( data, label, weights, params = list(), z_init = NULL, cfun = "ccave", s = 1, delta = 0.1, iter = 10, nrounds = 100, del = 1e-10, trace = FALSE, ... )
data |
input data; see xgboost::xgboost for accepted formats |
label |
response variable. Quantitative for regression objectives; 0/1 for binary classification; integer class labels starting from 0 for multiclass objectives |
weights |
vector of length nobs with non-negative observation weights |
params |
the list of parameters; see xgboost::xgboost |
z_init |
vector of length nobs with initial convex component values; must be non-negative. Defaults to the weights if provided, otherwise a vector of 1s |
cfun |
concave component of the CC-family (e.g., "ccave", "acave" or "hcave") |
s |
tuning parameter of cfun |
delta |
a small positive number provided by the user; only used for certain cfun settings |
iter |
number of iterations in the IRCO algorithm |
nrounds |
boosting iterations within each IRCO iteration |
del |
convergence criterion in the IRCO algorithm; not related to delta |
trace |
if TRUE, progress of the IRCO algorithm is printed |
... |
other arguments passed to xgboost::xgboost |
An object with S3 class xgboost
with the additional elements:
weight_update_log
a matrix with nobs rows and iter columns containing the observation weights at each iteration of the IRCO algorithm
weight_update
a vector of observation weights in the last IRCO iteration that produces the final model fit
loss_log
sum of the loss values of the composite function in each IRCO iteration. Note: cfun requires the objective loss to be non-negative in some cases, so care must be taken. For instance, with objective="reg:gamma", the loss value is defined as gamma-nloglik - (1 + log(min(y))), where y = label. The second term is introduced so that the loss value is non-negative. In fact, gamma-nloglik = y/ypre + log(ypre) in xgboost, where ypre is the mean prediction value, and this can be negative. For fixed y, the minimum of gamma-nloglik is achieved at ypre = y, with value 1 + log(y). Thus, among all label values, the minimum of gamma-nloglik is 1 + log(min(y)).
Zhu Wang
Maintainer: Zhu Wang [email protected]
Wang, Zhu (2021), Unified Robust Boosting, arXiv eprint, https://arxiv.org/abs/2101.07718
# regression, logistic regression, Poisson regression
x <- matrix(rnorm(100*2), 100, 2)
g2 <- sample(c(0, 1), 100, replace = TRUE)
fit1 <- irboost(data = x, label = g2, cfun = "acave", s = 0.5,
                params = list(objective = "reg:squarederror", max_depth = 1),
                trace = TRUE, verbose = 0, nrounds = 50)
fit2 <- irboost(data = x, label = g2, cfun = "acave", s = 0.5,
                params = list(objective = "binary:logitraw", max_depth = 1),
                trace = TRUE, verbose = 0, nrounds = 50)
fit3 <- irboost(data = x, label = g2, cfun = "acave", s = 0.5,
                params = list(objective = "binary:hinge", max_depth = 1),
                trace = TRUE, verbose = 0, nrounds = 50)
fit4 <- irboost(data = x, label = g2, cfun = "acave", s = 0.5,
                params = list(objective = "count:poisson", max_depth = 1),
                trace = TRUE, verbose = 0, nrounds = 50)

# Gamma regression
x <- matrix(rnorm(100*2), 100, 2)
g2 <- sample(rgamma(100, 1))
library("xgboost")
param <- list(objective = "reg:gamma", max_depth = 1)
fit5 <- xgboost(data = x, label = g2, params = param, nrounds = 50)
fit6 <- irboost(data = x, label = g2, cfun = "acave", s = 5, params = param,
                trace = TRUE, verbose = 0, nrounds = 50)
plot(predict(fit5, newdata = x), predict(fit6, newdata = x))
hist(fit6$weight_update)
plot(fit6$loss_log)
summary(fit6$weight_update)

# Tweedie regression
param <- list(objective = "reg:tweedie", max_depth = 1)
fit6t <- irboost(data = x, label = g2, cfun = "acave", s = 5, params = param,
                 trace = TRUE, verbose = 0, nrounds = 50)
# Gamma vs Tweedie regression
hist(fit6$weight_update)
hist(fit6t$weight_update)
plot(predict(fit6, newdata = x), predict(fit6t, newdata = x))

# multiclass classification in iris dataset:
lb <- as.numeric(iris$Species) - 1
num_class <- 3
set.seed(11)
param <- list(objective = "multi:softprob", max_depth = 4, eta = 0.5, nthread = 2,
              subsample = 0.5, num_class = num_class)
fit7 <- irboost(data = as.matrix(iris[, -5]), label = lb, cfun = "acave", s = 50,
                params = param, trace = TRUE, verbose = 0, nrounds = 10)
# predict for softmax returns num_class probability numbers per case:
pred7 <- predict(fit7, newdata = as.matrix(iris[, -5]))
# reshape it to a num_class-columns matrix
pred7 <- matrix(pred7, ncol = num_class, byrow = TRUE)
# convert the probabilities to softmax labels
pred7_labels <- max.col(pred7) - 1
# classification error: 0!
sum(pred7_labels != lb) / length(lb)
table(lb, pred7_labels)
hist(fit7$weight_update)
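To see the downweighting in action, the following illustrative sketch (not part of the package examples; the contamination scheme and threshold are arbitrary) perturbs a few responses and checks whether irboost assigns them small weights:

# contaminate a few responses in a simple regression and inspect the weights
set.seed(123)
x <- matrix(rnorm(100*2), 100, 2)
y <- x[, 1] + rnorm(100, sd = 0.2)
out_id <- 1:5
y[out_id] <- y[out_id] + 10   # gross outliers in the response
fit_rob <- irboost(data = x, label = y, cfun = "ccave", s = 1,
                   params = list(objective = "reg:squarederror", max_depth = 1),
                   trace = FALSE, verbose = 0, nrounds = 50)
round(fit_rob$weight_update[out_id], 3)   # expected to be among the smallest weights
summary(fit_rob$weight_update[-out_id])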