
Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, : invalid number of intervals #186

@marianeira


It seems that lime explanations do not work with variables that contain only NAs and a single constant value, even though XGBoost can fit such variables.

For instance, I have a variable that is highly correlated with the target; in fact, it is the variable with the highest gain in the variable importance. Moreover, if we replace its missing values with an extreme value, we obtain a correlation with the target of 0.77.

However, it does not work in a LIME explanation because its standard deviation is zero (LIME does not take missing values into account, unlike XGBoost). As a result, I can't use lime with this type of variable. Is there any solution other than removing such columns, which XGBoost seems to handle well?
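The failure can be reproduced without lime, which may help narrow it down. The sketch below is only an assumption about how the bin cuts end up degenerate: it builds equally spaced breaks over the non-missing range of a column like var5, deduplicates them as the `unique(explainer$bin_cuts[[i]])` call in the error message suggests, and passes the result to `cut()`. The variable names and the choice of five break points are illustrative and not taken from lime's source.

```r
# A column shaped like var5: four NAs followed by a constant value.
x <- c(NA, NA, NA, NA, rep(1, 18))

# Assumed binning: equally spaced breaks over the non-missing range.
# Because min == max, every break equals 1.
rng <- range(x, na.rm = TRUE)
breaks <- unique(seq(rng[1], rng[2], length.out = 5))

# A single numeric break is read by cut() as "number of intervals",
# and any value below 2 is rejected.
res <- tryCatch(
  cut(x, breaks, labels = FALSE),
  error = function(e) conditionMessage(e)
)
print(breaks)
print(res)
```

If this reading is right, var3 is not the culprit: it takes two distinct values (0 and 1), so its breaks do not collapse to a single point.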

Here is a simple example of the problem. Thanks in advance.

library(dplyr)    # %>% and select()
library(xgboost)
library(lime)

set.seed(1)  # make the random columns reproducible
df <- data.frame(target = c(0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                 var1 = rnorm(22),
                 var2 = rnorm(22)*10,
                 var3 = c(rep(0,20),1,1),
                 var4 = c(-1,-2,5,3,1,2,2,1,1,2,1,-1,5,1,1,20,2,1,0,2,2,2),
                 var5 = c(NA,NA,NA,NA,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))

Train Xgboost

X_train <- df %>% select(-target)

dtrain <- xgb.DMatrix(data.matrix(X_train),
                      label = as.matrix(df$target))

boost <- xgb.train(data = dtrain,
                   params = list(max_depth = 7, eta = 0.1,
                                 objective = "multi:softprob",
                                 eval_metric = "error", nthread = 1),
                   num_class = 3,
                   nrounds = 100)
xgb.importance(feature_names = colnames(dtrain),
               model = boost)

local_obs <- X_train[c(1, 2), ]

Fit Lime, quantile bins = FALSE

explainer1 <- lime(x = X_train, model = boost, quantile_bins = FALSE)
Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, :
invalid number of intervals

explanations1 <- lime::explain(local_obs, explainer1, n_labels = 2, n_features = 2)
plot_explanations(explanations1)

Fit Lime, quantile bins = TRUE

explainer2 <- lime(x = X_train, model = boost, quantile_bins = TRUE)
Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, :
invalid number of intervals
In addition: Warning messages:
1: var3 does not contain enough variance to use quantile binning. Using standard binning instead.
2: var5 does not contain enough variance to use quantile binning. Using standard binning instead.

explanations2 <- lime::explain(local_obs, explainer2, n_labels = 2, n_features = 2)
plot_explanations(explanations2)
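One possible workaround, short of dropping the column, is to recode the missing values with a sentinel before building the explainer, along the lines of the extreme-value replacement mentioned above. The sketch below is an untested idea rather than a confirmed fix: `degenerate_cols()` and `fill_na_sentinel()` are hypothetical helpers, and the sentinel -999 is an arbitrary choice.

```r
# Hypothetical helper: names of columns with fewer than two distinct
# non-missing values, i.e. the ones whose bin cuts would collapse.
degenerate_cols <- function(data) {
  n_distinct <- vapply(
    data,
    function(col) length(unique(col[!is.na(col)])),
    integer(1)
  )
  names(data)[n_distinct < 2]
}

# Hypothetical helper: replace NA with a sentinel in the given columns.
fill_na_sentinel <- function(data, cols, sentinel = -999) {
  for (col in cols) {
    data[[col]][is.na(data[[col]])] <- sentinel
  }
  data
}

# Two columns shaped like var3 and var5 from the example.
X <- data.frame(var3 = c(rep(0, 20), 1, 1),
                var5 = c(rep(NA, 4), rep(1, 18)))

bad <- degenerate_cols(X)            # columns to recode
X_filled <- fill_na_sentinel(X, bad)
print(bad)
print(degenerate_cols(X_filled))
```

If this route is taken, the model would presumably need to be retrained on the recoded data so that the model and the explainer see the same encoding of missing values.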
