R/iucnn_feature_importance.R
iucnn_feature_importance.Rd
Uses a model generated with iucnn_train_model to evaluate how much each feature or group of features contributes to the accuracy of the test set predictions. The function implements permutation feature importance. The values in a given feature column of the test set are shuffled randomly among all samples, the feature data manipulated in this way are used to predict labels for the test set, and the resulting accuracy is compared with the accuracy obtained from the original feature data. The difference (delta accuracy) can be interpreted as a measure of how important a given feature or group of features is for the trained neural network (NN) to make accurate predictions.
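The procedure can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not IUCNN's implementation: the data are simulated, toy_predict is a hypothetical stand-in for the trained NN, and permutation_importance is a hypothetical helper. It repeats the shuffle-and-predict step n_permutations times and summarizes the delta accuracy by its mean and standard deviation, mirroring what the n_permutations argument controls.

```r
# Minimal sketch of permutation feature importance on a toy problem.
# Assumptions: simulated data and a hypothetical model (toy_predict);
# this is not IUCNN's internal code.
set.seed(1)
n <- 200
features <- data.frame(
  f1 = rnorm(n),  # informative feature: the labels depend on it
  f2 = rnorm(n)   # noise feature: the toy model ignores it
)
labels <- as.integer(features$f1 > 0)

# Hypothetical "trained model": predicts class 1 whenever f1 is positive
toy_predict <- function(x) as.integer(x$f1 > 0)

accuracy <- function(x, y) mean(toy_predict(x) == y)

# Shuffle the columns in `block` among all samples n_permutations times,
# re-predict each time, and summarise the drop in accuracy (delta accuracy)
permutation_importance <- function(x, y, block, n_permutations = 100) {
  base_acc <- accuracy(x, y)
  deltas <- replicate(n_permutations, {
    x_perm <- x
    for (col in block) x_perm[[col]] <- sample(x_perm[[col]])
    base_acc - accuracy(x_perm, y)
  })
  c(mean = mean(deltas), sd = sd(deltas))
}

imp_f1 <- permutation_importance(features, labels, "f1")
imp_f2 <- permutation_importance(features, labels, "f2")
print(rbind(f1 = imp_f1, f2 = imp_f2))
```

Shuffling f1 destroys the only signal the toy model uses, so its mean delta accuracy is large, while shuffling the ignored f2 leaves the predictions, and therefore the accuracy, unchanged.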
iucnn_feature_importance(
x,
feature_blocks = list(),
n_permutations = 100,
provide_indices = FALSE,
verbose = FALSE,
unlink_features_within_block = TRUE
)
x: an iucnn_model object, as produced by iucnn_train_model.
feature_blocks: a list. By default, the features are grouped into geographic, climatic, biome, and human footprint features. Provide a custom list of feature names or indices to define other feature blocks. If feature indices are provided (as in the second call in the examples), set provide_indices to TRUE.
n_permutations: an integer. Defines how many times the feature values are shuffled and the accuracy of the resulting predictions is computed. The mean and standard deviation of the delta accuracy are summarized across these permutations.
provide_indices: logical. Set to TRUE if custom feature_blocks are provided as indices. Default is FALSE.
verbose: logical. Set to TRUE to print progress output while calculating feature importance. Default is FALSE.
unlink_features_within_block: logical. If TRUE, the feature columns within each defined block are shuffled independently of one another. If FALSE, all feature columns within a block are reordered with the same permutation, so the values belonging to one sample stay together. Default is TRUE.
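The difference between the two settings of unlink_features_within_block can be illustrated with a small base R sketch. The helper shuffle_block is hypothetical and not part of IUCNN; independent = TRUE corresponds to unlink_features_within_block = TRUE, and independent = FALSE to a single shared permutation per block. A shared permutation keeps the correlations among the features inside the block intact and only breaks their relationship with the other features and the labels.

```r
# Sketch contrasting the two shuffling modes for a block of columns.
# `shuffle_block` is a hypothetical helper, not part of IUCNN.
shuffle_block <- function(x, cols, independent = TRUE) {
  if (independent) {
    # Each column gets its own random permutation
    for (col in cols) x[[col]] <- sample(x[[col]])
  } else {
    # One shared permutation is applied to every column in the block,
    # so values that belonged to the same sample stay together
    perm <- sample(nrow(x))
    for (col in cols) x[[col]] <- x[[col]][perm]
  }
  x
}

set.seed(7)
toy <- data.frame(a = 1:20, b = 1:20)  # two perfectly correlated columns
linked <- shuffle_block(toy, c("a", "b"), independent = FALSE)
unlinked <- shuffle_block(toy, c("a", "b"), independent = TRUE)

all(linked$a == linked$b)      # shared permutation: columns stay aligned
all(unlinked$a == unlinked$b)  # independent permutations: alignment almost surely lost
```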
a data.frame with the relative importance of each feature block (see the delta_acc_mean column).
By default, this function groups the features into geographic, climatic, biome, and human footprint features and determines the importance of each of these feature blocks. The feature blocks can be defined manually using the feature_blocks argument.
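Because feature_blocks accepts either feature names or indices, it can help to see how the two forms correspond. The sketch below uses hypothetical column names and assumes that indices count feature columns in the order the model sees them; an index-based list like blocks_by_index would be the form paired with provide_indices = TRUE.

```r
# Hypothetical feature names; the actual column names depend on the
# features prepared for the model.
feature_names <- c("geo_1", "geo_2", "clim_1", "clim_2", "clim_3")

blocks_by_name <- list(
  geographic = c("geo_1", "geo_2"),
  climatic   = c("clim_1", "clim_2", "clim_3")
)

# match() returns each name's column position, giving the index-based form
blocks_by_index <- lapply(blocks_by_name, match, table = feature_names)

# Guard against misspelled names, which match() turns into NA
stopifnot(!anyNA(unlist(blocks_by_index)))
str(blocks_by_index)
```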
See vignette("Approximate_IUCN_Red_List_assessments_with_IUCNN")
for a tutorial on how to run IUCNN.
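if (FALSE) {
  # Load the example occurrence records and IUCN labels
  data("training_occ")
  data("training_labels")

  # Prepare geographic features and detailed labels, then train a model
  train_feat <- iucnn_prepare_features(training_occ, type = "geographic")
  labels_train <- iucnn_prepare_labels(training_labels, train_feat,
                                       level = "detail")
  train_output <- iucnn_train_model(x = train_feat,
                                    lab = labels_train,
                                    patience = 10)

  # Feature importance using the default feature blocks
  imp_def <- iucnn_feature_importance(x = train_output)

  # Feature importance using custom blocks defined by column indices
  imp_cust <- iucnn_feature_importance(x = train_output,
                                       feature_blocks = list(block1 = c(1, 2, 3, 4),
                                                             block2 = c(5, 6, 7, 8)),
                                       provide_indices = TRUE)
}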