Evaluate relative importance of training features — iucnn_feature_importance

Uses a model generated with iucnn_train_model to evaluate how much each feature or group of features contributes to the accuracy of the test set predictions. The function implements permutation feature importance: the values in a given feature column of the test set are shuffled randomly among all samples, the shuffled feature data are used to predict labels for the test set, and the resulting accuracy is compared to the accuracy obtained with the original feature data. The difference (delta accuracy) can be interpreted as a measure of how important a given feature or group of features is for the trained neural network (NN) to make accurate predictions.
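The procedure described above can be illustrated with a minimal base R sketch. The toy data, the threshold "model" (predict_labels), and the helper permutation_delta are hypothetical stand-ins, not IUCNN objects or functions. Delta accuracy is defined here as baseline minus permuted accuracy, which is an assumption; the sign convention used by iucnn_feature_importance may differ.

```r
# Minimal sketch of permutation feature importance in base R.
# All names below are hypothetical; nothing here calls IUCNN.
set.seed(42)

n <- 200
features <- cbind(
  informative = rnorm(n),  # determines the label
  noise       = rnorm(n)   # unrelated to the label
)
labels <- as.integer(features[, "informative"] > 0)

# Stand-in for a trained model: predicts from the informative column only
predict_labels <- function(x) as.integer(x[, "informative"] > 0)

# Fraction of samples whose predicted label matches the true label
accuracy <- function(x) mean(predict_labels(x) == labels)

# Shuffle one column repeatedly and summarize the drop in accuracy
permutation_delta <- function(x, column, n_permutations = 100) {
  baseline <- accuracy(x)
  deltas <- vapply(seq_len(n_permutations), function(i) {
    shuffled <- x
    shuffled[, column] <- sample(shuffled[, column])  # permute among samples
    baseline - accuracy(shuffled)                     # assumed sign convention
  }, numeric(1))
  c(mean = mean(deltas), sd = sd(deltas))
}

imp_informative <- permutation_delta(features, "informative")
imp_noise       <- permutation_delta(features, "noise")
```

In this sketch, shuffling the noise column leaves the predictions unchanged, so its delta accuracy is zero, while shuffling the informative column lowers accuracy substantially.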

iucnn_feature_importance(
  x,
  feature_blocks = list(),
  n_permutations = 100,
  provide_indices = FALSE,
  verbose = FALSE,
  unlink_features_within_block = TRUE
)

Arguments

x: an iucnn_model object, as returned by iucnn_train_model
feature_blocks: a list. By default, the features are grouped into geographic, climatic, biome, and human footprint blocks. Provide a custom list of feature names or indices to define other feature blocks. If feature indices are provided, as in the second call in the Examples, set provide_indices to TRUE.
n_permutations: an integer. Defines how many iterations of shuffling feature values and predicting the resulting accuracy are run. The mean and standard deviation of the delta accuracy are computed across these permutations.
provide_indices: logical. Set to TRUE if custom feature_blocks are provided as indices. Default is FALSE.
verbose: logical. Set to TRUE to print screen output while calculating feature importance. Default is FALSE.
unlink_features_within_block: logical. If TRUE, the features within each defined block are shuffled independently of one another. If FALSE, all feature columns within a block are reordered using the same permutation. Default is TRUE.
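The difference between the two settings of unlink_features_within_block can be sketched in base R. This is only an illustration of the distinction as described above, using a hypothetical two-column block and hypothetical helper functions; it is not IUCNN's internal code.

```r
# Sketch of independent vs. shared shuffling within a feature block.
# The block and both helpers are hypothetical illustrations.
set.seed(1)

# Two perfectly paired feature columns: b is always 10 * a
block <- cbind(a = 1:6, b = (1:6) * 10)

# Independent shuffling: each column gets its own permutation
shuffle_unlinked <- function(x) apply(x, 2, sample)

# Shared shuffling: one permutation of the rows, applied to every column
shuffle_linked <- function(x) x[sample(nrow(x)), , drop = FALSE]

unlinked <- shuffle_unlinked(block)
linked   <- shuffle_linked(block)

# The row-wise pairing b == 10 * a always survives shared shuffling
pairing_kept <- all(linked[, "b"] == 10 * linked[, "a"])
```

With shared shuffling, the relationship between columns within a row is preserved and only the assignment of rows to samples changes. With independent shuffling, that relationship is generally broken as well.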

Value

A data.frame with the relative importance of each feature block (see the delta_acc_mean column).

Details

By default this function groups the features into geographic, climatic, biome, and human footprint features and determines the importance of each of these blocks of features. The feature blocks can be manually defined using the feature_blocks argument.

Note

See vignette("Approximate_IUCN_Red_List_assessments_with_IUCNN") for a tutorial on how to run IUCNN.

Examples

if (FALSE) {
# Load the example occurrence records and labels
data("training_occ")
data("training_labels")

# Prepare features and labels for training
train_feat <- iucnn_prepare_features(training_occ, type = "geographic")
labels_train <- iucnn_prepare_labels(training_labels, train_feat,
                                     level = "detail")

# Train a model
train_output <- iucnn_train_model(x = train_feat,
                                  lab = labels_train,
                                  patience = 10)

# Feature importance with the default feature blocks
imp_def <- iucnn_feature_importance(x = train_output)

# Feature importance with custom blocks defined by feature indices
imp_cust <- iucnn_feature_importance(x = train_output,
                                     feature_blocks = list(block1 = c(1, 2, 3, 4),
                                                           block2 = c(5, 6, 7, 8)),
                                     provide_indices = TRUE)
}