Trains a CNN model on a list of matrices with occurrence counts for a set of species, generated by iucnn_cnn_features, and the corresponding IUCN classes, formatted as an iucnn_labels object with iucnn_prepare_labels. Note that taxa for which information is present in only one of the two input objects will be removed from further processing.

iucnn_cnn_train(
  x,
  lab,
  path_to_output = "iuc_nn_model",
  production_model = NULL,
  cv_fold = 1,
  test_fraction = 0.2,
  seed = 1234,
  max_epochs = 100,
  patience = 20,
  randomize_instances = TRUE,
  balance_classes = TRUE,
  dropout_rate = 0,
  mc_dropout_reps = 100,
  optimize_for = "loss",
  pooling_strategy = "average",
  save_model = TRUE,
  overwrite = FALSE,
  verbose = 0
)

Arguments

x

a list of matrices containing the occurrence counts across a spatial grid for a set of species.

lab

an object of class iucnn_labels, as generated by iucnn_prepare_labels, containing the labels for all species.

path_to_output

character string. The path to the location where the IUCNN model will be saved.

production_model

an object of type iucnn_model (default=NULL). If an iucnn_model is provided, iucnn_cnn_train will read the settings of this model and reproduce it, but use all available data for training, by automatically setting the validation set to 0 and cv_fold to 1. This is recommended before using the model for predicting the IUCN status of not evaluated species, as it generally improves the prediction accuracy of the model. Choosing this option will ignore all other provided settings below.

cv_fold

integer (default=1). When setting cv_fold > 1, iucnn_cnn_train will perform k-fold cross-validation. In this case, the provided setting for test_fraction will be ignored, as the test size of each CV-fold is determined by the specified number provided here.

test_fraction

numeric. The fraction of the input data used as test set.
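As a rough illustration of how cv_fold and test_fraction relate, the base-R sketch below computes test-set sizes for a hypothetical data set of 100 species. It is not taken from the package's internals: n_species, holdout_idx and fold_id are illustrative names, and the rounding and fold assignment used by iucnn_cnn_train may differ.

```r
# Illustrative only: not taken from the IUCNN source code.
set.seed(1234)                      # plays the role of the `seed` argument
n_species <- 100                    # hypothetical number of labelled species

# cv_fold = 1: a single hold-out split sized by test_fraction
test_fraction <- 0.2
holdout_size <- round(n_species * test_fraction)
holdout_idx <- sample(n_species, holdout_size)

# cv_fold = 5: every species lands in exactly one test fold, so each
# test set holds about 1/5 of the data and test_fraction is not needed
cv_fold <- 5
fold_id <- sample(rep(seq_len(cv_fold), length.out = n_species))
fold_sizes <- table(fold_id)
```

With these numbers the hold-out set and each of the five folds contain 20 species.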

seed

integer. Set a starting seed for reproducibility.

max_epochs

integer. The maximum number of epochs.

patience

integer. Number of epochs with no improvement after which training will be stopped.
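The interplay of max_epochs and patience can be sketched with a generic early-stopping loop in base R. This is an assumption about the general technique, not the package's training code; the val_loss values are made up for illustration.

```r
# Illustrative early-stopping loop; `val_loss` is made-up data.
val_loss <- c(0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.64, 0.59)
max_epochs <- length(val_loss)
patience <- 3

best_loss <- Inf
best_epoch <- 0
stopped_at <- max_epochs
for (epoch in seq_len(max_epochs)) {
  if (val_loss[epoch] < best_loss) {
    # Improvement: remember this epoch as the best one so far
    best_loss <- val_loss[epoch]
    best_epoch <- epoch
  } else if (epoch - best_epoch >= patience) {
    # `patience` epochs have passed without improvement: stop training
    stopped_at <- epoch
    break
  }
}
```

Here the best epoch is 3 and the loop stops at epoch 6, so the lower loss at epoch 8 is never reached. A larger patience trades longer training for a lower risk of stopping too early.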

randomize_instances

logical (default=TRUE). When set to TRUE, the instances will be shuffled before training (recommended).

balance_classes

logical (default=TRUE). If set to TRUE, iucnn_cnn_train will perform supersampling of the training instances to account for uneven class distribution in the training data.
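To give an idea of what resampling for class balance can look like, here is a minimal base-R sketch of random oversampling with replacement. The exact scheme used by iucnn_cnn_train is not described on this page, so this is an assumption; labels and balanced_idx are hypothetical names.

```r
# Illustrative only: random oversampling of minority classes in base R.
set.seed(1234)
labels <- c(rep("LC", 6), rep("VU", 3), rep("CR", 1))  # hypothetical labels
target_n <- max(table(labels))                        # size of largest class

# For each class, keep all original instances and draw additional ones
# (with replacement) until the class reaches target_n instances
idx_by_class <- split(seq_along(labels), labels)
balanced_idx <- unlist(lapply(idx_by_class, function(idx) {
  extra <- idx[sample.int(length(idx), target_n - length(idx), replace = TRUE)]
  c(idx, extra)
}), use.names = FALSE)

balanced_counts <- table(labels[balanced_idx])
```

After resampling, each of the three classes is represented by six instances, and every original instance is still present.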

dropout_rate

numeric. Randomly turns off the specified fraction of nodes of the neural network during each epoch of training. This makes the NN more stable and less reliant on individual nodes/weights, which can prevent over-fitting (only available for modes nn-class and nn-reg). See the mc_dropout_reps setting below if dropout should also be applied to the predictions. For models trained with a dropout_rate > 0, the predictions (including the validation accuracy) will reflect the stochasticity introduced by the dropout method (MC dropout predictions). This is required, for example, when predicting with a specified accuracy threshold (see the target_acc option in iucnn_predict_status).

mc_dropout_reps

integer. The number of MC iterations to run when predicting validation accuracy and calculating the accuracy-threshold table required for making predictions with an accuracy threshold. The default of 100 is usually sufficient; larger values will lead to longer computation times, particularly during model testing with cross-validation.
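The general idea behind MC dropout predictions can be sketched in base R with a toy one-hidden-layer network. Everything here (the random weights W1 and W2, the input x, the function forward_with_dropout) is hypothetical and unrelated to the CNN that iucnn_cnn_train builds; the sketch only shows how repeated stochastic forward passes yield a distribution of class probabilities.

```r
# Illustrative only: MC dropout on a toy network with random weights.
set.seed(1234)
W1 <- matrix(rnorm(4 * 8), nrow = 4)   # 4 inputs  -> 8 hidden units
W2 <- matrix(rnorm(8 * 3), nrow = 8)   # 8 hidden  -> 3 classes
x <- c(0.5, -1.2, 0.3, 0.8)            # one hypothetical input instance
dropout_rate <- 0.1
mc_dropout_reps <- 100

forward_with_dropout <- function(x) {
  hidden <- pmax(0, as.vector(x %*% W1))               # ReLU activation
  keep <- rbinom(length(hidden), 1, 1 - dropout_rate)  # random dropout mask
  hidden <- hidden * keep / (1 - dropout_rate)         # inverted-dropout scaling
  logits <- as.vector(hidden %*% W2)
  exp(logits) / sum(exp(logits))                       # softmax probabilities
}

# One row per MC iteration, one column per class
probs <- t(replicate(mc_dropout_reps, forward_with_dropout(x)))
mean_probs <- colMeans(probs)          # averaged class probabilities
prob_sd <- apply(probs, 2, sd)         # spread across MC iterations
```

The spread across iterations is what makes an accuracy threshold possible: predictions whose class probabilities vary strongly between passes can be treated as less reliable.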

optimize_for

string. Default is "loss", which will train the model until optimal validation set loss is reached. Set to "accuracy" if you want to optimize for maximum validation accuracy instead.

pooling_strategy

string. Pooling strategy after the first convolutional layer. Choose between "average" (default) and "max".

save_model

logical. If TRUE, the model is saved to disk.

overwrite

logical. If TRUE, existing models are overwritten. Default is FALSE.

verbose

integer. Default is 0; set to 1 for iucnn_cnn_train to print additional information to the screen while training.

Value

Returns an iucnn_model object, which can be passed to iucnn_predict_status to predict the conservation status of species that have not been evaluated.

Note

See vignette("Approximate_IUCN_Red_List_assessments_with_IUCNN") for a tutorial on how to run IUCNN.

Examples

if (FALSE) {
data("training_occ")     # geographic occurrences of species with IUCN assessment
data("training_labels")  # the corresponding IUCN assessments

cnn_training_features <- iucnn_cnn_features(training_occ)
cnn_labels <- iucnn_prepare_labels(x = training_labels,
                                   y = cnn_training_features)

trained_model <- iucnn_cnn_train(cnn_training_features,
                                cnn_labels,
                                overwrite = TRUE,
                                dropout_rate = 0.1)
summary(trained_model)
}
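The cv_fold and production_model arguments described above could be used along the following lines. This is a sketch that continues from the objects created in the example (cnn_training_features, cnn_labels, trained_model); the output paths are hypothetical, and the block is wrapped in if (FALSE) like the example above because it needs the IUCNN package and its training backend.

```r
if (FALSE) {
# Evaluate the same setup with 5-fold cross-validation
# (test_fraction is ignored when cv_fold > 1)
cv_model <- iucnn_cnn_train(cnn_training_features,
                            cnn_labels,
                            cv_fold = 5,
                            dropout_rate = 0.1,
                            path_to_output = "iuc_nn_model_cv")
summary(cv_model)

# Retrain on all available data, reusing the settings of trained_model
production_model <- iucnn_cnn_train(cnn_training_features,
                                    cnn_labels,
                                    production_model = trained_model,
                                    path_to_output = "iuc_nn_model_production")
}
```

Writing each model to its own path avoids having to set overwrite = TRUE.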