backwardcompatibilityml.helpers package

Submodules

backwardcompatibilityml.helpers.comparison module

backwardcompatibilityml.helpers.comparison.compare_models(h1, h2, dataset, performance_metric, get_instance_metadata=None, device='cpu')

backwardcompatibilityml.helpers.http module

backwardcompatibilityml.helpers.http.no_cache(f)

backwardcompatibilityml.helpers.models module

class backwardcompatibilityml.helpers.models.LogisticRegression(input_dim, output_dim)

Bases: torch.nn.modules.module.Module

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

class backwardcompatibilityml.helpers.models.MLPClassifier(input_size, num_classes, hidden_sizes=[50, 10])

Bases: torch.nn.modules.module.Module

forward(data, sample_weight=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

backwardcompatibilityml.helpers.training module

backwardcompatibilityml.helpers.training.compatibility_scores(h1, h2, dataset, device='cpu')
Parameters:
  • h1 – Reference PyTorch model.
  • h2 – The model being compared to h1.
  • dataset – Data in the form of a list of batches of input/target pairs.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A pair consisting of btc_dataset - the average trust compatibility score over all batches, and bec_dataset - the average error compatibility score over all batches.
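
The definitions below are a minimal, framework-agnostic sketch of the two scores, assuming the usual definitions: trust compatibility is the fraction of instances h1 classifies correctly that h2 also classifies correctly, and error compatibility is the fraction of h2's errors that h1 also makes. It works on plain lists of predicted labels rather than PyTorch models; the helper names and the convention of returning 1.0 when a denominator is zero are assumptions, not the library's documented behavior.

```python
from typing import List, Sequence, Tuple


def batch_compatibility(
    h1_preds: Sequence[int], h2_preds: Sequence[int], targets: Sequence[int]
) -> Tuple[float, float]:
    """Return (trust compatibility, error compatibility) for one batch of labels."""
    h1_right = [p == y for p, y in zip(h1_preds, targets)]
    h2_right = [p == y for p, y in zip(h2_preds, targets)]
    # Trust: of the instances h1 gets right, the share that h2 also gets right.
    n_h1_right = sum(h1_right)
    n_both_right = sum(a and b for a, b in zip(h1_right, h2_right))
    btc = n_both_right / n_h1_right if n_h1_right else 1.0  # assumed convention
    # Error: of the instances h2 gets wrong, the share that h1 also gets wrong.
    n_h2_wrong = sum(not b for b in h2_right)
    n_both_wrong = sum(not a and not b for a, b in zip(h1_right, h2_right))
    bec = n_both_wrong / n_h2_wrong if n_h2_wrong else 1.0  # assumed convention
    return btc, bec


def dataset_compatibility(batches: List[Tuple[list, list, list]]) -> Tuple[float, float]:
    """Average the per-batch scores over (h1_preds, h2_preds, targets) batches."""
    scores = [batch_compatibility(*batch) for batch in batches]
    btc_dataset = sum(btc for btc, _ in scores) / len(scores)
    bec_dataset = sum(bec for _, bec in scores) / len(scores)
    return btc_dataset, bec_dataset


batches = [
    ([0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1]),
    ([1, 1], [0, 1], [1, 1]),
]
print(dataset_compatibility(batches))
```

Averaging the per-batch scores, as the Returns description states, weights every batch equally regardless of how many instances it contains.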

backwardcompatibilityml.helpers.training.compatibility_sweep(sweeps_folder_path, number_of_epochs, h1, h2, training_set, test_set, batch_size_train, batch_size_test, OptimizerClass, optimizer_kwargs, NewErrorLossClass, StrictImitationLossClass, performance_metric=<function model_accuracy>, lambda_c_stepsize=0.25, percent_complete_queue=None, new_error_loss_kwargs=None, strict_imitation_loss_kwargs=None, get_instance_metadata=None, device='cpu', use_ml_flow=False, ml_flow_run_name='compatibility_sweep')

This function trains a new model using the backward compatibility loss function BCNLLLoss with respect to an existing model. It does this for each value of lambda_c between 0 and 1, in increments of lambda_c_stepsize, and saves the newly trained models in the specified folder.
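
As a minimal sketch of the sweep's outer loop, the helper below lists the lambda_c values visited for a given lambda_c_stepsize. It assumes that both endpoints 0.0 and 1.0 are included; the helper name is hypothetical and the library's exact grid may differ.

```python
def lambda_c_values(stepsize: float = 0.25) -> list:
    """Return lambda_c values from 0.0 to 1.0 (inclusive) in increments of stepsize."""
    if not 0.0 < stepsize <= 1.0:
        raise ValueError("stepsize must be in (0, 1]")
    # Multiply an integer step count instead of repeatedly adding stepsize,
    # so floating-point error does not accumulate across steps.
    num_steps = int(round(1.0 / stepsize))
    values = [round(i * stepsize, 10) for i in range(num_steps + 1)]
    # Drop any value past 1.0 when stepsize does not divide 1.0 evenly.
    return [v for v in values if v <= 1.0]


print(lambda_c_values(0.25))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Under this assumption, the default lambda_c_stepsize of 0.25 visits five lambda_c values.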

Parameters:
  • sweeps_folder_path – A string value representing the full path of the folder where the result of the compatibility sweep is to be stored.
  • number_of_epochs – The number of training epochs to use on each sweep.
  • h1 – The reference model being used.
  • h2 – The new model being trained / updated.
  • training_set – The list of training samples as (batch_ids, input, target).
  • test_set – The list of testing samples as (batch_ids, input, target).
  • batch_size_train – An integer representing batch size of the training set.
  • batch_size_test – An integer representing the batch size of the test set.
  • OptimizerClass – The class to instantiate an optimizer from for training.
  • optimizer_kwargs – A dictionary of the keyword arguments to be used to instantiate the optimizer.
  • NewErrorLossClass – The class of the New Error style loss function to be instantiated and used to perform compatibility constrained training of our model h2.
  • StrictImitationLossClass – The class of the Strict Imitation style loss function to be instantiated and used to perform compatibility constrained training of our model h2.
  • performance_metric
    A function to evaluate model performance. The function is expected to have the following signature:
    metric(model, dataset, device)
    model: The model being evaluated.
    dataset: The dataset as a list of (batch_ids, input, target).
    device: The device PyTorch is using, either “cpu” or “cuda”.

    If unspecified, then accuracy is used.

  • lambda_c_stepsize – The increments of lambda_c to use as we sweep the parameter space between 0.0 and 1.0.
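  • percent_complete_queue – Optional thread-safe queue used to report the progress of the sweep as a percentage complete.
  • new_error_loss_kwargs – Optional dictionary of keyword arguments used to instantiate NewErrorLossClass.
  • strict_imitation_loss_kwargs – Optional dictionary of keyword arguments used to instantiate StrictImitationLossClass.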
  • get_instance_metadata
    A function that returns a text string representation of some metadata corresponding to the instance id. It should be a function of the form:
    get_instance_metadata(instance_id)
    instance_id: An integer instance id

    And should return a string.

  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
  • use_ml_flow – A boolean flag controlling whether or not to log the sweep with MLFlow. If true, an MLFlow run will be created with the name specified by ml_flow_run_name.
  • ml_flow_run_name – A string that configures the name of the MLFlow run.
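
A custom performance_metric only needs to follow the metric(model, dataset, device) signature documented above. The sketch below is a framework-agnostic accuracy metric, assuming model is any callable that maps a batch of inputs to one list of class scores per instance (standing in for a PyTorch classifier's output); the function names are hypothetical, and device is accepted only to match the signature.

```python
from typing import Callable, List, Sequence, Tuple

# One batch: (batch_ids, inputs, targets), as described for training_set / test_set.
Batch = Tuple[Sequence[int], Sequence, Sequence[int]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (the first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy_metric(model: Callable, dataset: List[Batch], device: str = "cpu") -> float:
    """Fraction of instances whose highest-scoring class equals the target."""
    correct = total = 0
    for _batch_ids, inputs, targets in dataset:
        for scores, target in zip(model(inputs), targets):
            correct += int(argmax(scores) == target)
            total += 1
    return correct / total if total else 0.0


def toy_model(inputs):
    # Scores class 1 with the input value and class 0 with its complement.
    return [[1.0 - x, x] for x in inputs]


dataset = [([0, 1], [0.2, 0.9], [0, 1]), ([2, 3], [0.7, 0.4], [0, 0])]
print(accuracy_metric(toy_model, dataset))  # 0.75
```
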

backwardcompatibilityml.helpers.training.evaluate_model_performance_and_compatibility(h1, h2, training_set, test_set, performance_metric, device='cpu')

Calculates the error overlap of h1 and h2, and the error fraction by class of h2, separately on the batched training and test sets.

Parameters:
  • h1 – The reference model being used.
  • h2 – The model being trained / updated.
  • training_set – The list of batched training samples as (batch_ids, input, target).
  • test_set – The list of batched testing samples as (batch_ids, input, target).
  • performance_metric – Performance metric to be used when evaluating the model.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A dictionary containing the model performance and compatibility results, computed separately for the training set and the testing set.
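
The training_set and test_set arguments used throughout this module are lists of (batch_ids, input, target) batches. The helper below sketches one way to build that structure from flat lists, assuming instance ids are simply positions in the original data. With real models, each batch's input and target would be PyTorch tensors (for example built with torch.tensor) rather than plain lists; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def make_batches(
    inputs: Sequence, targets: Sequence, batch_size: int
) -> List[Tuple[list, list, list]]:
    """Split flat inputs/targets into a list of (batch_ids, input, target) batches."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    batches = []
    for start in range(0, len(inputs), batch_size):
        stop = min(start + batch_size, len(inputs))
        batch_ids = list(range(start, stop))  # instance ids = positions in the flat data
        batches.append((batch_ids, list(inputs[start:stop]), list(targets[start:stop])))
    return batches


training_set = make_batches([[0.1], [0.4], [0.8], [0.9], [0.3]], [0, 0, 1, 1, 0], batch_size=2)
print(training_set)
# [([0, 1], [[0.1], [0.4]], [0, 0]), ([2, 3], [[0.8], [0.9]], [1, 1]), ([4], [[0.3]], [0])]
```

The last batch may be smaller than batch_size when the data size is not a multiple of it.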

backwardcompatibilityml.helpers.training.evaluate_model_performance_and_compatibility_on_dataset(h1, h2, dataset, performance_metric, get_instance_metadata=None, device='cpu')
Parameters:
  • h1 – The reference model being used.
  • h2 – The model being trained / updated.
  • dataset – The list of batched samples as (batch_ids, input, target).
  • performance_metric – Performance metric to be used when evaluating the model.
  • get_instance_metadata
    A function that returns a text string representation of some metadata corresponding to the instance id. It should be a function of the form:
    get_instance_metadata(instance_id)
    instance_id: An integer instance id

    And should return a string.

  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A dictionary containing the error overlap between h1 and h2, the error fraction by class of h2, the trust compatibility score of h2 with respect to h1, and the error compatibility score of h2 with respect to h1.
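
get_instance_metadata can be any callable that maps an integer instance id to a string. A minimal sketch backed by a dictionary, where the factory name, the descriptions, and the fallback text are illustrative assumptions:

```python
from typing import Callable, Dict


def make_instance_metadata_lookup(descriptions: Dict[int, str]) -> Callable[[int], str]:
    """Build a get_instance_metadata(instance_id) -> str callable from a dict."""
    def get_instance_metadata(instance_id: int) -> str:
        # Fall back to a generic label for ids without a description.
        return descriptions.get(instance_id, f"instance {instance_id}")
    return get_instance_metadata


get_instance_metadata = make_instance_metadata_lookup({0: "cat.png", 1: "dog.png"})
print(get_instance_metadata(1))  # dog.png
print(get_instance_metadata(7))  # instance 7
```
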

backwardcompatibilityml.helpers.training.get_all_error_instance_indices(h1, h2, batch_ids, batched_evaluation_data, batched_evaluation_target, get_instance_metadata=None, device='cpu')

Return the list of indices of instances from batched_evaluation_data on which the model prediction differs from the ground truth in batched_evaluation_target.

Parameters:
  • h1 – The baseline model.
  • h2 – The new updated model.
  • batch_ids – A list of the instance ids in the batch.
  • batched_evaluation_data – A single batch of input data to be passed to our model.
  • batched_evaluation_target – A single batch of the corresponding output targets.
  • get_instance_metadata
    A function that returns a text string representation of some metadata corresponding to the instance id. It should be a function of the form:
    get_instance_metadata(instance_id)
    instance_id: An integer instance id

    And should return a string.

  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A list of indices of the instances within the batched data, for which the model did not match the expected target.

backwardcompatibilityml.helpers.training.get_error_instance_ids_by_class(model, batch_ids, batched_evaluation_data, batched_evaluation_target, device='cpu')

Return the instance ids corresponding to errors of the model by class.

Parameters:
  • model – The model being evaluated.
  • batch_ids – The instance ids of the data rows in the batched data.
  • batched_evaluation_data – A single batch of input data to be passed to our model.
  • batched_evaluation_target – A single batch of the corresponding output targets.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A dictionary of key / value pairs, where the key is the output class and the value is the list of instance ids corresponding to misclassification errors of the model within that class.
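
A minimal sketch of grouping a batch's errors by class, assuming the predicted labels have already been computed (for example by taking the argmax of the model output) and that the dictionary key is the ground-truth class of each misclassified instance. The helper name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def error_instance_ids_by_class(
    predictions: Sequence[int], batch_ids: Sequence[int], targets: Sequence[int]
) -> Dict[int, List[int]]:
    """Map each target class to the ids of instances misclassified within it."""
    errors_by_class: Dict[int, List[int]] = defaultdict(list)
    for instance_id, predicted, target in zip(batch_ids, predictions, targets):
        if predicted != target:
            errors_by_class[target].append(instance_id)
    return dict(errors_by_class)


print(error_instance_ids_by_class([0, 2, 1, 1], [10, 11, 12, 13], [0, 1, 1, 2]))
# {1: [11], 2: [13]}
```

Classes with no errors in the batch simply do not appear as keys in this sketch.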

backwardcompatibilityml.helpers.training.get_error_instance_indices(model, batched_evaluation_data, batched_evaluation_target, device='cpu')

Return the list of indices of instances from batched_evaluation_data on which the model prediction differs from the ground truth in batched_evaluation_target.

Parameters:
  • model – The model being evaluated.
  • batched_evaluation_data – A single batch of input data to be passed to our model.
  • batched_evaluation_target – A single batch of the corresponding output targets.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A list of indices of the instances within the batched data, for which the model did not match the expected target.
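
The returned values are positions within the batch, not instance ids. The sketch below, assuming the model's output for a batch is one list of class scores per instance, derives predicted labels with an argmax and collects the mismatching positions; batch_ids[i] then translates a position back to an instance id. Both helper names are hypothetical.

```python
from typing import List, Sequence


def predicted_labels(batch_scores: Sequence[Sequence[float]]) -> List[int]:
    """Argmax over each instance's class scores."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in batch_scores]


def error_instance_indices(batch_scores, targets: Sequence[int]) -> List[int]:
    """Positions within the batch where the predicted label differs from the target."""
    predictions = predicted_labels(batch_scores)
    return [i for i, (p, y) in enumerate(zip(predictions, targets)) if p != y]


batch_scores = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]
batch_ids = [40, 41, 42]
error_positions = error_instance_indices(batch_scores, [0, 0, 0])
print(error_positions)                          # [1]
print([batch_ids[i] for i in error_positions])  # [41]
```
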

backwardcompatibilityml.helpers.training.get_incompatible_instances_by_class(all_errors, batch_ids, batched_evaluation_target, class_incompatible_instance_ids)

Finds instances where h2 is incompatible with h1 and inserts {class : incompatible_data_id} mappings into the class_incompatible_instance_ids dictionary.

Parameters:
  • all_errors – A list of tuples containing the error indices, the h1 and h2 predictions, and the ground truth for each instance.
  • batch_ids – The instance ids of the data rows in the batched data.
  • batched_evaluation_target – A single batch of the corresponding output targets.
  • class_incompatible_instance_ids – The dictionary to fill with incompatible instances and their ids.
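
An instance is commonly called incompatible when h1 predicts it correctly but h2 does not, that is, the update introduced a new error. The sketch below fills the dictionary in place under that definition, assuming each all_errors entry is laid out as (index_in_batch, h1_prediction, h2_prediction, ground_truth); it reads the ground truth from the tuple instead of from batched_evaluation_target, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout of each all_errors entry.
ErrorRecord = Tuple[int, int, int, int]  # (index_in_batch, h1_pred, h2_pred, ground_truth)


def add_incompatible_instances_by_class(
    all_errors: Sequence[ErrorRecord],
    batch_ids: Sequence[int],
    class_incompatible_instance_ids: Dict[int, List[int]],
) -> None:
    """Record ids where h1 was right but h2 was wrong, keyed by ground-truth class."""
    for index, h1_pred, h2_pred, target in all_errors:
        if h1_pred == target and h2_pred != target:
            class_incompatible_instance_ids.setdefault(target, []).append(batch_ids[index])


class_incompatible_instance_ids = {}
all_errors = [(0, 1, 0, 1), (1, 0, 0, 1), (2, 2, 1, 2)]
add_incompatible_instances_by_class(all_errors, [100, 101, 102], class_incompatible_instance_ids)
print(class_incompatible_instance_ids)  # {1: [100], 2: [102]}
```

Because the dictionary is mutated in place, calling the function once per batch accumulates incompatible ids across the whole dataset.
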

backwardcompatibilityml.helpers.training.get_model_error_overlap(h1, h2, batch_ids, batched_evaluation_data, batched_evaluation_target, device='cpu')

Return the instance ids corresponding to errors of each model as well as the instance ids corresponding to errors common to both models.

Parameters:
  • h1 – Reference PyTorch model.
  • h2 – The model being compared to h1.
  • batch_ids – The instance ids of the data rows in the batched data.
  • batched_evaluation_data – A single batch of input data to be passed to our model.
  • batched_evaluation_target – A single batch of the corresponding output targets.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A triple of the form (instance_ids_of_errors_due_to_h1, instance_ids_of_errors_due_to_h2, instance_ids_of_errors_due_to_h1_and_h2).
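
A minimal sketch of the triple, assuming the predicted labels of both models are already available as lists. In this sketch an instance counts as a shared error whenever both models are wrong, even if their wrong predictions differ. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def model_error_overlap(
    h1_preds: Sequence[int],
    h2_preds: Sequence[int],
    batch_ids: Sequence[int],
    targets: Sequence[int],
) -> Tuple[List[int], List[int], List[int]]:
    """Return (h1 error ids, h2 error ids, ids of errors made by both models)."""
    h1_errors = [i for i, p, y in zip(batch_ids, h1_preds, targets) if p != y]
    h2_errors = [i for i, p, y in zip(batch_ids, h2_preds, targets) if p != y]
    h2_error_set = set(h2_errors)  # set for fast membership checks
    both = [i for i in h1_errors if i in h2_error_set]
    return h1_errors, h2_errors, both


print(model_error_overlap([1, 1, 0, 0], [1, 0, 1, 0], [5, 6, 7, 8], [0, 1, 1, 0]))
# ([5, 7], [5, 6], [5])
```
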

backwardcompatibilityml.helpers.training.test(network, loss_function, test_set, batch_size_test, device='cpu')

Tests a model on a test set using the loss function provided.

(Please note that this is not to be used for testing with a compatibility loss function.)

Parameters:
  • network – The model which is undergoing testing.
  • loss_function – An instance of the loss function to use for testing.
  • test_set – The list of testing samples as (batch_ids, input, target).
  • batch_size_test – An integer representing the batch size of the test set.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A list of test losses.

backwardcompatibilityml.helpers.training.test_compatibility(h2, loss_function, test_set, batch_size_test, device='cpu')

Tests a model on a test set using the backward compatibility loss function provided.

Parameters:
  • h2 – The model which is undergoing training / updating.
  • loss_function – An instance of a compatibility loss function.
  • test_set – The list of testing samples as (batch_ids, input, target).
  • batch_size_test – An integer representing the batch size of the test set.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A list of test losses.

backwardcompatibilityml.helpers.training.train(number_of_epochs, network, optimizer, loss_function, training_set, test_set, batch_size_train, batch_size_test, device='cpu')

Trains a model with respect to a loss function, using an instance of an optimizer.

(Please note that this is not to be used for training with a compatibility loss function.)

Parameters:
  • number_of_epochs – Number of epochs of training.
  • network – The model which is undergoing training.
  • optimizer – The optimizer instance to use for training.
  • loss_function – An instance of the loss function to use for training.
  • training_set – The list of training samples as (batch_ids, input, target).
  • test_set – The list of testing samples as (batch_ids, input, target).
  • batch_size_train – An integer representing batch size of the training set.
  • batch_size_test – An integer representing the batch size of the test set.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

Four lists:
  • train_counter – The indices of the training samples at which training losses were logged.
  • test_counter – The indices of the testing samples at which testing losses were logged.
  • train_losses – The list of logged training losses.
  • test_losses – The list of logged testing losses.

backwardcompatibilityml.helpers.training.train_compatibility(number_of_epochs, h2, optimizer, loss_function, training_set, test_set, batch_size_train, batch_size_test, device='cpu')

Trains a new model with respect to an existing model using the compatibility loss function provided. The compatibility loss function may be either a New Error or Strict Imitation type loss function.

Parameters:
  • number_of_epochs – Number of epochs of training.
  • h2 – The model which is undergoing training / updating.
  • optimizer – The optimizer instance to use for training.
  • loss_function – An instance of a compatibility loss function.
  • training_set – The list of training samples as (batch_ids, input, target).
  • test_set – The list of testing samples as (batch_ids, input, target).
  • batch_size_train – An integer representing batch size of the training set.
  • batch_size_test – An integer representing the batch size of the test set.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

Four lists:
  • train_counter – The indices of the training samples at which training losses were logged.
  • test_counter – The indices of the testing samples at which testing losses were logged.
  • train_losses – The list of logged training losses.
  • test_losses – The list of logged testing losses.

backwardcompatibilityml.helpers.training.train_compatibility_epoch(epoch, h2, optimizer, loss_function, training_set, batch_size_train, device='cpu')

Trains a new model over a single epoch using the provided compatibility loss function instance, which may be either a New Error or Strict Imitation type loss function.

Parameters:
  • epoch – The integer index of the training epoch being run.
  • h2 – The model which is undergoing training / updating.
  • optimizer – The optimizer instance to use for training.
  • loss_function – An instance of a compatibility loss function.
  • training_set – The list of training samples as (batch_ids, input, target).
  • batch_size_train – An integer representing batch size of the training set.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A list of pairs of the form (training_instance_index, training_loss) at regular intervals of 10 training samples.

backwardcompatibilityml.helpers.training.train_epoch(epoch, network, optimizer, loss_function, training_set, batch_size_train, device='cpu')

Trains a model over a single training epoch, with respect to a loss function, using an instance of an optimizer.

(Please note that this is not to be used for training with a compatibility loss function.)

Parameters:
  • epoch – The integer index of the training epoch being run.
  • network – The model which is undergoing training.
  • optimizer – The optimizer instance to use for training.
  • loss_function – An instance of the loss function to use for training.
  • training_set – The list of training samples as (batch_ids, input, target).
  • batch_size_train – An integer representing batch size of the training set.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
Returns:

A list of pairs of the form (training_instance_index, training_loss) at regular intervals of 10 training samples.

backwardcompatibilityml.helpers.training.train_new_error(h1, h2, number_of_epochs, training_set, test_set, batch_size_train, batch_size_test, OptimizerClass, optimizer_kwargs, NewErrorLossClass, lambda_c, new_error_loss_kwargs=None, device='cpu')
Parameters:
  • h1 – Reference PyTorch model.
  • h2 – The model which is undergoing training / updating.
  • number_of_epochs – Number of epochs of training.
  • training_set – The list of training samples as (batch_ids, input, target).
  • test_set – The list of testing samples as (batch_ids, input, target).
  • batch_size_train – An integer representing batch size of the training set.
  • batch_size_test – An integer representing the batch size of the test set.
  • OptimizerClass – The class to instantiate an optimizer from for training.
  • optimizer_kwargs – A dictionary of the keyword arguments to be used to instantiate the optimizer.
  • NewErrorLossClass – The class of the New Error style loss function to be instantiated and used to perform compatibility constrained training of our model h2.
  • lambda_c – The regularization parameter to be used when calibrating the degree of compatibility to enforce while training.
  • new_error_loss_kwargs – Optional dictionary of keyword arguments used to instantiate NewErrorLossClass.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.
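
As a rough illustration of what lambda_c calibrates, the sketch below computes a New Error style loss on plain probability lists, assuming an additive form: the mean negative log-likelihood of h2 on the whole batch, plus lambda_c times a dissonance term equal to h2's negative log-likelihood restricted to the instances h1 classifies correctly. This assumed form penalizes h2 for new errors on instances h1 already handles. The library's loss classes may normalize, clip, or combine the terms differently, and all names here are hypothetical.

```python
import math
from typing import Sequence


def mean_nll(probs: Sequence[Sequence[float]], targets: Sequence[int], eps: float = 1e-10) -> float:
    """Mean negative log-likelihood of the target classes; 0.0 for an empty batch."""
    if len(targets) == 0:
        return 0.0
    # Clamp probabilities away from zero so log() stays finite.
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, targets)) / len(targets)


def new_error_style_loss(
    h1_preds: Sequence[int],
    h2_probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    lambda_c: float,
) -> float:
    """Assumed form: base NLL of h2 + lambda_c * NLL of h2 on instances h1 gets right."""
    base_loss = mean_nll(h2_probs, targets)
    support = [i for i, (p, y) in enumerate(zip(h1_preds, targets)) if p == y]
    dissonance = mean_nll([h2_probs[i] for i in support], [targets[i] for i in support])
    return base_loss + lambda_c * dissonance


h1_preds = [0, 1]                      # h1 is right on instance 0 only
h2_probs = [[0.5, 0.5], [0.25, 0.75]]  # h2's class probabilities per instance
targets = [0, 0]
for lambda_c in (0.0, 0.5, 1.0):
    print(lambda_c, new_error_style_loss(h1_preds, h2_probs, targets, lambda_c))
```

In this sketch, lambda_c = 0 reduces to ordinary negative log-likelihood training of h2, and larger values weight the dissonance term more heavily.
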

backwardcompatibilityml.helpers.training.train_strict_imitation(h1, h2, number_of_epochs, training_set, test_set, batch_size_train, batch_size_test, OptimizerClass, optimizer_kwargs, StrictImitationLossClass, lambda_c, strict_imitation_loss_kwargs=None, device='cpu')
Parameters:
  • h1 – Reference PyTorch model.
  • h2 – The model which is undergoing training / updating.
  • number_of_epochs – Number of epochs of training.
  • training_set – The list of training samples as (batch_ids, input, target).
  • test_set – The list of testing samples as (batch_ids, input, target).
  • batch_size_train – An integer representing batch size of the training set.
  • batch_size_test – An integer representing the batch size of the test set.
  • OptimizerClass – The class to instantiate an optimizer from for training.
  • optimizer_kwargs – A dictionary of the keyword arguments to be used to instantiate the optimizer.
  • StrictImitationLossClass – The class of the Strict Imitation style loss function to be instantiated and used to perform compatibility constrained training of our model h2.
  • lambda_c – The regularization parameter to be used when calibrating the degree of compatibility to enforce while training.
  • strict_imitation_loss_kwargs – Optional dictionary of keyword arguments used to instantiate StrictImitationLossClass.
  • device – Either “cpu” or “cuda”, indicating the device PyTorch is using. Defaults to “cpu”. If your models reside on the GPU, set this to “cuda” so that the input and target tensors are transferred to the GPU.

backwardcompatibilityml.helpers.utils module

backwardcompatibilityml.helpers.utils.add_memory_hooks(idx, mod, mem_log, exp, hr)

backwardcompatibilityml.helpers.utils.clean_from_gpu(tensors)

Utility function to clean tensors from the GPU. This is only intended to be used when investigating why memory usage is high. A production solution should instead rely on structuring your code correctly, so that Python garbage collection automatically removes unreferenced tensors as they go out of function scope.

Parameters:
  • tensors – A list of tensor objects to clean from the GPU.
Returns:

None

backwardcompatibilityml.helpers.utils.generate_mem_hook(handle_ref, mem, idx, hook_type, exp)
backwardcompatibilityml.helpers.utils.get_class_probabilities(batch_label_tensor)
backwardcompatibilityml.helpers.utils.get_gpu_mem()
backwardcompatibilityml.helpers.utils.labels_to_probabilities(batch_class_labels, num_classes=None, batch_size=None)
backwardcompatibilityml.helpers.utils.log_mem(model, mem_log=None, exp=None)

Utility function for adding memory usage logging to a PyTorch model.

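Example usage:

    from backwardcompatibilityml.helpers.utils import log_mem, remove_memory_hooks

    model = MyModel()
    hook_handles, mem_log = log_mem(model, exp="memory-profiling-experiment")
    # ... then do a training run ...
    # mem_log should now contain the results of the memory profiling experiment.
    remove_memory_hooks(hook_handles)  # clear the hooks once logging is done
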
Parameters:
  • model – A PyTorch model.
  • mem_log – Optional list object, which may contain data from previous profiling experiments.
  • exp – String identifier for the profiling experiment name.
Returns:

A pair consisting of mem_log - either the same mem_log list object that was passed in, or a newly constructed one, that will contain the results of the logging, and hook_handles - a list of handles for our logging hooks that will need to be cleared when we are done logging.

backwardcompatibilityml.helpers.utils.remove_memory_hooks(hook_handles)

Clear the memory profiling hooks put in place by log_mem.

Parameters:
  • hook_handles – A list of hook handles to clear.
Returns:

None

backwardcompatibilityml.helpers.utils.show_allocated_tensors()

Attempts to print out the tensors in memory.

Returns:

None

backwardcompatibilityml.helpers.utils.sigmoid_to_labels(batch_sigmoids, discriminant_pivot=0.5)

Module contents