alibi_detect.ad.adversarialae module

class alibi_detect.ad.adversarialae.AdversarialAE(threshold=None, ae=None, model=None, encoder_net=None, decoder_net=None, model_hl=None, hidden_layer_kld=None, w_model_hl=None, temperature=1.0, data_type=None)

Bases: alibi_detect.base.BaseDetector, alibi_detect.base.FitMixin, alibi_detect.base.ThresholdMixin

__init__(threshold=None, ae=None, model=None, encoder_net=None, decoder_net=None, model_hl=None, hidden_layer_kld=None, w_model_hl=None, temperature=1.0, data_type=None)

Autoencoder (AE) based adversarial detector.

Parameters

threshold (Optional[float]) – Threshold used for adversarial score to determine adversarial instances.
ae (Optional[tensorflow.keras.Model]) – A trained tf.keras autoencoder model if available.
model (Optional[tensorflow.keras.Model]) – A trained tf.keras classification model.
encoder_net (Optional[tensorflow.keras.Sequential]) – Layers for the encoder wrapped in a tf.keras.Sequential class if no ‘ae’ is specified.
decoder_net (Optional[tensorflow.keras.Sequential]) – Layers for the decoder wrapped in a tf.keras.Sequential class if no ‘ae’ is specified.
model_hl (Optional[List[tensorflow.keras.Model]]) – List with tf.keras models for the hidden layer K-L divergence computation.
hidden_layer_kld (Optional[dict]) – Dictionary whose keys are the hidden layer(s) of the model that are extracted and used during training of the AE, and whose values are the output dimensions for those hidden layers.
w_model_hl (Optional[list]) – Weights assigned to the loss of each model in model_hl.
temperature (float) – Temperature used for model prediction scaling. Temperature <1 sharpens the prediction probability distribution.
data_type (Optional[str]) – Optionally specify the data type (tabular, image or time-series). Added to metadata.

Return type

None
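The effect of the temperature parameter can be illustrated with a small sketch. It assumes that scaling raises each predicted probability to the power 1/temperature and then renormalizes, which is one common way to apply a temperature to a probability vector. The helper name scale_with_temperature is hypothetical and is not part of alibi_detect; plain Python lists stand in for numpy arrays.

```python
def scale_with_temperature(probs, temperature=1.0):
    """Raise probabilities to the power 1/temperature and renormalize.

    Assumption: this illustrates the idea of temperature scaling described
    above; it is not taken from the library's source.
    """
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


probs = [0.6, 0.3, 0.1]
sharp = scale_with_temperature(probs, temperature=0.5)  # temperature < 1 sharpens
flat = scale_with_temperature(probs, temperature=2.0)   # temperature > 1 flattens
same = scale_with_temperature(probs, temperature=1.0)   # temperature 1 leaves probs unchanged
```

With a temperature below 1 the largest probability grows at the expense of the others, which matches the "sharpens" behaviour noted for the parameter.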

correct(X, batch_size=10000000000, return_instance_score=True, return_all_predictions=True)

Correct adversarial instances if the adversarial score is above the threshold.

Parameters

X (numpy.ndarray) – Batch of instances.
batch_size (int) – Batch size used when computing scores.
return_instance_score (bool) – Whether to return instance level adversarial scores.
return_all_predictions (bool) – Whether to return the predictions on the original and the reconstructed data.

Return type

Dict[Dict[str, str], Dict[str, numpy.ndarray]]

Returns

Dict with corrected predictions and information whether an instance is adversarial or not.
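A minimal sketch of what correction could look like, assuming that a flagged instance has its classifier prediction replaced by the prediction on its autoencoder reconstruction. That reading is inferred from the parameter descriptions above rather than from the library's source, and the function name correct_predictions is hypothetical.

```python
def correct_predictions(preds_orig, preds_recon, scores, threshold):
    """Swap in the prediction on the reconstruction for flagged instances.

    Assumption: an instance counts as adversarial when its score is strictly
    above the threshold.
    """
    corrected, flags = [], []
    for y, y_recon, score in zip(preds_orig, preds_recon, scores):
        is_adversarial = score > threshold
        flags.append(int(is_adversarial))
        corrected.append(y_recon if is_adversarial else y)
    return corrected, flags


# Toy class labels: only the second instance scores above the threshold.
corrected, flags = correct_predictions(
    preds_orig=[3, 7, 1],
    preds_recon=[3, 2, 1],
    scores=[0.01, 0.90, 0.02],
    threshold=0.5,
)
```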

fit(X, loss_fn=<function loss_adv_ae>, w_model=1.0, w_recon=0.0, optimizer=tensorflow.keras.optimizers.Adam, epochs=20, batch_size=128, verbose=True, log_metric=None, callbacks=None, preprocess_fn=None)

Train Adversarial AE model.

Parameters

X (numpy.ndarray) – Training batch.
loss_fn (tensorflow.keras.losses) – Loss function used for training.
w_model (float) – Weight on model prediction loss term.
w_recon (float) – Weight on MSE reconstruction error loss term.
optimizer (tensorflow.keras.optimizers) – Optimizer used for training.
epochs (int) – Number of training epochs.
batch_size (int) – Batch size used for training.
verbose (bool) – Whether to print training progress.
log_metric (Optional[Tuple[str, tensorflow.keras.metrics]]) – Additional metrics whose progress will be displayed if verbose equals True.
callbacks (Optional[tensorflow.keras.callbacks]) – Callbacks used during training.
preprocess_fn (Optional[Callable]) – Preprocessing function applied to each training batch.

Return type

None
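The role of w_model and w_recon can be sketched as a weighted sum of two loss terms. This is an illustration only: it assumes the model prediction term is a K-L divergence between the classifier's predictions on an original and a reconstructed instance, and the function combined_loss is a hypothetical stand-in, not the library's loss_adv_ae.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """K-L divergence between two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(x, x_recon):
    """Mean squared reconstruction error over the features of one instance."""
    return sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)


def combined_loss(y, y_recon, x, x_recon, w_model=1.0, w_recon=0.0):
    """Weighted sum of the prediction term and the reconstruction term."""
    return w_model * kl_divergence(y, y_recon) + w_recon * mse(x, x_recon)


y, y_recon = [0.9, 0.1], [0.6, 0.4]   # predictions on original / reconstruction
x, x_recon = [1.0, 2.0], [1.0, 1.0]   # original / reconstructed features

loss_default = combined_loss(y, y_recon, x, x_recon)             # w_recon=0.0
loss_both = combined_loss(y, y_recon, x, x_recon, w_recon=2.0)   # adds 2 * MSE
```

With the default w_recon=0.0 the reconstruction error does not contribute, so in this sketch training would be driven entirely by the prediction term.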

infer_threshold(X, threshold_perc=99.0, margin=0.0, batch_size=10000000000)

Update threshold by a value inferred from the percentage of instances considered to be adversarial in a sample of the dataset.

Parameters

X (numpy.ndarray) – Batch of instances.
threshold_perc (float) – Percentage of X considered to be normal based on the adversarial score.
margin (float) – Add margin to threshold. Useful if adversarial instances have significantly higher scores and there is no adversarial instance in X.
batch_size (int) – Batch size used when computing scores.

Return type

None
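As a rough illustration of how threshold_perc and margin interact, the sketch below assumes the threshold is the threshold_perc-th percentile of the adversarial scores on X plus the margin. The percentile helper uses linear interpolation between ranks; both functions are hypothetical and written with the standard library only.

```python
def percentile(values, perc):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * perc / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def infer_threshold(scores, threshold_perc=99.0, margin=0.0):
    """Assumed rule: percentile of the scores plus an optional margin."""
    return percentile(scores, threshold_perc) + margin


scores = [0.01, 0.02, 0.03, 0.04, 0.50]
threshold = infer_threshold(scores, threshold_perc=75.0, margin=0.1)
```

A positive margin pushes the threshold above the highest scores seen in a clean sample, which is the situation the margin parameter is described as being useful for.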

predict(X, batch_size=10000000000, return_instance_score=True)

Predict whether instances are adversarial instances or not.

Parameters

X (numpy.ndarray) – Batch of instances.
batch_size (int) – Batch size used when computing scores.
return_instance_score (bool) – Whether to return instance level adversarial scores.

Return type

Dict[Dict[str, str], Dict[str, numpy.ndarray]]

Returns

Dictionary containing ‘meta’ and ‘data’ dictionaries.
‘meta’ has the model’s metadata.
‘data’ contains the adversarial predictions and instance level adversarial scores.
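The shape of the returned dictionary can be sketched as follows. The key names inside ‘data’ (is_adversarial, instance_score) and the contents of ‘meta’ are assumptions chosen for illustration, and the rule that a score strictly above the threshold marks an instance as adversarial is likewise assumed.

```python
def predict_from_scores(scores, threshold, return_instance_score=True):
    """Build a 'meta'/'data' dictionary from precomputed adversarial scores."""
    data = {"is_adversarial": [int(s > threshold) for s in scores]}
    if return_instance_score:
        data["instance_score"] = list(scores)
    # Placeholder metadata; a real detector would carry its own entries.
    meta = {"name": "AdversarialAE"}
    return {"meta": meta, "data": data}


result = predict_from_scores([0.01, 0.90, 0.02], threshold=0.5)
flags_only = predict_from_scores([0.01, 0.90], threshold=0.5,
                                 return_instance_score=False)
```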

score(X, batch_size=10000000000, return_predictions=False)

Compute adversarial scores.

Parameters

X (numpy.ndarray) – Batch of instances to analyze.
batch_size (int) – Batch size used when computing scores.
return_predictions (bool) – Whether to return the predictions of the classifier on the original and reconstructed instances.

Return type

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]

Returns

Array with adversarial scores for each instance in the batch.
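One way to picture the adversarial score is as a divergence between the classifier's predictions on an instance and on its autoencoder reconstruction. The sketch below assumes a K-L divergence is used for this; toy_model and toy_autoencoder are hypothetical stand-ins for the trained tf.keras models, and plain lists replace numpy arrays.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """K-L divergence between two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def adversarial_scores(X, model, autoencoder):
    """Score each instance by how much reconstruction changes the prediction."""
    scores = []
    for x in X:
        y = model(x)                     # prediction on the original instance
        y_recon = model(autoencoder(x))  # prediction on the reconstruction
        scores.append(kl_divergence(y, y_recon))
    return scores


def toy_model(x):
    """Two-class toy classifier: class 1 probability grows with feature 0."""
    p1 = 1.0 / (1.0 + math.exp(-x[0]))
    return [1.0 - p1, p1]


def toy_autoencoder(x):
    """Toy reconstruction that clips features to [-1, 1]."""
    return [max(-1.0, min(1.0, v)) for v in x]


# The first instance is reconstructed exactly; the second is pulled back to 1.0.
scores = adversarial_scores([[0.5], [4.0]], toy_model, toy_autoencoder)
```

An instance that the autoencoder reproduces faithfully gets a score near zero, while one whose prediction shifts after reconstruction gets a larger score.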

class alibi_detect.ad.adversarialae.DenseHidden(model, hidden_layer, output_dim, hidden_dim=None)

Bases: tensorflow.keras.Model

__init__(model, hidden_layer, output_dim, hidden_dim=None)

Dense layer that extracts the feature map of a hidden layer in a model and computes output probabilities over that layer.

Parameters

model (tensorflow.keras.Model) – tf.keras classification model.
hidden_layer (int) – Hidden layer of the model from which the feature map is extracted.
output_dim (int) – Output dimension for softmax layer.
hidden_dim (Optional[int]) – Dimension of optional additional dense layer.

Return type

None

call(x)

Return type: tensorflow.Tensor