alibi_detect.datasets module

alibi_detect.datasets.corruption_types_cifar10c()

Retrieve list with corruption types used in CIFAR-10-C.

Return type: List[str]
Returns: List with corruption types.

alibi_detect.datasets.fetch_attack(dataset, model, attack, return_X_y=False)

Load adversarial instances for a given dataset, model and attack type.

Parameters

dataset (str) – Dataset under attack.
model (str) – Model under attack.
attack (str) – Attack name.
return_X_y (bool) – Whether to return only the data and target values instead of a Bunch object.

Return type

Union[Bunch, Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]]

Returns

Bunch – Adversarial instances with original labels.
(train data, train target), (test data, test target) – Tuple of tuples if ‘return_X_y’ equals True.
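With return_X_y=True the result is a nested tuple rather than a Bunch. The sketch below only shows how that structure unpacks; fake_fetch_attack is a hypothetical stand-in that returns random arrays in the documented layout, so the snippet runs without downloading anything. The array sizes are arbitrary and do not reflect the real datasets.

```python
import numpy as np


def fake_fetch_attack():
    """Hypothetical stand-in mimicking the documented return_X_y=True structure."""
    rng = np.random.default_rng(0)
    X_train, y_train = rng.random((8, 4)), rng.integers(0, 10, size=8)
    X_test, y_test = rng.random((2, 4)), rng.integers(0, 10, size=2)
    return (X_train, y_train), (X_test, y_test)


# Unpack (train data, train target), (test data, test target)
(X_train, y_train), (X_test, y_test) = fake_fetch_attack()
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)  # → (8, 4) (8,) (2, 4) (2,)
```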

alibi_detect.datasets.fetch_cifar10c(corruption, severity, return_X_y=False)

Fetch CIFAR-10-C data. The data was originally obtained from https://zenodo.org/record/2535967#.XkKh2XX7Qts and introduced in “Hendrycks, D. and Dietterich, T.G. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. In 7th International Conference on Learning Representations, 2019.”

Parameters

corruption (Union[str, List[str]]) – Corruption type(s). Options can be checked with corruption_types_cifar10c(). Alternatively, specify ‘all’ to load all corruptions at the given severity level.
severity (int) – Severity level of corruption (1-5).
return_X_y (bool) – Whether to return only the data and target values instead of a Bunch object.

Return type

Union[Bunch, Tuple[ndarray, ndarray]]

Returns

Bunch – Corrupted dataset with labels.
(corrupted data, target) – Tuple if ‘return_X_y’ equals True.
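In the public CIFAR-10-C release, each corruption file stacks the 10,000 CIFAR-10 test images five times, once per severity level from 1 to 5. Assuming that layout, selecting a severity reduces to slice arithmetic, as sketched below. severity_slice is a hypothetical helper for illustration, not part of alibi_detect.

```python
import numpy as np

N_TEST = 10_000  # CIFAR-10 test images per severity level (assumed layout)


def severity_slice(severity: int, n_per_level: int = N_TEST) -> slice:
    """Return the rows belonging to one severity level (1-5)."""
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 and 5")
    start = (severity - 1) * n_per_level
    return slice(start, start + n_per_level)


# Toy stand-in for one corruption file: 5 levels x 3 images, each value is its level
toy = np.repeat(np.arange(1, 6), 3)
print(toy[severity_slice(3, n_per_level=3)])  # → [3 3 3]
print(severity_slice(3))                      # → slice(20000, 30000, None)
```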

alibi_detect.datasets.fetch_ecg(return_X_y=False)

Fetch ECG5000 data. The dataset contains 5000 ECGs, originally obtained from PhysioNet (https://archive.physionet.org/cgi-bin/atm/ATM) under the name “BIDMC Congestive Heart Failure Database (chfdb)”, record “chf07”.

Parameters

return_X_y (bool) – Whether to return only the data and target values instead of a Bunch object.

Return type

Union[Bunch, Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]]

Returns

Bunch – Train and test datasets with labels.
(train data, train target), (test data, test target) – Tuple of tuples if ‘return_X_y’ equals True.
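ECG5000 is commonly distributed in the UCR time-series archive format, where the first column of each row holds the class label and the remaining columns hold the series. Assuming that format, separating data from target is a single column split; split_label_column is a hypothetical helper and the rows below are toy values.

```python
from typing import Tuple

import numpy as np


def split_label_column(rows: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Split a UCR-style array into (data, target): column 0 is the label."""
    return rows[:, 1:], rows[:, 0].astype(int)


# Toy rows: label followed by a 4-step series (the real ECG5000 series are longer)
toy = np.array([[1.0, 0.1, 0.2, 0.3, 0.4],
                [2.0, 0.5, 0.6, 0.7, 0.8]])
X, y = split_label_column(toy)
print(X.shape, y)  # → (2, 4) [1 2]
```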

alibi_detect.datasets.fetch_genome(return_X_y=False, return_labels=False)

Load genome data, including their labels and whether each instance is an outlier. More details about the data can be found in the README at https://console.cloud.google.com/storage/browser/seldon-datasets/genome/. The original data can be found here: https://drive.google.com/drive/folders/1Ht9xmzyYPbDouUTl_KQdLTJQYX2CuclR.

Parameters

return_X_y (bool) – Whether to return only the data and target values instead of a Bunch object.
return_labels (bool) – Whether to return the genome labels which are detailed in the label_json dict of the returned Bunch object.

Return type

Union[Bunch, tuple]

Returns

Bunch – Training, validation and test data with their outlier flags, optionally including the genome labels, which are detailed in the dictionary under the label_json key.
(data, outlier) or (data, outlier, target) – If ‘return_X_y’ equals True, one tuple per split (train, validation and test) containing the data and outlier flags, plus the genome labels as a third element when ‘return_labels’ is True.
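The tuple form holds one entry per split. A minimal sketch of consuming it, assuming return_labels=False so that each split is a (data, outlier) pair; fake_fetch_genome is a hypothetical stand-in with toy arrays, and the real data and split sizes differ.

```python
import numpy as np


def fake_fetch_genome():
    """Hypothetical stand-in for the documented (data, outlier) tuple per split."""
    train = (np.zeros((4, 3)), np.array([0, 0, 0, 0]))
    val = (np.zeros((4, 3)), np.array([0, 1, 0, 1]))
    test = (np.zeros((4, 3)), np.array([1, 0, 0, 0]))
    return train, val, test


(X_train, out_train), (X_val, out_val), (X_test, out_test) = fake_fetch_genome()
for name, flags in [("train", out_train), ("val", out_val), ("test", out_test)]:
    print(name, flags.mean())  # fraction of instances flagged as outliers in this split
```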

alibi_detect.datasets.fetch_kdd(target=['dos', 'r2l', 'u2r', 'probe'], keep_cols=['srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate'], percent10=True, return_X_y=False)

Load the KDD Cup ’99 dataset, used to detect computer network intrusions.

Parameters

target (list) – List with attack types to detect.
keep_cols (list) – List with columns to keep. Defaults to continuous features.
percent10 (bool) – Whether to return only 10% of the data.
return_X_y (bool) – Whether to return only the data and target values instead of a Bunch object.

Return type

Union[Bunch, Tuple[ndarray, ndarray]]

Returns

Bunch – Dataset and outlier labels (0 means ‘normal’ and 1 means ‘outlier’).
(data, target) – Tuple if ‘return_X_y’ equals True.
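The target parameter selects which attack categories count as outliers. A minimal sketch of that labelling step, assuming each record has already been mapped to one of the categories normal, dos, r2l, u2r or probe; outlier_labels is hypothetical, and the mapping from raw KDD attack names to categories is not shown.

```python
import numpy as np


def outlier_labels(categories, target=("dos", "r2l", "u2r", "probe")) -> np.ndarray:
    """Return 1 for records whose attack category is in `target`, else 0."""
    return np.isin(np.asarray(categories), list(target)).astype(int)


cats = ["normal", "dos", "probe", "normal", "u2r"]
print(outlier_labels(cats))                  # → [0 1 1 0 1]
print(outlier_labels(cats, target=["dos"]))  # → [0 1 0 0 0]
```

In this sketch, narrowing target labels every other attack category as 0, so the outlier rate depends directly on which categories are listed.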

alibi_detect.datasets.fetch_nab(ts, return_X_y=False)

Get time series in a DataFrame from the Numenta Anomaly Benchmark: https://github.com/numenta/NAB.

Parameters

ts (str) – Name of the time series to retrieve. Options can be checked with get_list_nab().
return_X_y (bool) – Whether to return only the data and target values instead of a Bunch object.

Return type

Union[Bunch, Tuple[DataFrame, DataFrame]]

Returns

Bunch – Dataset and outlier labels (0 means ‘normal’ and 1 means ‘outlier’) in DataFrames with timestamps.
(data, target) – Tuple if ‘return_X_y’ equals True.
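The NAB repository distributes its ground truth as lists of anomalous timestamps per series. As an illustration only, assuming such a list is available, a 0/1 target like the one described above can be built by marking matching timestamps; label_outliers is hypothetical, the timestamps are toy values, and this is not necessarily how fetch_nab builds its labels.

```python
import numpy as np


def label_outliers(timestamps, anomaly_times) -> np.ndarray:
    """Return 1 where a timestamp is listed as anomalous, else 0."""
    ts = np.asarray(timestamps, dtype="datetime64[s]")
    anomalies = np.asarray(anomaly_times, dtype="datetime64[s]")
    return np.isin(ts, anomalies).astype(int)


stamps = ["2014-04-01T00:00:00", "2014-04-01T00:05:00", "2014-04-01T00:10:00"]
print(label_outliers(stamps, ["2014-04-01T00:05:00"]))  # → [0 1 0]
```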

alibi_detect.datasets.get_list_nab()

Get list of possible time series to retrieve from the Numenta Anomaly Benchmark: https://github.com/numenta/NAB.

Return type: list
Returns: List with time series names.

alibi_detect.datasets.google_bucket_list(url, folder, filetype=None, full_path=False)

Retrieve a list of the items in a Google bucket folder.

Parameters

url (str) – Bucket directory.
folder (str) – Folder to retrieve list of items from.
filetype (Optional[str]) – File extension, e.g. npy for saved numpy arrays.

Return type

List[str]

Returns

List with items in the folder of the Google bucket.
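Without touching a live bucket, the effect of the folder and filetype parameters can be illustrated on a plain list of object names. The sketch below is hypothetical: it assumes object names of the form folder/file.ext and does not reproduce how google_bucket_list talks to Google Cloud Storage. full_path is left out because its exact behavior is not documented here.

```python
from typing import List, Optional


def filter_items(names: List[str], folder: str, filetype: Optional[str] = None) -> List[str]:
    """Keep file names directly inside `folder`, optionally with a given extension."""
    prefix = folder.rstrip("/") + "/"
    items = [n[len(prefix):] for n in names if n.startswith(prefix)]
    items = [i for i in items if i and "/" not in i]  # drop subfolder contents
    if filetype is not None:
        items = [i for i in items if i.endswith("." + filetype)]
    return items


names = ["models/a.npy", "models/b.json", "models/sub/c.npy", "data/d.npy"]
print(filter_items(names, "models"))                  # → ['a.npy', 'b.json']
print(filter_items(names, "models", filetype="npy"))  # → ['a.npy']
```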

alibi_detect.datasets.load_genome_npz(fold, return_labels=False)

Return type: Union[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]

alibi_detect.datasets.load_url_arff(url, dtype=numpy.float32)

Load an ARFF file from a URL.

Parameters: url (str) – URL of the ARFF file.
Return type: ndarray
Returns: Array with data and labels.
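ARFF files consist of @relation and @attribute header lines followed by comma-separated rows after @data. The sketch below is a simplified, offline stand-in for the parsing step, not the library's implementation: it handles numeric attributes only and reads from a string instead of a URL. For real files a full reader such as scipy.io.arff.loadarff is the safer choice. parse_numeric_arff is hypothetical; here the label is simply the last numeric column, so data and labels come back in one array.

```python
import numpy as np

ARFF = """% toy file
@relation toy
@attribute x1 numeric
@attribute x2 numeric
@attribute label numeric
@data
0.5,1.5,0
2.0,3.0,1
"""


def parse_numeric_arff(text: str, dtype=np.float32) -> np.ndarray:
    """Return the @data rows of a numeric-only ARFF document as one array."""
    rows, in_data = [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if line.lower() == "@data":
            in_data = True  # everything after this marker is a data row
        elif in_data:
            rows.append([float(v) for v in line.split(",")])
    return np.array(rows, dtype=dtype)


arr = parse_numeric_arff(ARFF)
print(arr.shape, arr.dtype)  # → (2, 3) float32
```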