alibi_detect.datasets module

alibi_detect.datasets.corruption_types_cifar10c()[source]

Retrieve list with corruption types used in CIFAR-10-C.

Return type

List[str]

Returns

List with corruption types.

alibi_detect.datasets.fetch_attack(dataset, model, attack, return_X_y=False)[source]

Load adversarial instances for a given dataset, model and attack type.

Parameters
  • dataset (str) – Dataset under attack.

  • model (str) – Model under attack.

  • attack (str) – Attack name.

  • return_X_y (bool) – If True, return only the data and target values; otherwise return a Bunch object.

Return type

Union[Bunch, Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]]

Returns

  • Bunch – Adversarial instances with original labels.

  • (train data, train target), (test data, test target) – Tuple of tuples if ‘return_X_y’ equals True.

alibi_detect.datasets.fetch_cifar10c(corruption, severity, return_X_y=False)[source]

Fetch CIFAR-10-C data. Originally obtained from https://zenodo.org/record/2535967#.XkKh2XX7Qts and introduced in “Hendrycks, D. and Dietterich, T.G. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. In 7th International Conference on Learning Representations, 2019.”

Parameters
  • corruption (Union[str, List[str]]) – Corruption type, or list of corruption types. Options can be checked with corruption_types_cifar10c(). Alternatively, specify ‘all’ for all corruptions at a severity level.

  • severity (int) – Severity level of corruption (1-5).

  • return_X_y (bool) – If True, return only the data and target values; otherwise return a Bunch object.

Return type

Union[Bunch, Tuple[ndarray, ndarray]]

Returns

  • Bunch – Corrupted dataset with labels.

  • (corrupted data, target) – Tuple if ‘return_X_y’ equals True.
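The parameter contract above (a single name, a list of names, or ‘all’, plus a severity from 1 to 5) can be checked before any download starts. The following is a minimal sketch, not part of the library: normalise_cifar10c_args and load_cifar10c are hypothetical helper names, and expanding ‘all’ to the full list is an assumption that relies on the two forms being equivalent, as the parameter description suggests.

```python
from typing import List, Union

SEVERITY_LEVELS = range(1, 6)  # documented severity range: 1-5


def normalise_cifar10c_args(corruption: Union[str, List[str]],
                            severity: int,
                            known: List[str]) -> List[str]:
    """Hypothetical helper: check both arguments, return the names as a list."""
    if severity not in SEVERITY_LEVELS:
        raise ValueError(f"severity must be between 1 and 5, got {severity}")
    if corruption == 'all':
        # Assumption: 'all' is equivalent to passing every known corruption type.
        return list(known)
    names = [corruption] if isinstance(corruption, str) else list(corruption)
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError(f"unknown corruption type(s): {unknown}")
    return names


def load_cifar10c(corruption: Union[str, List[str]], severity: int):
    """Validate, then fetch. Needs alibi_detect installed and network access."""
    # Imported lazily so the validation helper stays usable offline.
    from alibi_detect.datasets import corruption_types_cifar10c, fetch_cifar10c
    names = normalise_cifar10c_args(corruption, severity,
                                    corruption_types_cifar10c())
    return fetch_cifar10c(corruption=names, severity=severity, return_X_y=True)
```

Validating first keeps a typo in a corruption name from surfacing only after a large download has begun.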

alibi_detect.datasets.fetch_ecg(return_X_y=False)[source]

Fetch ECG5000 data. The dataset contains 5000 ECGs, originally obtained from Physionet (https://archive.physionet.org/cgi-bin/atm/ATM) under the name “BIDMC Congestive Heart Failure Database (chfdb)”, record “chf07”.

Parameters

return_X_y (bool) – If True, return only the data and target values; otherwise return a Bunch object.

Return type

Union[Bunch, Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]]

Returns

  • Bunch – Train and test datasets with labels.

  • (train data, train target), (test data, test target) – Tuple of tuples if ‘return_X_y’ equals True.
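With return_X_y=True the result is a tuple of tuples, so it can be unpacked in one statement. The sketch below illustrates that shape; unpack_splits and fake_fetch are hypothetical names, fake_fetch is a stand-in that lets the example run offline, and the length check assumes data and targets align along their first axis.

```python
def unpack_splits(fetch_fn, **kwargs):
    """Hypothetical helper: call a fetch function and unpack the nested tuple."""
    (X_train, y_train), (X_test, y_test) = fetch_fn(return_X_y=True, **kwargs)
    # Assumption: each data array has one row per target value.
    if len(X_train) != len(y_train) or len(X_test) != len(y_test):
        raise ValueError("data and target lengths differ")
    return X_train, y_train, X_test, y_test


def fake_fetch(return_X_y=False):
    """Stand-in for a fetch function; the values are made up for illustration."""
    train = ([[0.1, 0.2], [0.3, 0.4]], [1, 0])
    test = ([[0.5, 0.6]], [1])
    return (train, test) if return_X_y else {'train': train, 'test': test}


X_train, y_train, X_test, y_test = unpack_splits(fake_fetch)
print(len(X_train), len(X_test))

# With alibi_detect installed and network access, the same helper would take
# the real function:
# from alibi_detect.datasets import fetch_ecg
# X_train, y_train, X_test, y_test = unpack_splits(fetch_ecg)
```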

alibi_detect.datasets.fetch_genome(return_X_y=False, return_labels=False)[source]

Load genome data including their labels and whether they are outliers or not. More details about the data can be found in the readme on https://console.cloud.google.com/storage/browser/seldon-datasets/genome/. The original data can be found here: https://drive.google.com/drive/folders/1Ht9xmzyYPbDouUTl_KQdLTJQYX2CuclR.

Parameters
  • return_X_y (bool) – If True, return only the data and target values; otherwise return a Bunch object.

  • return_labels (bool) – Whether to return the genome labels which are detailed in the label_json dict of the returned Bunch object.

Return type

Union[Bunch, tuple]

Returns

  • Bunch – Training, validation and test data, whether they are outliers and optionally including the genome labels which are specified in the label_json key as a dictionary.

  • (data, outlier) or (data, outlier, target) – If ‘return_X_y’ equals True, one tuple each for the train, validation and test sets. Each tuple holds the data and the outlier flags, plus the genome labels when ‘return_labels’ is True.
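Because the per-split tuple has two or three elements depending on return_labels, downstream code may want one uniform shape. This is a minimal sketch under that reading of the Returns description; split_genome_tuple is a hypothetical helper and the sample values are made up.

```python
def split_genome_tuple(split):
    """Hypothetical helper: normalise a per-split tuple to a dict.

    Accepts (data, outlier) or (data, outlier, target); 'target' is None
    when the genome labels were not requested.
    """
    if len(split) == 2:
        data, outlier = split
        target = None
    elif len(split) == 3:
        data, outlier, target = split
    else:
        raise ValueError(f"expected 2 or 3 elements, got {len(split)}")
    return {'data': data, 'outlier': outlier, 'target': target}


# Made-up values standing in for one split of the returned tuples.
without_labels = split_genome_tuple((['ACGT', 'TTGA'], [0, 1]))
with_labels = split_genome_tuple((['ACGT', 'TTGA'], [0, 1], [3, 7]))
print(without_labels['target'], with_labels['target'])

# With alibi_detect installed and network access, and assuming three
# per-split tuples come back as described above:
# from alibi_detect.datasets import fetch_genome
# train, val, test = fetch_genome(return_X_y=True, return_labels=True)
# train = split_genome_tuple(train)
```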

alibi_detect.datasets.fetch_kdd(target=['dos', 'r2l', 'u2r', 'probe'], keep_cols=['srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate'], percent10=True, return_X_y=False)[source]

KDD Cup ‘99 dataset. Detect computer network intrusions.

Parameters
  • target (list) – List with attack types to detect.

  • keep_cols (list) – List with columns to keep. Defaults to continuous features.

  • percent10 (bool) – If True, return only the 10% subset of the data.

  • return_X_y (bool) – If True, return only the data and target values; otherwise return a Bunch object.

Return type

Union[Bunch, Tuple[ndarray, ndarray]]

Returns

  • Bunch – Dataset and outlier labels (0 means ‘normal’ and 1 means ‘outlier’).

  • (data, target) – Tuple if ‘return_X_y’ equals True.
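Since the target encodes 0 as ‘normal’ and 1 as ‘outlier’, a quick class balance check is often the first step after loading. The sketch below shows one way to do it; outlier_summary is a hypothetical helper, not a library function, and the sample labels are made up.

```python
from collections import Counter


def outlier_summary(target):
    """Hypothetical helper: count normal (0) and outlier (1) labels."""
    # int() also accepts NumPy scalars, so an ndarray target works as well.
    counts = Counter(int(value) for value in target)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("target is empty")
    return {
        'normal': counts[0],
        'outlier': counts[1],
        'outlier_fraction': counts[1] / total,
    }


print(outlier_summary([0, 0, 1, 0]))

# With alibi_detect installed and network access:
# from alibi_detect.datasets import fetch_kdd
# X, y = fetch_kdd(target=['dos'], percent10=True, return_X_y=True)
# print(outlier_summary(y))
```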

alibi_detect.datasets.fetch_nab(ts, return_X_y=False)[source]

Get time series in a DataFrame from the Numenta Anomaly Benchmark: https://github.com/numenta/NAB.

Parameters
  • ts (str) – Name of the time series to retrieve. Available names are listed by get_list_nab().

  • return_X_y (bool) – If True, return only the data and target values; otherwise return a Bunch object.

Return type

Union[Bunch, Tuple[DataFrame, DataFrame]]

Returns

  • Bunch – Dataset and outlier labels (0 means ‘normal’ and 1 means ‘outlier’) in DataFrames with timestamps.

  • (data, target) – Tuple if ‘return_X_y’ equals True.

alibi_detect.datasets.get_list_nab()[source]

Get list of possible time series to retrieve from the Numenta Anomaly Benchmark: https://github.com/numenta/NAB.

Return type

list

Returns

List with time series names.

alibi_detect.datasets.google_bucket_list(url, folder, filetype=None, full_path=False)[source]

Retrieve list with items in a Google bucket folder.

Parameters
  • url (str) – Bucket directory.

  • folder (str) – Folder to retrieve list of items from.

  • filetype (Optional[str]) – File extension, e.g. npy for saved numpy arrays.

Return type

List[str]

Returns

List with items in the folder of the Google bucket.

alibi_detect.datasets.load_genome_npz(fold, return_labels=False)[source]

Return type

Union[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray, ndarray]]

alibi_detect.datasets.load_url_arff(url, dtype=numpy.float32)[source]

Load arff files from url.

Parameters

url (str) – Address of arff file.

Return type

ndarray

Returns

Arrays with data and labels.