sklift.datasets.fetch_x5

sklift.datasets.datasets.fetch_x5(data_home=None, dest_subdir=None, download_if_missing=True)

Load and return the X5 RetailHero dataset (classification).

The dataset contains raw retail customer purchases, raw information about products and general info about customers.

Major columns:

  • treatment_flg (binary): treatment/control group flag

  • target (binary): target

  • customer_id (str): customer id - primary key for joining

Read more in the docs.

Parameters
  • data_home (str, unicode) – The path to the folder where datasets are stored.

  • dest_subdir (str, unicode) – The name of the folder in which the dataset is stored.

  • download_if_missing (bool) – Download the data if it is not present locally. If False and the data is missing, an IOError is raised.

Returns

dataset – Dictionary-like object with the following attributes:

  • data (Bunch object): dictionary-like object without target and treatment:

    • clients (ndarray or DataFrame object): General info about clients.

    • train (ndarray or DataFrame object): A subset of clients for training.

    • purchases (ndarray or DataFrame object): clients’ purchase history prior to communication.

  • target (Series object): Values of the target column.

  • treatment (Series object): Values of the treatment column.

  • DESCR (str): Description of the X5 RetailHero dataset.

  • feature_names (Bunch object): Names of the features.

  • target_name (str): Name of the target.

  • treatment_name (str): Name of the treatment.

Return type

Bunch
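The returned structure can be combined into a single training table. The following is a minimal sketch, not the library's own recipe: it uses a tiny stand-in object with the attribute layout documented above instead of calling fetch_x5 (which downloads the data), the `age` column and all values are invented, and it assumes that target and treatment are row-aligned with data.train.

```python
from types import SimpleNamespace

import pandas as pd

# Real call (downloads the data on first use), left commented out here:
# from sklift.datasets import fetch_x5
# dataset = fetch_x5()

# Stand-in mimicking the documented attributes. The `age` column and all
# values are made up for illustration only.
dataset = SimpleNamespace(
    data=SimpleNamespace(
        clients=pd.DataFrame(
            {"customer_id": ["a", "b", "c"], "age": [31, 45, 27]}
        ),
        train=pd.DataFrame({"customer_id": ["a", "b"]}),
    ),
    target=pd.Series([1, 0], name="target"),
    treatment=pd.Series([1, 0], name="treatment_flg"),
    target_name="target",
    treatment_name="treatment_flg",
)

# Attach general client info to the training subset via the primary key.
# A left join keeps every training row and preserves their order.
train = dataset.data.train.merge(
    dataset.data.clients, on="customer_id", how="left"
)

# Assumption: target and treatment are row-aligned with data.train.
train[dataset.target_name] = dataset.target.to_numpy()
train[dataset.treatment_name] = dataset.treatment.to_numpy()

print(train)
```

With the real data, the stand-in would be replaced by the commented-out fetch_x5 call, and the alignment assumption should be checked before assigning the columns.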

References

https://ods.ai/competitions/x5-retailhero-uplift-modeling/data

X5 RetailHero Uplift Modeling Dataset

The dataset was provided by X5 Retail Group for the RetailHero hackathon, held in winter 2019.

The dataset contains raw retail customer purchases, raw information about products and general info about customers.

Machine learning competition website.

Data description

Data contains several parts:

  • train.csv: a subset of clients for training. The column treatment_flg indicates if there was a communication. The column target shows if there was a purchase afterward;

  • clients.csv: general info about clients;

  • purchases.csv: clients’ purchase history prior to communication.
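Since the purchase history is raw, it usually has to be aggregated per client before it can be joined to the training subset. The sketch below shows one way this might be done with pandas. The `purchase_sum` column name, the derived feature names, and all values are assumptions for illustration; only `customer_id` is taken from the description above.

```python
import pandas as pd

# Hypothetical purchase history; `purchase_sum` is an assumed column name.
purchases = pd.DataFrame(
    {
        "customer_id": ["a", "a", "b"],
        "purchase_sum": [100.0, 50.0, 20.0],
    }
)
train = pd.DataFrame({"customer_id": ["a", "b", "c"]})

# One row per client: number of purchase records and total amount spent.
features = (
    purchases.groupby("customer_id")["purchase_sum"]
    .agg(n_purchases="count", total_spent="sum")
    .reset_index()
)

# Left join keeps clients without any purchases; fill their features with 0.
train_features = train.merge(features, on="customer_id", how="left").fillna(
    {"n_purchases": 0, "total_spent": 0.0}
)

print(train_features)
```

A left join is used so that clients with no purchase history stay in the training table instead of being silently dropped.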

Fields

  • treatment_flg (binary): information on performed communication

  • target (binary): customer purchasing
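These two fields are enough for a first sanity check: the difference in purchase rate between the treatment and control groups. The sketch below computes that naive average uplift on a small made-up sample. The values are invented, and the result is a plain difference of group means, not an uplift model.

```python
import pandas as pd

# Made-up sample using the two documented fields.
df = pd.DataFrame(
    {
        "treatment_flg": [1, 1, 1, 1, 0, 0, 0, 0],
        "target": [1, 1, 1, 0, 1, 1, 0, 0],
    }
)

# Purchase rate per group: index 1 is treatment, index 0 is control.
rates = df.groupby("treatment_flg")["target"].mean()

# Naive average uplift: treated purchase rate minus control purchase rate.
uplift = rates.loc[1] - rates.loc[0]

print(f"treated={rates.loc[1]:.2f} control={rates.loc[0]:.2f} uplift={uplift:.2f}")
```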