sklift.datasets.fetch_x5

sklift.datasets.datasets.fetch_x5(data_home=None, dest_subdir=None, download_if_missing=True)

Load and return the X5 RetailHero dataset (classification).

The dataset contains raw retail customer purchases, raw information about products and general info about customers.

Major columns:

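  • treatment_flg (binary): treatment/control group flag (1 if the customer received a communication, 0 otherwise)

  • target (binary): whether the customer made a purchase afterward

  • customer_id (str): customer ID, the primary key for joining the tables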
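Because customer_id is the key shared by the tables, client-level information can be attached to the training subset with an ordinary pandas merge. A minimal sketch on toy frames standing in for data.train and data.clients; the age column and all values are hypothetical, not the real schema.

```python
import pandas as pd

# Toy stand-ins for data.train and data.clients; the "age" column is hypothetical.
train = pd.DataFrame({"customer_id": ["a1", "b2", "c3"]})
clients = pd.DataFrame({
    "customer_id": ["c3", "a1", "b2", "d4"],
    "age": [41, 29, 35, 52],
})

# Left join keeps every training client, in train's order, and attaches its info.
# validate="one_to_one" raises if customer_id is duplicated on either side.
train_with_info = train.merge(clients, on="customer_id", how="left", validate="one_to_one")
print(train_with_info)
```

Using how="left" means clients that appear only in clients (like d4 here) are dropped, while training clients missing from clients would get NaN rather than disappearing.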

Read more in the docs.

Parameters
  • data_home (str, unicode) – The path to the folder where datasets are stored.

  • dest_subdir (str, unicode) – The name of the folder in which the dataset is stored.

  • download_if_missing (bool) – Download the data if it is not present locally. If False and the data is missing, an IOError is raised.

Returns

dataset: a dictionary-like object with the following attributes.

  • data (Bunch object): dictionary-like object without the target and treatment columns:

    • clients (ndarray or DataFrame object): General info about clients.

    • train (ndarray or DataFrame object): A subset of clients for training.

    • purchases (ndarray or DataFrame object): Clients’ purchase history prior to communication.

  • target (Series object): Values of the target column.

  • treatment (Series object): Values of the treatment column (treatment_flg).

  • DESCR (str): Description of the X5 dataset.

  • feature_names (Bunch object): Names of the features.

  • target_name (str): Name of the target.

  • treatment_name (str): Name of the treatment.

Return type

Bunch
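Since target and treatment are returned separately from data, they can be reattached to the training subset to get a single modeling table. A minimal sketch, assuming (not confirmed by this page) that target and treatment share the row index of data.train; the code checks that assumption before combining. On the real object, the column names could be taken from dataset.target_name and dataset.treatment_name instead of being hardcoded as here.

```python
import pandas as pd

# Toy stand-ins shaped like dataset.data.train, dataset.target and dataset.treatment.
# Assumption: all three share the same row index.
train = pd.DataFrame({"customer_id": ["a1", "b2", "c3", "d4"]})
target = pd.Series([1, 0, 1, 1], name="target")
treatment = pd.Series([1, 1, 0, 0], name="treatment_flg")

# Fail loudly if the assumed alignment does not hold.
assert train.index.equals(target.index) and train.index.equals(treatment.index)

# assign() aligns Series on the index, so each label lands on its own client row.
frame = train.assign(target=target, treatment_flg=treatment)
print(frame)
```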

References

https://ods.ai/competitions/x5-retailhero-uplift-modeling/data

Example:

from sklift.datasets import fetch_x5


dataset = fetch_x5()
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# data is a dictionary-like object
# general info about clients:
clients = data.clients

# a subset of clients for training:
train = data.train

# clients' purchase history prior to communication:
purchases = data.purchases
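With target and treatment loaded, the simplest uplift summary is the difference between the response rate in the treatment group and in the control group. A sketch on a toy sample; naive_uplift is a hypothetical helper, not part of sklift, and on the real data you would pass dataset.target and dataset.treatment.

```python
import pandas as pd


def naive_uplift(target: pd.Series, treatment: pd.Series) -> float:
    """Hypothetical helper: mean response of treated (1) minus mean response of control (0)."""
    treated_rate = target[treatment == 1].mean()
    control_rate = target[treatment == 0].mean()
    return float(treated_rate - control_rate)


# Toy sample: 4 treated clients (3 responders) and 4 control clients (2 responders).
target = pd.Series([1, 1, 1, 0, 1, 1, 0, 0])
treatment = pd.Series([1, 1, 1, 1, 0, 0, 0, 0])

print(naive_uplift(target, treatment))  # 0.75 - 0.5 = 0.25
```

This difference in means is only meaningful as an average effect when treatment is assigned at random; it says nothing about which individual clients respond to communication.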

See also

fetch_lenta(): Load and return the Lenta dataset (classification).

fetch_criteo(): Load and return the Criteo Uplift Prediction Dataset (classification).

fetch_hillstrom(): Load and return the Kevin Hillstrom Dataset MineThatData (classification or regression).

fetch_megafon(): Load and return the MegaFon Uplift Competition dataset (classification).

X5 RetailHero Uplift Modeling Dataset

The dataset is provided by X5 Retail Group at the RetailHero hackathon hosted in winter 2019.

The dataset contains raw retail customer purchases, raw information about products and general info about customers.

Machine learning competition website.

Data description

Data contains several parts:

  • train.csv: a subset of clients for training. The column treatment_flg indicates whether there was a communication, and the column target shows whether there was a purchase afterward;

  • clients.csv: general info about clients;

  • purchases.csv: clients’ purchase history prior to communication.
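Because purchases holds many rows per client, it usually has to be collapsed to one row per customer_id before it can be joined to the training subset. A minimal sketch on a toy stand-in for data.purchases; the transaction_id and purchase_sum columns and all values are hypothetical, and the two aggregates are just examples of possible features.

```python
import pandas as pd

# Toy stand-in for data.purchases; "transaction_id" and "purchase_sum" are hypothetical names.
purchases = pd.DataFrame({
    "customer_id": ["a1", "a1", "b2", "c3", "c3", "c3"],
    "transaction_id": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "purchase_sum": [100.0, 50.0, 20.0, 10.0, 30.0, 60.0],
})
train = pd.DataFrame({"customer_id": ["a1", "b2", "c3", "d4"]})

# Collapse the purchase history to one row per client (pandas named aggregation).
features = purchases.groupby("customer_id").agg(
    n_transactions=("transaction_id", "nunique"),
    total_spent=("purchase_sum", "sum"),
).reset_index()

# Clients with no recorded purchases (d4) get zeros instead of NaN.
train_features = (
    train.merge(features, on="customer_id", how="left")
    .fillna({"n_transactions": 0, "total_spent": 0.0})
    .astype({"n_transactions": int})
)
print(train_features)
```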

X5 table schema

Fields

  • treatment_flg (binary): whether a communication was performed

  • target (binary): whether the customer made a purchase

Key figures

  • Format: CSV

  • Size: 647 MB (compressed), 4.17 GB (uncompressed)

  • Rows:

    • in ‘clients.csv’: 400,162

    • in ‘purchases.csv’: 45,786,568

    • in ‘uplift_train.csv’: 200,039

  • Response Ratio: 0.62

  • Treatment Ratio: 0.5
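Both ratios can be read as the share of ones in a binary column: the response ratio as the mean of target and the treatment ratio as the mean of treatment_flg. That reading is an assumption, since the page does not define them. A sketch on toy stand-ins for dataset.target and dataset.treatment:

```python
import pandas as pd

# Toy stand-ins for dataset.target and dataset.treatment (values are made up).
target = pd.Series([1, 0, 1, 1, 0, 1, 1, 0, 1, 0])
treatment = pd.Series([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# The mean of a 0/1 column is the fraction of ones.
response_ratio = target.mean()      # share of clients who made a purchase
treatment_ratio = treatment.mean()  # share of clients who received a communication

print(f"Response Ratio: {response_ratio:.2f}")
print(f"Treatment Ratio: {treatment_ratio:.2f}")
```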

About X5

X5 Group is a leading Russian food retailer. The company operates several retail formats: proximity stores under the Pyaterochka brand, supermarkets under the Perekrestok brand and hypermarkets under the Karusel brand, as well as the Perekrestok.ru online market, the 5Post parcel service and the Dostavka.Pyaterochka and Perekrestok.Bystro food delivery services.

Link to the company’s website: https://www.x5.ru/