sklift.datasets.fetch_x5
- sklift.datasets.datasets.fetch_x5(data_home=None, dest_subdir=None, download_if_missing=True)
Load and return the X5 RetailHero dataset (classification).
The dataset contains raw retail customer purchases, raw information about products and general info about customers.
Major columns:
treatment_flg (binary): treatment/control group flag
target (binary): target, i.e. whether the customer made a purchase after the communication
customer_id (str): customer id, the primary key for joining
Read more in the docs.
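Because customer_id is the key shared by the tables, attaching client attributes to training rows amounts to a key lookup. The sketch below uses tiny in-memory stand-ins for train.csv and clients.csv; the rows and the `age` column are hypothetical, and only customer_id, treatment_flg and target come from this page.

```python
# Hypothetical in-memory stand-ins for the train and clients tables;
# only the customer_id, treatment_flg and target names come from the docs.
train_rows = [
    {"customer_id": "c1", "treatment_flg": 1, "target": 1},
    {"customer_id": "c2", "treatment_flg": 0, "target": 0},
    {"customer_id": "c3", "treatment_flg": 1, "target": 0},
]
client_rows = [
    {"customer_id": "c1", "age": 34},  # "age" is an invented example column
    {"customer_id": "c2", "age": 51},
    {"customer_id": "c3", "age": 27},
    {"customer_id": "c4", "age": 45},  # client without a training row
]


def join_on_customer_id(left, right):
    """Inner-join two lists of dicts on customer_id (assumed unique in `right`)."""
    index = {row["customer_id"]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row["customer_id"])
        if match is not None:
            # Merge the two dicts; keys from `left` win on any name clash.
            joined.append({**match, **row})
    return joined


merged = join_on_customer_id(train_rows, client_rows)
print(len(merged), merged[0])
```

With the real data the same idea applies table-wide, e.g. with a DataFrame merge on customer_id.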
- Parameters
data_home (str, unicode) – The path to the folder where datasets are stored.
dest_subdir (str, unicode) – The name of the folder in which the dataset is stored.
download_if_missing (bool) – Download the data if not present. Raises an IOError if False and the data is missing.
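The interplay of the three parameters can be sketched as a path-resolution step. This is a hypothetical helper written only to illustrate the documented semantics, not sklift's internal code; the default folder name and the file layout (data_home/dest_subdir/filename) are assumptions.

```python
import os
import tempfile


def resolve_dataset_path(filename, data_home=None, dest_subdir=None,
                         download_if_missing=True):
    """Hypothetical helper mirroring the documented parameters.

    Returns (path, needs_download). This is NOT the sklift implementation.
    """
    if data_home is None:
        # Assumed default location; the real default may differ.
        data_home = os.path.join(os.path.expanduser("~"), "hypothetical-data-home")
    folder = data_home if dest_subdir is None else os.path.join(data_home, dest_subdir)
    path = os.path.join(folder, filename)
    if os.path.exists(path):
        return path, False
    if not download_if_missing:
        # Documented behaviour: IOError when data is missing and downloading is off.
        raise IOError(f"{path} is missing and download_if_missing=False")
    # A real loader would download the file here; the sketch only flags it.
    return path, True


with tempfile.TemporaryDirectory() as tmp:
    path, needs_download = resolve_dataset_path("clients.csv", data_home=tmp, dest_subdir="x5")
    print(path, needs_download)
    try:
        resolve_dataset_path("clients.csv", data_home=tmp, download_if_missing=False)
    except IOError as err:
        print("IOError:", err)
```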
- Returns
dataset: Dictionary-like object with the following attributes.
data (Bunch object): dictionary-like object without target and treatment:
  clients (ndarray or DataFrame object): General info about clients.
  train (ndarray or DataFrame object): A subset of clients for training.
  purchases (ndarray or DataFrame object): Clients’ purchase history prior to communication.
target (Series object): Values of the target column.
treatment (Series object): Values of the treatment column.
DESCR (str): Description of the X5 dataset.
feature_names (Bunch object): Names of the features.
target_name (str): Name of the target.
treatment_name (str): Name of the treatment.
- Return type
Bunch
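"Dictionary-like" means each attribute can be reached both as a key and as an attribute. The sketch below mimics that with a hypothetical `AttrDict` class standing in for the returned Bunch; the placeholder strings replace the real tables and Series, and the target_name/treatment_name values are assumed.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a Bunch: dict keys double as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as exc:
            # Keep normal attribute semantics (hasattr, getattr defaults).
            raise AttributeError(key) from exc


# Placeholder values stand in for the real tables and Series.
dataset = AttrDict(
    data=AttrDict(clients="<clients table>", train="<train table>",
                  purchases="<purchases table>"),
    target="<target Series>",
    treatment="<treatment Series>",
    DESCR="Description of the X5 dataset.",
    feature_names=AttrDict(),
    target_name="target",            # assumed value
    treatment_name="treatment_flg",  # assumed value
)

# Attribute access and key access reach the same object.
print(dataset.data.clients == dataset["data"]["clients"])
print(sorted(dataset.data.keys()))
```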
References
https://ods.ai/competitions/x5-retailhero-uplift-modeling/data
Example:
from sklift.datasets import fetch_x5

dataset = fetch_x5()
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# data is a dictionary-like object
# data contains general info about clients:
clients = data.clients

# data contains a subset of clients for training:
train = data.train

# data contains the clients’ purchase history prior to communication:
purchases = data.purchases
See also
fetch_lenta(): Load and return the Lenta dataset (classification).
fetch_criteo(): Load and return the Criteo Uplift Prediction Dataset (classification).
fetch_hillstrom(): Load and return the Kevin Hillstrom MineThatData dataset (classification or regression).
fetch_megafon(): Load and return the MegaFon Uplift Competition dataset (classification).
X5 RetailHero Uplift Modeling Dataset
The dataset is provided by X5 Retail Group at the RetailHero hackathon hosted in winter 2019.
The dataset contains raw retail customer purchases, raw information about products and general info about customers.
Machine learning competition website.
Data description
The data contains several parts:
train.csv: a subset of clients for training. The column treatment_flg indicates whether there was a communication, and the column target shows whether there was a purchase afterward;
clients.csv: general info about clients;
purchases.csv: clients’ purchase history prior to communication.
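The purchase history has many rows per client (about 45.8 million purchase rows against about 400 thousand clients, per the key figures), so it has to be aggregated to one row per customer before it can be attached to the training table. The sketch below assumes purchases are keyed by customer_id; the CSV excerpts and the `amount` column are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV excerpts; "amount" is an invented purchase column.
TRAIN_CSV = """customer_id,treatment_flg,target
c1,1,1
c2,0,0
c3,1,0
"""
PURCHASES_CSV = """customer_id,amount
c1,10.5
c1,4.5
c3,7.0
c4,3.0
"""


def purchase_features(purchases_file):
    """Aggregate the purchase history into a per-customer count and total."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in csv.DictReader(purchases_file):
        counts[row["customer_id"]] += 1
        totals[row["customer_id"]] += float(row["amount"])
    return counts, totals


counts, totals = purchase_features(io.StringIO(PURCHASES_CSV))
features = []
for row in csv.DictReader(io.StringIO(TRAIN_CSV)):
    cid = row["customer_id"]
    # Customers with no purchase history get zeros instead of being dropped.
    features.append({
        "customer_id": cid,
        "n_purchases": counts.get(cid, 0),
        "total_amount": totals.get(cid, 0.0),
        "treatment_flg": int(row["treatment_flg"]),
        "target": int(row["target"]),
    })
print(features)
```

Filling zeros for clients without purchases is one design choice; dropping them would silently shrink the training set.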

Fields
treatment_flg (binary): whether a communication was performed
target (binary): whether the customer made a purchase
Key figures
Format: CSV
Size: 647 MB (compressed), 4.17 GB (uncompressed)
Rows:
in ‘clients.csv’: 400,162
in ‘purchases.csv’: 45,786,568
in ‘uplift_train.csv’: 200,039
Response Ratio: 0.62
Treatment Ratio: 0.5
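Assuming Response Ratio is the share of rows with target = 1 and Treatment Ratio is the share of rows with treatment_flg = 1 (the page does not define them), both are plain column means. The sketch uses a hypothetical eight-row sample and also shows the response rate within each group, which is what the treatment/control split makes comparable.

```python
# Hypothetical binary columns; with the real data these would be the
# treatment_flg and target columns of the training table.
treatment_flg = [1, 0, 1, 1, 0, 0, 1, 0]
target = [1, 1, 0, 1, 0, 1, 1, 0]


def mean(values):
    return sum(values) / len(values)


response_ratio = mean(target)          # share of rows with target == 1
treatment_ratio = mean(treatment_flg)  # share of rows in the treatment group

# Response rate within the treatment and control groups.
treated = [y for y, t in zip(target, treatment_flg) if t == 1]
control = [y for y, t in zip(target, treatment_flg) if t == 0]
print(response_ratio, treatment_ratio, mean(treated), mean(control))
```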
About X5

X5 Group is a leading Russian food retailer. The Company operates several retail formats: proximity stores under the Pyaterochka brand, supermarkets under the Perekrestok brand and hypermarkets under the Karusel brand, as well as the Perekrestok.ru online market, the 5Post parcel service, and the Dostavka.Pyaterochka and Perekrestok.Bystro food delivery services.
Link to the company’s website: https://www.x5.ru/