sklift.datasets.fetch_criteo
sklift.datasets.datasets.fetch_criteo(target_col='visit', treatment_col='treatment', data_home=None, dest_subdir=None, download_if_missing=True, percent10=False, return_X_y_t=False)
Load and return the Criteo Uplift Prediction Dataset (classification).
This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising.
Major columns:
treatment (binary): treatment
exposure (binary): treatment
visit (binary): target
conversion (binary): target
f0, ... , f11 (float): feature values
Read more in the docs.
- Parameters
target_col (string, 'visit', 'conversion' or 'all', default='visit') – Selects which dataset column is used as the target. If 'all', returns a DataFrame with all target columns.
treatment_col (string, 'treatment', 'exposure' or 'all', default='treatment') – Selects which dataset column is used as the treatment. If 'all', returns a DataFrame with all treatment columns.
data_home (string) – Specify a download and cache folder for the datasets.
dest_subdir (string) – The name of the folder in which the dataset is stored.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.
percent10 (bool, default=False) – Whether to load only 10 percent of the data.
return_X_y_t (bool, default=False) – If True, returns (data, target, treatment) instead of a Bunch object.
- Returns
dataset.
- Bunch:
By default, a dictionary-like object with the following attributes:
data (DataFrame object): Dataset without target and treatment.
target (Series or DataFrame object): Values of the target column(s).
treatment (Series or DataFrame object): Values of the treatment column(s).
DESCR (str): Description of the Criteo dataset.
feature_names (list): Names of the features.
target_name (str or list): Name of the target.
treatment_name (str or list): Name of the treatment.
- Tuple:
tuple (data, target, treatment) if return_X_y_t is True
- Return type
Bunch or tuple
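A minimal usage sketch, assuming scikit-uplift is installed and importable as sklift. The wrapper name load_criteo_xyt and its fetch argument are hypothetical; they exist only so the call can be exercised with a stand-in and without downloading anything. The keyword arguments passed through are the ones listed under Parameters.

```python
def load_criteo_xyt(fetch=None, percent10=True):
    """Return (data, target, treatment) for the 'visit' target.

    `fetch` is a hypothetical injection point: pass a stand-in callable to
    avoid touching the network, or leave it as None to use fetch_criteo.
    """
    if fetch is None:
        # Imported lazily so this module loads even without scikit-uplift.
        from sklift.datasets import fetch_criteo as fetch

    # return_X_y_t=True asks for a tuple instead of a Bunch object.
    return fetch(
        target_col="visit",
        treatment_col="treatment",
        percent10=percent10,
        return_X_y_t=True,
    )


# Typical call (may download the 10 percent sample if it is not cached):
# X, y, t = load_criteo_xyt()
```

With return_X_y_t left at its default of False, the same call would instead return a Bunch whose data, target and treatment attributes hold the same three objects.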
References
“A Large Scale Benchmark for Uplift Modeling” Eustache Diemert, Artem Betlei, Christophe Renaudin; (Criteo AI Lab), Massih-Reza Amini (LIG, Grenoble INP)
Criteo Uplift Modeling Dataset
This is a copy of the Criteo AI Lab Uplift Prediction dataset.
Data description
This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising.
Fields
Here is a detailed description of the fields (they are comma-separated in the file):
f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
treatment: treatment group. Flags whether a company participates in the RTB auction for a particular user (binary: 1 = treated, 0 = control)
exposure: treatment effect, whether the user has been effectively exposed. Flags whether a company wins the RTB auction for the user (binary)
conversion: whether a conversion occurred for this user (binary, label)
visit: whether a visit occurred for this user (binary, label)
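As an illustration of how these fields fit together, the sketch below reads a comma-separated file with the documented column names and compares visit rates between the treated and control groups. The six rows are invented for the example and are not Criteo records; the helper names make_sample_csv and visit_rate_by_group, the constant feature value, and the column order are likewise assumptions. The difference in group means is only a naive population-level estimate that relies on the randomized assignment described above.

```python
import csv
import io

FEATURES = [f"f{i}" for i in range(12)]
HEADER = FEATURES + ["treatment", "exposure", "conversion", "visit"]

# (treatment, exposure, conversion, visit) flags for six invented users.
FLAGS = [
    (1, 1, 0, 1),
    (1, 0, 0, 0),
    (1, 1, 1, 1),
    (1, 0, 0, 1),
    (0, 0, 0, 0),
    (0, 0, 0, 1),
]


def make_sample_csv():
    """Build a tiny CSV string with the documented column names."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    for flags in FLAGS:
        # Feature values are placeholders; only the flags matter here.
        writer.writerow([0.5] * len(FEATURES) + list(flags))
    return buf.getvalue()


def visit_rate_by_group(csv_text):
    """Return {treatment_flag: mean visit rate} for the given CSV text."""
    counts = {0: [0, 0], 1: [0, 0]}  # treatment -> [visits, users]
    for row in csv.DictReader(io.StringIO(csv_text)):
        group = int(row["treatment"])
        counts[group][0] += int(row["visit"])
        counts[group][1] += 1
    return {g: visits / users for g, (visits, users) in counts.items()}


rates = visit_rate_by_group(make_sample_csv())
# Treated visit rate minus control visit rate on the toy rows.
naive_uplift = rates[1] - rates[0]
print(rates, naive_uplift)
```

Using csv.DictReader keeps the code independent of the actual column order in the file, since each field is looked up by name.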
Key figures
Format: CSV
Size: 297 MB (compressed), 3.2 GB (uncompressed)
Rows: 13,979,592
Average Visit Rate: .046992
Average Conversion Rate: .00292
Treatment Ratio: .85
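The rates above can be turned into approximate absolute counts, which gives a sense of how rare the positive labels are. The sketch below simply multiplies the published figures; because those rates are rounded, the results are estimates and not exact row counts. The variable names are chosen for illustration.

```python
ROWS = 13_979_592
VISIT_RATE = 0.046992
CONVERSION_RATE = 0.00292
TREATMENT_RATIO = 0.85

# Approximate counts implied by the (rounded) published rates.
approx_visits = round(ROWS * VISIT_RATE)
approx_conversions = round(ROWS * CONVERSION_RATE)
approx_treated = round(ROWS * TREATMENT_RATIO)
approx_control = ROWS - approx_treated

print(f"visits      ~ {approx_visits:,}")
print(f"conversions ~ {approx_conversions:,}")
print(f"treated     ~ {approx_treated:,}")
print(f"control     ~ {approx_control:,}")
```

This works out to roughly 657 thousand visits and 41 thousand conversions, with about 2.1 million users in the control group.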
This dataset is released along with the paper: “A Large Scale Benchmark for Uplift Modeling”, Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab), Massih-Reza Amini (LIG, Grenoble INP).
This work was published in the AdKDD 2018 Workshop, in conjunction with KDD 2018.