sklift.datasets.fetch_criteo
- sklift.datasets.datasets.fetch_criteo(target_col='visit', treatment_col='treatment', data_home=None, dest_subdir=None, download_if_missing=True, percent10=False, return_X_y_t=False)
Load and return the Criteo Uplift Prediction Dataset (classification).
This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising.
Major columns:
treatment (binary): treatment
exposure (binary): treatment
visit (binary): target
conversion (binary): target
f0, ... , f11 (float): feature values
Read more in the docs.
- Parameters
target_col (string, 'visit', 'conversion' or 'all', default='visit') – Selects which column of the dataset is used as the target. If 'all', returns a DataFrame with all target columns.
treatment_col (string, 'treatment', 'exposure' or 'all', default='treatment') – Selects which column of the dataset is used as the treatment. If 'all', returns a DataFrame with all treatment columns.
data_home (string) – Specify a download and cache folder for the datasets.
dest_subdir (string) – The name of the folder in which the dataset is stored.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.
percent10 (bool, default=False) – Whether to load only 10 percent of the data.
return_X_y_t (bool, default=False) – If True, returns (data, target, treatment) instead of a Bunch object.
- Returns
dataset.
- Bunch:
By default, a dictionary-like object with the following attributes:
data (DataFrame object): Dataset without target and treatment.
target (Series or DataFrame object): Target column values.
treatment (Series or DataFrame object): Treatment column values.
DESCR (str): Description of the Criteo dataset.
feature_names (list): Names of the features.
target_name (str or list): Name of the target.
treatment_name (str or list): Name of the treatment.
- Tuple:
tuple (data, target, treatment) if return_X_y_t is True
- Return type
Bunch or tuple
Example:
from sklift.datasets import fetch_criteo

dataset = fetch_criteo(target_col='conversion', treatment_col='exposure')
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# alternative option
data, target, treatment = fetch_criteo(target_col='conversion', treatment_col='exposure', return_X_y_t=True)
References
Diemert Eustache, Betlei Artem et al. [2018]
Eustache Diemert, Artem Betlei, Christophe Renaudin, and Massih-Reza Amini. A large scale benchmark for uplift modeling. In Proceedings of the AdKDD and TargetAd Workshop, KDD, London, United Kingdom, August 20, 2018. ACM, 2018.
See also
fetch_lenta(): Load and return the Lenta dataset (classification).
fetch_x5(): Load and return the X5 RetailHero dataset (classification).
fetch_hillstrom(): Load and return Kevin Hillstrom Dataset MineThatData (classification or regression).
fetch_megafon(): Load and return the MegaFon Uplift Competition dataset (classification).
Criteo Uplift Modeling Dataset
This is a copy of the Criteo AI Lab Uplift Prediction dataset.
Data description
This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising.
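Because assignment to the treated group is randomized, a simple baseline estimate of the average advertising effect is the difference between the response rates of the treated and control groups. The following is a minimal sketch of that calculation; the `naive_uplift` helper and the toy values are illustrative assumptions, not part of sklift and not taken from the dataset.

```python
def naive_uplift(target, treatment):
    """Difference in mean response between treated (1) and control (0) rows."""
    treated = [y for y, t in zip(target, treatment) if t == 1]
    control = [y for y, t in zip(target, treatment) if t == 0]
    if not treated or not control:
        raise ValueError("both groups must be non-empty")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy data (invented for illustration): four treated and four control users.
toy_treatment = [1, 1, 1, 1, 0, 0, 0, 0]
toy_visit = [1, 1, 0, 0, 1, 0, 0, 0]

# Treated visit rate minus control visit rate.
toy_uplift = naive_uplift(toy_visit, toy_treatment)
```

On the real data, the same idea would apply to the `visit` or `conversion` column paired with the `treatment` column.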
Fields
Here is a detailed description of the fields (they are comma-separated in the file):
f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
treatment: treatment group. Flag if a company participates in the RTB auction for a particular user (binary: 1 = treated, 0 = control)
exposure: treatment effect, whether the user has been effectively exposed. Flag if a company wins in the RTB auction for the user (binary)
conversion: whether a conversion occurred for this user (binary, label)
visit: whether a visit occurred for this user (binary, label)
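As a rough illustration of the field layout described above, the sketch below parses a tiny in-memory CSV sample and splits it into features, treatment and target. The sample rows are invented, and the presence of a header row and the column order in the sample are assumptions; only the column names come from the field list.

```python
import csv
import io

FEATURES = [f"f{i}" for i in range(12)]  # f0 ... f11

# Hypothetical two-row sample; values are made up for illustration.
header = ",".join(FEATURES + ["treatment", "exposure", "conversion", "visit"])
sample_rows = [
    ",".join(["0.5"] * 12 + ["1", "1", "0", "1"]),
    ",".join(["1.5"] * 12 + ["0", "0", "0", "0"]),
]
sample = io.StringIO("\n".join([header] + sample_rows))

data, treatment, target = [], [], []
for record in csv.DictReader(sample):
    # Features are dense floats; treatment and labels are binary flags.
    data.append([float(record[name]) for name in FEATURES])
    treatment.append(int(record["treatment"]))
    target.append(int(record["visit"]))
```

Because `csv.DictReader` looks columns up by name, the sketch does not depend on the actual column order in the file.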
Key figures
Format: CSV
Size: 297 MB (compressed), 3.2 GB (uncompressed)
Rows: 13,979,592
Response Ratio:
Average Visit Rate: .046992
Average Conversion Rate: .00292
Treatment Ratio: .85
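To get a feel for the class and group imbalance these figures imply, the short sketch below converts the published ratios into approximate row counts. The ratios are rounded, so the results are estimates rather than exact counts from the file.

```python
# Key figures as listed above.
n_rows = 13_979_592
avg_visit_rate = 0.046992
avg_conversion_rate = 0.00292
treatment_ratio = 0.85

# Approximate group sizes and positive-label counts implied by the ratios.
approx_treated = round(n_rows * treatment_ratio)
approx_control = n_rows - approx_treated
approx_visits = round(n_rows * avg_visit_rate)
approx_conversions = round(n_rows * avg_conversion_rate)
```

This works out to roughly 11.9 million treated and 2.1 million control rows, with conversions far rarer than visits, which is worth keeping in mind when splitting the data or choosing metrics.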
This dataset is released along with the paper “A Large Scale Benchmark for Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab), and Massih-Reza Amini (LIG, Grenoble INP). The work was published at the AdKDD 2018 Workshop, held in conjunction with KDD 2018.
About Criteo
Criteo is an advertising company that provides online display advertisements. The company was founded and is headquartered in Paris, France. Criteo’s product is a form of display advertising that shows interactive banner advertisements generated from each customer’s online browsing preferences and behaviour. The solution operates on a pay per click/cost per click (CPC) basis.
Link to the company’s website: https://www.criteo.com/