sklift.datasets.fetch_criteo

sklift.datasets.datasets.fetch_criteo(target_col='visit', treatment_col='treatment', data_home=None, dest_subdir=None, download_if_missing=True, percent10=False, return_X_y_t=False)

Load and return the Criteo Uplift Prediction Dataset (classification).

This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising.

Major columns:

  • treatment (binary): treatment

  • exposure (binary): treatment

  • visit (binary): target

  • conversion (binary): target

  • f0, ... , f11 (float): feature values

Read more in the docs.

Parameters
  • target_col (string, 'visit', 'conversion' or 'all', default='visit') – Selects which column of the dataset is used as the target. If 'all', return a DataFrame with all target columns.

  • treatment_col (string, 'treatment', 'exposure' or 'all', default='treatment') – Selects which column of the dataset is used as the treatment. If 'all', return a DataFrame with all treatment columns.

  • data_home (string) – Specify a download and cache folder for the datasets.

  • dest_subdir (string) – The name of the folder in which the dataset is stored.

  • download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.

  • percent10 (bool, default=False) – Whether to load only 10 percent of the data.

  • return_X_y_t (bool, default=False) – If True, returns (data, target, treatment) instead of a Bunch object.

Returns

dataset.

Bunch:

By default, a dictionary-like object with the following attributes:

  • data (DataFrame object): Dataset without target and treatment.

  • target (Series or DataFrame object): Values of the target column(s).

  • treatment (Series or DataFrame object): Values of the treatment column(s).

  • DESCR (str): Description of the Criteo dataset.

  • feature_names (list): Names of the features.

  • target_name (str or list): Name of the target.

  • treatment_name (str or list): Name of the treatment.

Tuple:

tuple (data, target, treatment) if return_X_y_t is True

Return type

Bunch or tuple

Example:

from sklift.datasets import fetch_criteo


dataset = fetch_criteo(target_col='conversion', treatment_col='exposure')
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# alternative option
data, target, treatment = fetch_criteo(target_col='conversion', treatment_col='exposure', return_X_y_t=True)
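
As a further sketch, the 10 percent sample can be requested from the local cache only, so that nothing is downloaded. The helper name load_cached_criteo_sample is hypothetical and not part of sklift; it uses only the parameters documented above and relies on the documented IOError when the data is not available locally.

```python
def load_cached_criteo_sample(data_home=None):
    """Hypothetical helper: return (data, target, treatment), or None if not cached."""
    # Imported lazily so that defining this helper does not require scikit-uplift.
    from sklift.datasets import fetch_criteo

    try:
        return fetch_criteo(
            target_col='all',           # both 'visit' and 'conversion'
            treatment_col='all',        # both 'treatment' and 'exposure'
            data_home=data_home,
            download_if_missing=False,  # documented to raise IOError if data is missing
            percent10=True,             # load only 10 percent of the data
            return_X_y_t=True,          # return a tuple instead of a Bunch
        )
    except IOError:
        # The dataset is not in the local cache; the caller decides what to do next.
        return None
```

With target_col='all' and treatment_col='all', the target and treatment elements of the tuple are DataFrames rather than Series, as described under Returns.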

References

Eustache Diemert, Artem Betlei, Christophe Renaudin, and Massih-Reza Amini. A large scale benchmark for uplift modeling. In Proceedings of the AdKDD and TargetAd Workshop, KDD, London, United Kingdom, August 20, 2018. ACM, 2018.

See also

fetch_lenta(): Load and return the Lenta dataset (classification).

fetch_x5(): Load and return the X5 RetailHero dataset (classification).

fetch_hillstrom(): Load and return Kevin Hillstrom Dataset MineThatData (classification or regression).

fetch_megafon(): Load and return the MegaFon Uplift Competition dataset (classification).

Criteo Uplift Modeling Dataset

This is a copy of Criteo AI Lab Uplift Prediction dataset.

Data description

This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising.

Fields

Here is a detailed description of the fields (they are comma-separated in the file):

  • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)

  • treatment: treatment group. Flag if a company participates in the RTB auction for a particular user (binary: 1 = treated, 0 = control)

  • exposure: treatment effect, whether the user has been effectively exposed. Flag if a company wins in the RTB auction for the user (binary)

  • conversion: whether a conversion occurred for this user (binary, label)

  • visit: whether a visit occurred for this user (binary, label)
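
To illustrate how these fields fit together, here is a minimal sketch that computes a naive uplift estimate, the difference in mean visit rate between the treated and control groups. The rows are synthetic and made up for illustration (with only two feature columns), and the helper mean_rate is hypothetical; this is a plain difference in means, not a method provided by sklift.

```python
import csv
import io

# Synthetic rows following the column layout described above; all values are made up.
SAMPLE_CSV = """\
f0,f1,treatment,exposure,conversion,visit
1.2,0.5,1,1,0,1
0.7,0.1,1,0,0,0
3.4,2.2,1,1,1,1
0.9,1.8,1,0,0,1
2.1,0.3,0,0,0,0
1.5,0.9,0,0,0,1
"""


def mean_rate(rows, outcome_col, group_col, group_value):
    """Hypothetical helper: mean of a binary outcome within one group."""
    outcomes = [int(r[outcome_col]) for r in rows if int(r[group_col]) == group_value]
    return sum(outcomes) / len(outcomes)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

visit_treated = mean_rate(rows, 'visit', 'treatment', 1)
visit_control = mean_rate(rows, 'visit', 'treatment', 0)

# Under random assignment, the difference in means estimates the average effect
# of being assigned to treatment on the visit rate.
naive_uplift = visit_treated - visit_control
print(f"treated={visit_treated:.2f} control={visit_control:.2f} uplift={naive_uplift:.2f}")
```

The same comparison can be made with conversion as the outcome. Using exposure as the grouping column is a different question, since exposure is not randomly assigned in the way the treatment flag is.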

Key figures

  • Format: CSV

  • Size: 297 MB (compressed), 3.2 GB (uncompressed)

  • Rows: 13,979,592

  • Response Ratio:

    • Average Visit Rate: .046992

    • Average Conversion Rate: .00292

  • Treatment Ratio: .85
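
As a rough worked example, the key figures above can be turned into approximate group sizes and outcome counts. The ratios are rounded, so the resulting counts are estimates rather than exact values; the variable names are chosen here for illustration.

```python
# Key figures as listed above.
ROWS = 13_979_592
TREATMENT_RATIO = 0.85
VISIT_RATE = 0.046992
CONVERSION_RATE = 0.00292

# Approximate group sizes implied by the treatment ratio.
treated_rows = round(ROWS * TREATMENT_RATIO)
control_rows = ROWS - treated_rows

# Approximate number of positive labels implied by the average rates.
visits = round(ROWS * VISIT_RATE)
conversions = round(ROWS * CONVERSION_RATE)

print(f"treated ~ {treated_rows:,}, control ~ {control_rows:,}")
print(f"visits ~ {visits:,}, conversions ~ {conversions:,}")
```

Both the group sizes and the labels are imbalanced: roughly 85 percent of users are in the treatment group, and conversions are more than ten times rarer than visits.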

This dataset is released along with the paper “A Large Scale Benchmark for Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab), and Massih-Reza Amini (LIG, Grenoble INP). The work was published at the AdKDD 2018 Workshop, held in conjunction with KDD 2018.

About Criteo

Criteo is an advertising company that provides online display advertisements. The company was founded and is headquartered in Paris, France. Criteo’s product is a form of display advertising that shows interactive banner advertisements generated from the online browsing preferences and behaviour of each customer. The solution operates on a pay per click/cost per click (CPC) basis.

Link to the company’s website: https://www.criteo.com/