sklift.datasets.datasets.fetch_megafon(data_home=None, dest_subdir=None, download_if_missing=True, return_X_y_t=False)

Load and return the MegaFon Uplift Competition dataset (classification).

An uplift modeling dataset containing synthetic data generated by a telecom company and designed to stay close to a real case the company encountered.

Major columns:

  • X_1...X_50 : anonymized feature set

  • conversion (binary): target (customer purchasing)

  • treatment_group (str): treatment/control group flag
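Because treatment_group is a string column, it usually has to be mapped to a 0/1 flag before any uplift calculation. The sketch below assumes the two labels are "treatment" and "control" (check the real values with `value_counts()` first) and uses a small hand-made frame, not MegaFon data. It computes the naive uplift estimate: the mean conversion of the treated group minus that of the control group. The helper name `naive_uplift` is hypothetical and not part of sklift.

```python
import pandas as pd


def naive_uplift(conversion: pd.Series, treatment_group: pd.Series) -> float:
    """Conversion rate of the treated group minus that of the control group.

    Assumes treatment_group holds the string labels "treatment" and "control".
    """
    # Map the (assumed) string labels to a binary treatment flag.
    flag = treatment_group.map({"control": 0, "treatment": 1})
    if flag.isna().any():
        raise ValueError("unexpected treatment_group label")
    treated_rate = conversion[flag == 1].mean()
    control_rate = conversion[flag == 0].mean()
    return float(treated_rate - control_rate)


# Toy stand-in for the real columns (hypothetical values, not MegaFon data).
toy = pd.DataFrame({
    "treatment_group": ["treatment"] * 4 + ["control"] * 4,
    "conversion": [1, 1, 0, 0, 1, 0, 0, 0],
})
print(naive_uplift(toy["conversion"], toy["treatment_group"]))  # 0.25
```

This estimate ignores the features entirely; it is only a sanity check on the average effect, not an uplift model.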

Read more in the docs.

Parameters

  • data_home (str) – The path to the folder where datasets are stored.

  • dest_subdir (str) – The name of the folder in which the dataset is stored.

  • download_if_missing (bool) – Download the data if not present. Raises an IOError if False and data is missing.

  • return_X_y_t (bool) – If True, returns (data, target, treatment) instead of a Bunch object.

Returns

By default, a dictionary-like object (Bunch) with the following attributes:

  • data (DataFrame object): Dataset without target and treatment.

  • target (Series object): Values of the target column.

  • treatment (Series object): Values of the treatment column.

  • DESCR (str): Description of the MegaFon dataset.

  • feature_names (list): Names of the features.

  • target_name (str): Name of the target.

  • treatment_name (str): Name of the treatment.


tuple (data, target, treatment) if return_X_y_t is True

Return type

Bunch or tuple


from sklift.datasets import fetch_megafon

# Load the dataset as a Bunch object and unpack its attributes
dataset = fetch_megafon()
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# alternative option
data, target, treatment = fetch_megafon(return_X_y_t=True)
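A common next step after loading is a train/validation split that keeps the share of every (target, treatment) combination roughly the same in both parts, so neither part ends up with a skewed treatment or response ratio. The sketch below does this in plain pandas with `DataFrameGroupBy.sample` (available in pandas 1.1 and later); scikit-learn's `train_test_split` with a `stratify` argument is a common alternative. The helper `stratified_split` and the toy data are hypothetical, not part of sklift.

```python
import pandas as pd


def stratified_split(data, target, treatment, frac=0.7, random_state=0):
    """Split rows so each (target, treatment) cell is sampled with the same fraction.

    data: feature DataFrame; target, treatment: Series sharing data's index.
    Returns (X_train, X_val, y_train, y_val, t_train, t_val).
    """
    # One stratum per combination of outcome and group label.
    strata = pd.DataFrame({"y": target, "t": treatment}, index=data.index)
    train_idx = (
        strata.groupby(["y", "t"])
        .sample(frac=frac, random_state=random_state)
        .index
    )
    # Every row not drawn for training goes to validation.
    val_idx = data.index.difference(train_idx)
    return (data.loc[train_idx], data.loc[val_idx],
            target.loc[train_idx], target.loc[val_idx],
            treatment.loc[train_idx], treatment.loc[val_idx])


# Toy data (hypothetical): 4 strata of 5 rows each.
X = pd.DataFrame({"X_1": range(20)})
y = pd.Series([1, 0] * 10)
t = pd.Series(["treatment"] * 10 + ["control"] * 10)
X_tr, X_va, y_tr, y_va, t_tr, t_va = stratified_split(X, y, t, frac=0.6)
print(len(X_tr), len(X_va))  # 12 8
```

With the real data the same call would take the tuple returned by `fetch_megafon(return_X_y_t=True)`.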

See also

fetch_lenta(): Load and return the Lenta dataset (classification).

fetch_x5(): Load and return the X5 RetailHero dataset (classification).

fetch_criteo(): Load and return the Criteo Uplift Prediction Dataset (classification).

fetch_hillstrom(): Load and return the Kevin Hillstrom Dataset MineThatData (classification or regression).

MegaFon Uplift Competition Dataset

Machine learning competition website.

Data description

The dataset is provided by MegaFon at the MegaFon Uplift Competition hosted in May 2021.

The dataset contains synthetic data generated to stay close to a real case that MegaFon encountered.


  • X_1…X_50: anonymized feature set

  • treatment_group (str): treatment/control group flag

  • conversion (binary): customer purchasing

Key figures

  • Format: CSV

  • Size: 554M

  • Rows: 600,000

  • Response Ratio: 0.2

  • Treatment Ratio: 0.5
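The two ratios above can be recomputed from the target and treatment returned by fetch_megafon. The sketch below assumes Response Ratio means the share of rows with conversion equal to 1 and Treatment Ratio the share of rows labelled "treatment"; both readings, the label value and the helper name `key_figures` are assumptions. The toy Series are hand-made so that the output matches the listed figures; they are not MegaFon data.

```python
import pandas as pd


def key_figures(target: pd.Series, treatment: pd.Series) -> dict:
    """Row count plus response and treatment ratios as shares of all rows.

    Assumes target is 0/1 and treatment uses the labels "treatment"/"control".
    """
    return {
        "rows": len(target),
        # Share of rows with conversion == 1.
        "response_ratio": float(target.mean()),
        # Share of rows in the treatment group (assumed label value).
        "treatment_ratio": float((treatment == "treatment").mean()),
    }


# Toy Series (hypothetical), not the real dataset.
toy_target = pd.Series([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
toy_treatment = pd.Series(["treatment", "control"] * 5)
print(key_figures(toy_target, toy_treatment))
# {'rows': 10, 'response_ratio': 0.2, 'treatment_ratio': 0.5}
```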