sklift.datasets.fetch_megafon

sklift.datasets.datasets.fetch_megafon(data_home=None, dest_subdir=None, download_if_missing=True, return_X_y_t=False)

Load and return the MegaFon Uplift Competition dataset (classification).

An uplift modeling dataset containing synthetic data generated by telecom companies to approximate a real case they encountered.

Major columns:

  • X_1...X_50 : anonymized feature set

  • conversion (binary): target

  • treatment_group (str): treatment/control group flag

Read more in the docs.

Parameters
  • data_home (str) – The path to the folder where datasets are stored.

  • dest_subdir (str) – The name of the folder in which the dataset is stored.

  • download_if_missing (bool) – Download the data if not present. Raises an IOError if False and data is missing.

  • return_X_y_t (bool) – If True, returns (data, target, treatment) instead of a Bunch object.

Returns

dataset.

Bunch:

By default, a dictionary-like object with the following attributes:

  • data (DataFrame object): Dataset without target and treatment.

  • target (Series object): Values of the target column.

  • treatment (Series object): Values of the treatment column.

  • DESCR (str): Description of the MegaFon dataset.

  • feature_names (list): Names of the features.

  • target_name (str): Name of the target.

  • treatment_name (str): Name of the treatment.

Tuple:

tuple (data, target, treatment) if return_X_y_t is True

Return type

Bunch or tuple

Example:

from sklift.datasets import fetch_megafon


dataset = fetch_megafon()
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# alternative option
data, target, treatment = fetch_megafon(return_X_y_t=True)
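The treatment column is stored as a string, while downstream uplift models typically expect a 0/1 flag. The sketch below shows one way that encoding could look. It runs on a small hand-made stand-in, not on the downloaded data. The label values "treatment" and "control" and the helper name encode_treatment are assumptions for illustration, so inspect the unique values of dataset.treatment before relying on them.

```python
def encode_treatment(groups, treated_label="treatment"):
    """Map string group labels to 1 (treated) or 0 (anything else)."""
    return [1 if g == treated_label else 0 for g in groups]


# Hand-made stand-in for dataset.treatment; the label values are assumed.
toy_treatment = ["treatment", "control", "control", "treatment"]

treatment_flag = encode_treatment(toy_treatment)
print(treatment_flag)
```

The helper accepts any iterable of labels, so a pandas Series can be passed in as well. With pandas, Series.map with a label-to-integer dictionary would be the more idiomatic equivalent.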

See also

fetch_lenta(): Load and return the Lenta dataset (classification).

fetch_x5(): Load and return the X5 RetailHero dataset (classification).

fetch_criteo(): Load and return the Criteo Uplift Prediction Dataset (classification).

fetch_hillstrom(): Load and return Kevin Hillstrom Dataset MineThatData (classification or regression).

MegaFon Uplift Competition Dataset

Machine learning competition website.

Data description

The dataset was provided by MegaFon for the MegaFon Uplift Competition, hosted in May 2021.

The dataset contains synthetic data generated to approximate a real case the company encountered.

Fields

  • X_1…X_50: anonymized feature set

  • treatment_group (str): treatment/control group flag

  • conversion (binary): customer purchasing

Key figures

  • Format: CSV

  • Size: 554M

  • Rows: 600,000

  • Response Ratio: 0.2

  • Treatment Ratio: 0.5
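The response and treatment ratios listed above are plain proportions: the share of rows with a conversion and the share of rows in the treatment group. The sketch below shows how they could be checked, using a small made-up sample in place of the real 600,000 rows. The helper name share_of and the "treatment" label are illustrative assumptions, not part of the library.

```python
def share_of(values, positive):
    """Return the fraction of entries equal to `positive`."""
    values = list(values)
    if not values:
        raise ValueError("cannot compute a share of an empty sequence")
    return sum(1 for v in values if v == positive) / len(values)


# Made-up sample, not real data: 10 customers, 2 conversions, 5 treated.
toy_conversion = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
toy_treatment = ["treatment", "control"] * 5

response_ratio = share_of(toy_conversion, 1)
treatment_ratio = share_of(toy_treatment, "treatment")
print(response_ratio, treatment_ratio)
```

On the real data the same helper could be applied to dataset.target and dataset.treatment once the treatment labels have been confirmed.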

About MegaFon

MegaFon (Russian: МегаФон), previously known as North-West GSM, is the second largest mobile phone operator and the third largest telecom operator in Russia. It operates on the GSM, UMTS and LTE standards. As of June 2012, the company served 62.1 million subscribers in Russia and 1.6 million in Tajikistan. It is headquartered in Moscow.

Link to the company’s website: https://megafon.ru/