sklift.datasets.fetch_lenta

sklift.datasets.datasets.fetch_lenta(data_home=None, dest_subdir=None, download_if_missing=True, return_X_y_t=False)

Load and return the Lenta dataset (classification).

An uplift modeling dataset containing data about Lenta customers’ grocery shopping and related marketing campaigns.

Major columns:

  • group (str): treatment/control group flag

  • response_att (binary): target

  • gender (str): customer gender

  • age (float): customer age

  • main_format (int): store type (1 - grocery store, 0 - superstore)

Read more in the docs.

Parameters
  • data_home (str) – The path to the folder where datasets are stored.

  • dest_subdir (str) – The name of the folder in which the dataset is stored.

  • download_if_missing (bool) – Download the data if not present. Raises an IOError if False and data is missing.

  • return_X_y_t (bool) – If True, returns (data, target, treatment) instead of a Bunch object.

Returns

dataset.

Bunch:

By default, a dictionary-like object with the following attributes:

  • data (DataFrame object): Dataset without target and treatment.

  • target (Series object): Target column values.

  • treatment (Series object): Treatment column values.

  • DESCR (str): Description of the Lenta dataset.

  • feature_names (list): Names of the features.

  • target_name (str): Name of the target.

  • treatment_name (str): Name of the treatment.

Tuple:

tuple (data, target, treatment) if return_X_y_t is True

Return type

Bunch or tuple
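The two return shapes described above can be illustrated without downloading anything. The sketch below is not the library's implementation: `BunchSketch` and `fetch_sketch` are hypothetical stand-ins, the rows are made up, and the `target_name` and `treatment_name` values are only assumed from the column list on this page.

```python
class BunchSketch(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def fetch_sketch(return_X_y_t=False):
    """Mimic the two documented return shapes with made-up values."""
    data = [{"age": 47.0, "children": 0}, {"age": 31.0, "children": 2}]
    target = [0, 1]
    treatment = ["test", "control"]  # made-up labels, see the lead-in
    if return_X_y_t:
        return data, target, treatment
    return BunchSketch(
        data=data,
        target=target,
        treatment=treatment,
        feature_names=["age", "children"],
        target_name="response_att",  # assumed from the column list
        treatment_name="group",      # assumed from the column list
    )


bunch = fetch_sketch()
as_tuple = fetch_sketch(return_X_y_t=True)

# Attribute access and key access reach the same object.
print(bunch.target is bunch["target"])
# The tuple carries the same three pieces in a fixed order.
print(as_tuple == (bunch.data, bunch.target, bunch.treatment))
```

The tuple form is convenient for unpacking straight into a modeling pipeline, while the Bunch form keeps the description and column names next to the data.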

Example:

from sklift.datasets import fetch_lenta


dataset = fetch_lenta()
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# alternative option
data, target, treatment = fetch_lenta(return_X_y_t=True)
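Since `group` is a string column, a typical next step is to turn it into a binary flag and compare response rates between the two groups. The following is a minimal sketch on made-up lists rather than the real dataset, and it assumes the flag values are the strings "test" and "control"; check the actual values of `treatment` before relying on this mapping.

```python
# Assumption: the `group` column stores the strings "test" and "control".
GROUP_TO_FLAG = {"test": 1, "control": 0}

# Made-up stand-ins for the `treatment` and `target` columns.
treatment = ["test", "test", "test", "test",
             "control", "control", "control", "control"]
target = [1, 1, 0, 0, 1, 0, 0, 0]

# Encode the string flag as 1 (treated) / 0 (control).
flags = [GROUP_TO_FLAG[g] for g in treatment]


def response_rate(flags, target, flag):
    """Mean of the binary target among rows whose flag equals `flag`."""
    responses = [y for f, y in zip(flags, target) if f == flag]
    return sum(responses) / len(responses)


treated_rate = response_rate(flags, target, 1)  # 2 responders out of 4
control_rate = response_rate(flags, target, 0)  # 1 responder out of 4

# Naive uplift estimate: difference in response rates between the groups.
naive_uplift = treated_rate - control_rate

print(flags)         # [1, 1, 1, 1, 0, 0, 0, 0]
print(naive_uplift)  # 0.25
```

With the pandas Series returned by `fetch_lenta`, the analogous encoding would be `treatment.map(GROUP_TO_FLAG)`.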

See also

fetch_x5(): Load and return the X5 RetailHero dataset (classification).

fetch_criteo(): Load and return the Criteo Uplift Prediction Dataset (classification).

fetch_hillstrom(): Load and return Kevin Hillstrom Dataset MineThatData (classification or regression).

fetch_megafon(): Load and return the MegaFon Uplift Competition dataset (classification).

Lenta Uplift Modeling Dataset

Data description

An uplift modeling dataset containing data about Lenta customers’ grocery shopping and related marketing campaigns.

Source: BigTarget Hackathon hosted by Lenta and Microsoft in summer 2020.

Fields

Major features:

  • group (str): treatment/control group flag

  • response_att (binary): target

  • gender (str): customer gender

  • age (float): customer age

  • main_format (int): store type (1 - grocery store, 0 - superstore)

All features (feature: description):

  • CardHolder: customer id

  • age: customer age

  • children: number of children

  • cheque_count_[3,6,12]m_g*: number of customer receipts collected within the last 3, 6, 12 months before the campaign; g* is a product group

  • crazy_purchases_cheque_count_[1,3,6,12]m: number of customer receipts with items purchased during the “crazy” marketing campaign, collected within the last 1, 3, 6, 12 months before the campaign

  • crazy_purchases_goods_count_[6,12]m: number of items purchased during the “crazy” marketing campaign within the last 6, 12 months before the campaign

  • disc_sum_6m_g34: discount sum over the past 6 months for product group 34

  • food_share_[15d,1m]: food share in customer purchases for 15 days, 1 month

  • gender: customer gender

  • group: treatment/control group flag

  • k_var_cheque_[15d,3m]: coefficient of variation of the average check for 15 days, 3 months

  • k_var_cheque_category_width_15d: coefficient of variation of the average number of purchased categories (2nd level of the hierarchy) in one receipt for 15 days

  • k_var_cheque_group_width_15d: coefficient of variation of the average number of purchased groups (1st level of the hierarchy) in one receipt for 15 days

  • k_var_count_per_cheque_[15d,1m,3m,6m]_g*: coefficient of variation of unique product ids (SKU) for 15 days, 1, 3, 6 months for the g* product group

  • k_var_days_between_visits_[15d,1m,3m]: coefficient of variation of the average period between visits for 15 days, 1 month, 3 months

  • k_var_disc_per_cheque_15d: coefficient of variation of the discount sum for 15 days

  • k_var_disc_share_[15d,1m,3m,6m,12m]_g*: coefficient of variation of the discount amount for 15 days, 1 month, 3 months, 6 months, 12 months for the g* product group

  • k_var_discount_depth_[15d,1m]: coefficient of variation of the discount amount for 15 days, 1 month

  • k_var_sku_per_cheque_15d: coefficient of variation of the number of unique product ids (SKU) for 15 days

  • k_var_sku_price_12m_g*: coefficient of variation of the price for 15 days, 3, 6, 12 months for the g* product group

  • main_format: store type (1 - grocery store, 0 - superstore)

  • mean_discount_depth_15d: mean discount depth for 15 days

  • months_from_register: number of months since registration

  • perdelta_days_between_visits_15_30d: time delta in percent between visits during the first half of the month and visits during the second half of the month

  • promo_share_15d: share of promo goods in the customer basket

  • response_att: binary target variable = store visit

  • response_sms: share of customer responses to previous SMS messages. Response = store visit

  • response_viber: share of customer responses to previous Viber messages. Response = store visit

  • sale_count_[3,6,12]m_g*: number of items purchased from the g* product group for 3, 6, 12 months

  • sale_sum_[3,6,12]m_g*: sum of sales from the g* product group for 3, 6, 12 months

  • stdev_days_between_visits_15d: coefficient of variation of the days between visits for 15 days

  • stdev_discount_depth_[15d,1m]: coefficient of variation of the discount sum for 15 days, 1 month
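The bracketed feature names above are shorthand for families of columns: `[3,6,12]` lists the time windows and `g*` stands for a product group suffix. As a rough sketch of how such a pattern could be turned into a column selector, the helpers below expand the brackets and match the result as a glob. `expand_pattern` and `select_columns` are hypothetical helpers, not part of sklift, and the column names in the usage lines are made up for illustration.

```python
import re
from fnmatch import fnmatchcase
from itertools import product


def expand_pattern(pattern):
    """Expand every [a,b,c] group in a documented feature pattern.

    The trailing '*' is left in place so each result can be used as a glob.
    """
    # Splitting on a capturing group alternates literal text and bracket bodies.
    parts = re.split(r"\[([^\]]*)\]", pattern)
    literals = parts[0::2]
    choices = [body.split(",") for body in parts[1::2]]

    expanded = []
    for combo in product(*choices):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.append(value.strip())
            pieces.append(literal)
        expanded.append("".join(pieces))
    return expanded


def select_columns(columns, pattern):
    """Return the columns that match any expansion of `pattern`."""
    globs = expand_pattern(pattern)
    return [c for c in columns if any(fnmatchcase(c, g) for g in globs)]


# Made-up column names that follow the documented naming scheme.
columns = ["sale_sum_3m_g24", "sale_sum_6m_g24", "sale_sum_12m_g26",
           "sale_count_3m_g24", "age"]

print(expand_pattern("sale_sum_[3,6,12]m_g*"))
selected = select_columns(columns, "sale_sum_[3,6,12]m_g*")
print(selected)
```

On a real DataFrame the same selector could be applied to `data.columns` to pull out one feature family at a time.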

Key figures

  • Format: CSV

  • Size: 153M (compressed), 567M (uncompressed)

  • Rows: 687,029

  • Response Ratio: 0.1

  • Treatment Ratio: 0.75
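To get a feel for what these ratios mean in absolute terms, the arithmetic can be worked through directly. The counts below are approximations derived only from the rounded figures above, not exact group sizes from the data.

```python
ROWS = 687_029
TREATMENT_RATIO = 0.75  # share of customers in the treatment group
RESPONSE_RATIO = 0.1    # share of customers with a positive target

# Approximate group sizes implied by the rounded ratios.
treated = round(ROWS * TREATMENT_RATIO)
control = ROWS - treated
responders = round(ROWS * RESPONSE_RATIO)

print(treated)     # 515272
print(control)     # 171757
print(responders)  # 68703
```

The control group is roughly a third the size of the treatment group, which is worth keeping in mind when splitting the data, for example by stratifying on both treatment and target.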

About Lenta

Lenta (Russian: Лента) is a Russian super- and hypermarket chain. With 149 locations across the country, it is one of Russia’s largest retail chains and the country’s second-largest hypermarket chain.

Link to the company’s website: https://www.lenta.com/