sklift.datasets.fetch_lenta
sklift.datasets.datasets.fetch_lenta(data_home=None, dest_subdir=None, download_if_missing=True, return_X_y_t=False)
Load and return the Lenta dataset (classification).
An uplift modeling dataset containing data about Lenta’s customers’ grocery shopping and related marketing campaigns.
Major columns:
group (str): treatment/control group flag
response_att (binary): target
gender (str): customer gender
age (float): customer age
main_format (int): store type (1 - grocery store, 0 - superstore)
Read more in the docs.
- Parameters
data_home (str) – The path to the folder where datasets are stored.
dest_subdir (str) – The name of the folder in which the dataset is stored.
download_if_missing (bool) – Download the data if not present. Raises an IOError if False and data is missing.
return_X_y_t (bool) – If True, returns (data, target, treatment) instead of a Bunch object.
- Returns
dataset.
- Bunch:
By default, a dictionary-like object with the following attributes:
data (DataFrame object): Dataset without target and treatment.
target (Series object): Values of the target column.
treatment (Series object): Values of the treatment column.
DESCR (str): Description of the Lenta dataset.
feature_names (list): Names of the features.
target_name (str): Name of the target.
treatment_name (str): Name of the treatment.
- Tuple:
tuple (data, target, treatment) if return_X_y_t is True
- Return type
Bunch or tuple
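The two return modes described above can be sketched as follows. This is a minimal illustration, not part of the library: `load_lenta` and `encode_group` are hypothetical helper names, and the `"test"` label for the treatment group is an assumption that should be checked against the actual values of the treatment column.

```python
def load_lenta(as_tuple=False, data_home=None):
    """Hypothetical convenience wrapper around fetch_lenta."""
    # Imported lazily so that defining this function does not require
    # scikit-uplift to be installed; calling it does.
    from sklift.datasets import fetch_lenta

    if as_tuple:
        # return_X_y_t=True gives (data, target, treatment).
        return fetch_lenta(data_home=data_home, return_X_y_t=True)
    # Default: a Bunch exposing data, target, treatment, DESCR, ...
    return fetch_lenta(data_home=data_home)


def encode_group(flags, treated_label="test"):
    """Map string group flags to 1 (treated) / 0 (control).

    ASSUMPTION: the treatment group is labelled "test"; inspect the
    treatment column before relying on this default.
    """
    return [1 if flag == treated_label else 0 for flag in flags]


# Intended usage (the first call downloads the data):
#   data, target, treatment = load_lenta(as_tuple=True)
#   treatment_binary = encode_group(treatment)
```

Because the treatment flag is a string column, most uplift estimators will need it converted to 0/1 first, which is why the encoding step is shown next to the loading step.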
Lenta Uplift Modeling Dataset
Data description
An uplift modeling dataset containing data about Lenta’s customers’ grocery shopping and related marketing campaigns.
Source: BigTarget Hackathon hosted by Lenta and Microsoft in summer 2020.
Fields
Major features:
group (str): treatment/control group flag
response_att (binary): target
gender (str): customer gender
age (float): customer age
main_format (int): store type (1 - grocery store, 0 - superstore)
Feature | Description |
---|---|
CardHolder | customer id |
age | customer age |
children | number of children |
cheque_count_[3,6,12]m_g* | number of customer receipts collected within the last 3, 6, 12 months before the campaign. g* is a product group |
crazy_purchases_cheque_count_[1,3,6,12]m | number of customer receipts with items purchased during the “crazy” marketing campaign, collected within the last 1, 3, 6, 12 months before the campaign |
crazy_purchases_goods_count_[6,12]m | number of items purchased during the “crazy” marketing campaign, collected within the last 6, 12 months before the campaign |
disc_sum_6m_g34 | discount sum for the past 6 months on product group 34 |
food_share_[15d,1m] | food share in customer purchases for 15 days, 1 month |
gender | customer gender |
group | treatment/control group flag |
k_var_cheque_[15d,3m] | average check coefficient of variation for 15 days, 3 months |
k_var_cheque_category_width_15d | coefficient of variation of the average number of purchased categories (2nd level of the hierarchy) in one receipt for 15 days |
k_var_cheque_group_width_15d | coefficient of variation of the average number of purchased groups (1st level of the hierarchy) in one receipt for 15 days |
k_var_count_per_cheque_[15d,1m,3m,6m]_g* | unique product id (SKU) coefficient of variation for 15 days, 1, 3, 6 months for g* product group |
k_var_days_between_visits_[15d,1m,3m] | coefficient of variation of the average period between visits for 15 days, 1 month, 3 months |
k_var_disc_per_cheque_15d | discount sum coefficient of variation for 15 days |
k_var_disc_share_[15d,1m,3m,6m,12m]_g* | discount amount coefficient of variation for 15 days, 1 month, 3 months, 6 months, 12 months for g* product group |
k_var_discount_depth_[15d,1m] | discount amount coefficient of variation for 15 days, 1 month |
k_var_sku_per_cheque_15d | number of unique product ids (SKU) coefficient of variation for 15 days |
k_var_sku_price_12m_g* | price coefficient of variation for 15 days, 3, 6, 12 months for g* product group |
main_format | store type (1 - grocery store, 0 - superstore) |
mean_discount_depth_15d | mean discount depth for 15 days |
months_from_register | number of months since registration |
perdelta_days_between_visits_15_30d | time delta in percent between visits during the first half of the month and visits during the second half of the month |
promo_share_15d | promo goods share in the customer bucket |
response_att | binary target variable = store visit |
response_sms | share of customer responses to previous SMS. Response = store visit |
response_viber | share of responses to previous Viber messages. Response = store visit |
sale_count_[3,6,12]m_g* | number of purchased items from the group * for 3, 6, 12 months |
sale_sum_[3,6,12]m_g* | sum of sales from the group * for 3, 6, 12 months |
stdev_days_between_visits_15d | coefficient of variation of the days between visits for 15 days |
stdev_discount_depth_[15d,1m] | discount sum coefficient of variation for 15 days, 1 month |
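The feature names in the table use a shorthand: a bracketed list such as `[3,6,12]` stands for several time windows, and a trailing `*` stands for a product-group identifier. The sketch below shows one possible way to expand that shorthand and pick matching columns. `expand_pattern` and `select_columns` are hypothetical helpers, and the concrete column names in the usage lines (for example `sale_sum_3m_g24`) are illustrative assumptions rather than a verified list of dataset columns.

```python
import re


def expand_pattern(pattern):
    """Expand 'name_[a,b]m_g*' shorthand into concrete name prefixes."""
    match = re.search(r"\[([^\]]+)\]", pattern)
    if match is None:
        variants = [pattern]
    else:
        options = [opt.strip() for opt in match.group(1).split(",")]
        variants = [
            pattern[:match.start()] + opt + pattern[match.end():]
            for opt in options
        ]
    # A trailing '*' is a placeholder for a product-group id,
    # so only the part before it is kept.
    return [variant.rstrip("*") for variant in variants]


def select_columns(columns, pattern):
    """Return the columns that match the shorthand pattern, in order."""
    prefixes = expand_pattern(pattern)
    wildcard = pattern.endswith("*")
    selected = []
    for column in columns:
        for prefix in prefixes:
            # With a wildcard any suffix is allowed; otherwise the
            # name has to match exactly.
            if (wildcard and column.startswith(prefix)) or column == prefix:
                selected.append(column)
                break
    return selected


# Illustrative column names (assumed, not read from the dataset).
columns = ["age", "sale_sum_3m_g24", "sale_sum_12m_g33", "food_share_15d"]
print(select_columns(columns, "sale_sum_[3,6,12]m_g*"))
print(select_columns(columns, "food_share_[15d,1m]"))
```

With a loaded dataset, the same helper could be applied to `dataset.feature_names` to pull out one family of features at a time.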
Key figures
Format: CSV
Size: 153M (compressed), 567M (uncompressed)
Rows: 687,029
Response Ratio: 0.1
Treatment Ratio: 0.75
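As a rough illustration of how the two ratios above can be read, the sketch below assumes that the response ratio is the share of rows with a positive target and the treatment ratio is the share of rows in the treatment group. That interpretation is an assumption, `positive_share` is a hypothetical helper, and the four-row sample is toy data, not drawn from the dataset.

```python
def positive_share(flags):
    """Share of entries equal to 1 in a sequence of 0/1 flags."""
    flags = list(flags)
    if not flags:
        raise ValueError("flags must not be empty")
    return sum(1 for flag in flags if flag == 1) / len(flags)


# Toy data: four customers, three treated, one responded.
target = [0, 1, 0, 0]      # 1 = store visit
treatment = [1, 1, 1, 0]   # 1 = treatment group, 0 = control

response_ratio = positive_share(target)
treatment_ratio = positive_share(treatment)
print(f"Response ratio: {response_ratio:.2f}, treatment ratio: {treatment_ratio:.2f}")
```

On the real data the same calculation would be run on the target column and on the treatment column after it has been encoded as 0/1.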