sklift.datasets.fetch_hillstrom
sklift.datasets.datasets.fetch_hillstrom(target_col='visit', data_home=None, dest_subdir=None, download_if_missing=True, return_X_y_t=False, as_frame=True)
Load and return the Kevin Hillstrom MineThatData dataset (classification or regression).
This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.
Major columns:
Visit (binary): target. 1/0 indicator, 1 = Customer visited website in the following two weeks.
Conversion (binary): target. 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
Spend (float): target. Actual dollars spent in the following two weeks.
Segment (str): treatment. The e-mail campaign the customer received.
Read more in the docs.
- Parameters
target_col (str, 'visit' or 'conversion' or 'spend', default='visit') – Selects which column of the dataset is used as the target.
data_home (str) – The path to the folder where datasets are stored.
dest_subdir (str) – The name of the folder in which the dataset is stored.
download_if_missing (bool) – Download the data if not present. Raises an IOError if False and data is missing.
return_X_y_t (bool, default=False) – If True, returns (data, target, treatment) instead of a Bunch object.
as_frame (bool, default=True) – If True, returns pandas objects (DataFrame or Series) for data, target and treatment in the returned Bunch; the Bunch will also have a frame member.
- Returns
dataset.
- Bunch:
By default, a dictionary-like object with the following attributes:
data (ndarray or DataFrame object): Dataset without target and treatment.
target (Series object): Values of the target column.
treatment (Series object): Values of the treatment column.
DESCR (str): Description of the Hillstrom dataset.
feature_names (list): Names of the features.
target_name (str): Name of the target.
treatment_name (str): Name of the treatment.
- Tuple:
tuple (data, target, treatment) if return_X_y_t is True
- Return type
Bunch or tuple
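A minimal usage sketch for the parameters and return types above. The wrapper name load_hillstrom and the constant VALID_TARGETS are hypothetical, introduced only for illustration; the import path follows the page title (sklift.datasets.fetch_hillstrom), and the call assumes scikit-uplift is installed and the data can be downloaded.

```python
# Allowed values for target_col, as listed under Parameters.
VALID_TARGETS = ("visit", "conversion", "spend")


def load_hillstrom(target_col="visit"):
    """Return (data, target, treatment) for the chosen target column.

    Hypothetical convenience wrapper, not part of scikit-uplift.
    """
    if target_col not in VALID_TARGETS:
        raise ValueError(
            f"target_col must be one of {VALID_TARGETS}, got {target_col!r}"
        )
    # Imported lazily so the argument check works without scikit-uplift.
    from sklift.datasets import fetch_hillstrom

    # return_X_y_t=True yields a tuple instead of a Bunch object.
    return fetch_hillstrom(target_col=target_col, return_X_y_t=True)


# Example call (downloads the data on first use if it is missing):
# data, target, treatment = load_hillstrom("conversion")
```

With return_X_y_t left at its default of False, the same call returns a Bunch, and the three objects are available as its data, target and treatment attributes.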
References
https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html
Kevin Hillstrom Dataset: MineThatData
Data description
This is a copy of the MineThatData E-Mail Analytics And Data Mining Challenge dataset.
This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.
1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.
1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.
1/3 were randomly chosen to not receive an e-mail campaign.
During a period of two weeks following the e-mail campaign, results were tracked. Your job is to tell the world if the Mens or Womens e-mail campaign was successful.
Fields
Historical customer attributes at your disposal include:
Recency: Months since last purchase.
History_Segment: Categorization of dollars spent in the past year.
History: Actual dollar value spent in the past year.
Mens: 1/0 indicator, 1 = customer purchased Mens merchandise in the past year.
Womens: 1/0 indicator, 1 = customer purchased Womens merchandise in the past year.
Zip_Code: Classifies zip code as Urban, Suburban, or Rural.
Newbie: 1/0 indicator, 1 = New customer in the past twelve months.
Channel: Describes the channels the customer purchased from in the past year.
Another variable describes the e-mail campaign the customer received:
Segment
Mens E-Mail
Womens E-Mail
No E-Mail
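Because Segment takes three values, a two-arm analysis typically compares one campaign against the No E-Mail group. The sketch below is one possible way to do that in plain Python; the function name to_binary_treatment is hypothetical, and the four-element input list is an illustrative toy, not rows from the dataset.

```python
def to_binary_treatment(segments, treated="Womens E-Mail", control="No E-Mail"):
    """Keep only rows from the two chosen arms.

    Returns (kept_indices, flags), where flags holds 1 for the treated
    segment and 0 for the control segment. Rows from any other segment
    are dropped.
    """
    kept, flags = [], []
    for i, segment in enumerate(segments):
        if segment == treated:
            kept.append(i)
            flags.append(1)
        elif segment == control:
            kept.append(i)
            flags.append(0)
    return kept, flags


# Toy input using the three Segment labels listed above.
segments = ["Mens E-Mail", "No E-Mail", "Womens E-Mail", "No E-Mail"]
kept, flags = to_binary_treatment(segments)
print(kept)   # [1, 2, 3]
print(flags)  # [0, 1, 0]
```

The kept indices can then be used to select the matching rows of the feature and target objects so that all three stay aligned.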
Finally, we have a series of variables describing activity in the two weeks following delivery of the e-mail campaign:
Visit: 1/0 indicator, 1 = Customer visited website in the following two weeks.
Conversion: 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
Spend: Actual dollars spent in the following two weeks.
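Since customers were assigned to the campaigns at random, a simple first look at whether a campaign worked is the difference in mean outcome between a treated group and the No E-Mail group. The sketch below assumes binary treatment flags (1 = received the e-mail, 0 = did not); naive_uplift is a hypothetical helper, and the six values are made-up toy numbers, not results from the dataset.

```python
def naive_uplift(outcomes, flags):
    """Mean outcome among treated rows minus mean outcome among control rows."""
    treated = [y for y, t in zip(outcomes, flags) if t == 1]
    control = [y for y, t in zip(outcomes, flags) if t == 0]
    if not treated or not control:
        raise ValueError("both treated and control rows are required")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy Visit indicators and treatment flags (illustrative only).
visits = [1, 0, 1, 1, 0, 0]
flags = [1, 1, 1, 0, 0, 0]
estimate = naive_uplift(visits, flags)  # 2/3 - 1/3
```

The same function works for Conversion (another 1/0 indicator) and for Spend, where the result is a difference in average dollars spent rather than a difference in rates.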