sklift.datasets.fetch_hillstrom

sklift.datasets.datasets.fetch_hillstrom(target_col='visit', data_home=None, dest_subdir=None, download_if_missing=True, return_X_y_t=False, as_frame=True)

Load and return the Kevin Hillstrom MineThatData dataset (classification or regression).

This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.

Major columns:

  • Visit (binary): target. 1/0 indicator, 1 = Customer visited website in the following two weeks.

  • Conversion (binary): target. 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.

  • Spend (float): target. Actual dollars spent in the following two weeks.

  • Segment (str): treatment. The e-mail campaign the customer received.

Read more in the docs.

Parameters
  • target_col (string, 'visit' or 'conversion' or 'spend', default='visit') – Selects which dataset column is used as the target.

  • data_home (str) – The path to the folder where datasets are stored.

  • dest_subdir (str) – The name of the folder in which the dataset is stored.

  • download_if_missing (bool) – Download the data if not present. Raises an IOError if False and data is missing.

  • return_X_y_t (bool, default=False) – If True, returns (data, target, treatment) instead of a Bunch object.

  • as_frame (bool) – If True, the data, target and treatment objects in the returned Bunch are pandas objects (a DataFrame for data), and the Bunch also has a frame member.

Returns

dataset.

Bunch:

By default, a dictionary-like object with the following attributes:

  • data (ndarray or DataFrame object): Dataset without target and treatment.

  • target (Series object): Values of the target column.

  • treatment (Series object): Values of the treatment column.

  • DESCR (str): Description of the Hillstrom dataset.

  • feature_names (list): Names of the features.

  • target_name (str): Name of the target.

  • treatment_name (str): Name of the treatment.

Tuple:

tuple (data, target, treatment) if return_X_y_t is True

Return type

Bunch or tuple
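As an illustration of the parameters and return values described above, here is a minimal sketch of a thin wrapper around fetch_hillstrom. The names load_hillstrom and VALID_TARGETS are hypothetical and not part of scikit-uplift; the early check on target_col is our own addition, not documented library behaviour.

```python
VALID_TARGETS = ("visit", "conversion", "spend")


def load_hillstrom(target_col="visit"):
    """Return (data, target, treatment) for one Hillstrom target column.

    Hypothetical convenience wrapper; not part of scikit-uplift.
    """
    # Our own check (an assumption, not documented library behaviour):
    # fail early, before any download is attempted.
    if target_col not in VALID_TARGETS:
        raise ValueError(
            f"target_col must be one of {VALID_TARGETS}, got {target_col!r}"
        )

    # Imported lazily so the check above works without scikit-uplift installed.
    from sklift.datasets import fetch_hillstrom

    # return_X_y_t=True yields a (data, target, treatment) tuple
    # instead of a Bunch object.
    return fetch_hillstrom(target_col=target_col, return_X_y_t=True)


# Usage (may download the data on the first call):
# data, target, treatment = load_hillstrom("conversion")
# print(treatment.value_counts())
```

With return_X_y_t left at its default of False, the same call returns a Bunch, and the pieces are reached as attributes such as dataset.data, dataset.target and dataset.treatment.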

References

https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html

Kevin Hillstrom Dataset: MineThatData

Data description

This is a copy of the MineThatData E-Mail Analytics And Data Mining Challenge dataset.

This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.

  • 1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.

  • 1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.

  • 1/3 were randomly chosen to not receive an e-mail campaign.

During a period of two weeks following the e-mail campaign, results were tracked. Your job is to tell the world if the Mens or Womens e-mail campaign was successful.

Fields

Historical customer attributes at your disposal include:

  • Recency: Months since last purchase.

  • History_Segment: Categorization of dollars spent in the past year.

  • History: Actual dollar value spent in the past year.

  • Mens: 1/0 indicator, 1 = customer purchased Mens merchandise in the past year.

  • Womens: 1/0 indicator, 1 = customer purchased Womens merchandise in the past year.

  • Zip_Code: Classifies zip code as Urban, Suburban, or Rural.

  • Newbie: 1/0 indicator, 1 = New customer in the past twelve months.

  • Channel: Describes the channels the customer purchased from in the past year.

Another variable describes the e-mail campaign the customer received:

  • Segment

    • Mens E-Mail

    • Womens E-Mail

    • No E-Mail
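Segment takes three string values, while some uplift workflows expect a binary treatment flag. The sketch below shows one possible way to reduce the column to a single campaign versus control. The helper binarize_segment and the toy list are hypothetical and only illustrate the idea; they are not part of scikit-uplift and not taken from the dataset.

```python
def binarize_segment(segments, treated="Womens E-Mail", control="No E-Mail"):
    """Keep rows in `treated` or `control` and flag them as 1 or 0.

    Returns (kept_indices, flags); rows from any other segment are dropped.
    """
    kept_indices, flags = [], []
    for i, segment in enumerate(segments):
        if segment == treated:
            kept_indices.append(i)
            flags.append(1)
        elif segment == control:
            kept_indices.append(i)
            flags.append(0)
    return kept_indices, flags


# Toy values for illustration only, not rows from the real dataset.
toy_segments = ["Mens E-Mail", "No E-Mail", "Womens E-Mail", "No E-Mail"]
kept, flags = binarize_segment(toy_segments)
```

Here the "Mens E-Mail" row is dropped, and kept holds the positions of the remaining rows so that the same rows can be selected from the features and the target.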

Finally, we have a series of variables describing activity in the two weeks following delivery of the e-mail campaign:

  • Visit: 1/0 indicator, 1 = Customer visited website in the following two weeks.

  • Conversion: 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.

  • Spend: Actual dollars spent in the following two weeks.
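Because customers were assigned to the three segments at random, a simple first look at campaign success is to compare the mean outcome in each campaign group with the "No E-Mail" group. The sketch below does this for a binary outcome such as Visit. The function name and the toy rows are hypothetical and the numbers are made up for illustration; they say nothing about the real data.

```python
from collections import defaultdict


def response_rate_by_segment(rows):
    """Compute the mean of a 1/0 outcome per segment.

    `rows` is an iterable of (segment, outcome) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for segment, outcome in rows:
        totals[segment] += 1
        hits[segment] += outcome
    return {segment: hits[segment] / totals[segment] for segment in totals}


# Toy (Segment, Visit) pairs for illustration only.
toy_rows = [
    ("Mens E-Mail", 1), ("Mens E-Mail", 0),
    ("Womens E-Mail", 1), ("Womens E-Mail", 1),
    ("No E-Mail", 0), ("No E-Mail", 1), ("No E-Mail", 0), ("No E-Mail", 0),
]

rates = response_rate_by_segment(toy_rows)

# Difference in response rate between each campaign and the control group.
uplift_mens = rates["Mens E-Mail"] - rates["No E-Mail"]
uplift_womens = rates["Womens E-Mail"] - rates["No E-Mail"]
```

This gives only an average effect per campaign. Estimating how the effect varies with customer attributes such as Recency or History is the job of an uplift model.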