sklift.metrics.uplift_by_percentile
- sklift.metrics.metrics.uplift_by_percentile(y_true, uplift, treatment, strategy='overall', bins=10, std=False, total=False, string_percentiles=True)
Compute uplift, group sizes, group response rates and, optionally, their standard deviations at each percentile.
The result is a pandas DataFrame with metrics in columns and percentiles in rows:
n_treatment, n_control - group sizes.
response_rate_treatment, response_rate_control - group response rates.
uplift - treatment response rate minus control response rate.
std_treatment, std_control - (optional) standard deviations of the response rates.
std_uplift - (optional) standard deviation of the uplift.
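The columns listed above can be illustrated with a dependency-free sketch. This is not the library's implementation: the helper names `split_into_bins`, `response_rate` and `percentile_table` are hypothetical, the toy data is made up, and returning 0.0 for an empty group is an assumption made only to keep the example short. It ranks all observations by predicted uplift, cuts the ranking into bins, and computes the group sizes, response rates and uplift for each bin.

```python
def split_into_bins(items, bins):
    """Split a list into `bins` consecutive chunks of nearly equal size."""
    size, extra = divmod(len(items), bins)
    chunks, start = [], 0
    for i in range(bins):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def response_rate(y_true, indices):
    """Share of positive outcomes among `indices` (assumption: 0.0 if empty)."""
    return sum(y_true[i] for i in indices) / len(indices) if indices else 0.0


def percentile_table(y_true, uplift, treatment, bins):
    """Per-bin metrics, ranking treatment and control observations together."""
    # Indices of all observations, highest predicted uplift first.
    order = sorted(range(len(uplift)), key=lambda i: uplift[i], reverse=True)
    rows = []
    for chunk in split_into_bins(order, bins):
        trt = [i for i in chunk if treatment[i] == 1]
        ctl = [i for i in chunk if treatment[i] == 0]
        rate_t = response_rate(y_true, trt)
        rate_c = response_rate(y_true, ctl)
        rows.append({
            "n_treatment": len(trt),
            "n_control": len(ctl),
            "response_rate_treatment": rate_t,
            "response_rate_control": rate_c,
            "uplift": rate_t - rate_c,
        })
    return rows


# Toy data: eight observations, already listed in descending uplift order.
y_true = [1, 1, 1, 0, 1, 0, 1, 0]
uplift = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
treatment = [1, 1, 0, 0, 1, 1, 0, 0]

rows = percentile_table(y_true, uplift, treatment, bins=2)
for row in rows:
    print(row)
```

Each dictionary in `rows` plays the role of one row of the returned DataFrame; the real function additionally labels the rows with percentiles.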
- Parameters
y_true (1d array-like) – Correct (true) binary target values.
uplift (1d array-like) – Predicted uplift, as returned by a model.
treatment (1d array-like) – Treatment labels.
strategy (string, ['overall', 'by_group']) –
Determines the calculation strategy. Default is 'overall'.
'overall': Takes the first k observations of all test data ordered by predicted uplift (control and treatment groups together) and calculates the conversion in the treatment and control groups on those observations only. The uplift is the difference between these two conversions.
'by_group': Calculates the conversion in the top k observations of each group (control and treatment) separately, with each group sorted by predicted uplift. The uplift is the difference between these two conversions.
std (bool) – If True, add columns with the uplift standard deviation and the response rate standard deviation. Default is False.
total (bool) – If True, add a last row with the total values. Default is False. The total uplift is computed as the total treatment response rate minus the total control response rate, where each total response rate is calculated on the full data.
bins (int) – Determines the number of bins the data is split into, and therefore the percentile that each bin corresponds to. Default is 10.
string_percentiles (bool) – Type of the percentile values in the index: string if True, float if False. Default is True (string).
- Returns
DataFrame with metrics in columns and percentiles in rows.
- Return type
pandas.DataFrame
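The difference between the two `strategy` values is easiest to see on a small example. The sketch below is a simplified, dependency-free illustration under stated assumptions, not the library's code: `top_k_uplift` is a hypothetical helper, the data is made up, and an empty group is assumed to have a conversion of 0.0. With 'overall' the top k observations are taken from the joint ranking; with 'by_group' the top k are taken from each group's own ranking.

```python
def conversion(y_true, indices):
    """Share of positive outcomes among `indices` (assumption: 0.0 if empty)."""
    return sum(y_true[i] for i in indices) / len(indices) if indices else 0.0


def top_k_uplift(y_true, uplift, treatment, k, strategy="overall"):
    """Uplift in the top k observations under the given ranking strategy."""
    # Indices of all observations, highest predicted uplift first.
    order = sorted(range(len(uplift)), key=lambda i: uplift[i], reverse=True)
    if strategy == "overall":
        # Take the top k of the joint ranking, then split by group.
        top = order[:k]
        trt = [i for i in top if treatment[i] == 1]
        ctl = [i for i in top if treatment[i] == 0]
    elif strategy == "by_group":
        # Rank each group on its own and take the top k of each.
        trt = [i for i in order if treatment[i] == 1][:k]
        ctl = [i for i in order if treatment[i] == 0][:k]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return conversion(y_true, trt) - conversion(y_true, ctl)


# Toy data: six observations, already listed in descending uplift order.
y_true = [1, 0, 1, 1, 0, 0]
uplift = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
treatment = [1, 1, 0, 1, 0, 0]

overall = top_k_uplift(y_true, uplift, treatment, k=3, strategy="overall")
by_group = top_k_uplift(y_true, uplift, treatment, k=3, strategy="by_group")
print(overall, by_group)
```

On this data the two strategies disagree in sign, because the joint top 3 contains only one control observation while the per-group ranking uses three from each group. This kind of group imbalance within a bin is worth keeping in mind when choosing a strategy.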