Causal Inference: Basics
In a perfect world, we would like to calculate the difference between a person’s reaction to receiving a communication and their reaction to receiving no communication. But there is a problem: we cannot both send a communication (an e-mail) and not send it (no e-mail) to the same person at the same time.
Denote by \(Y_i^1\) person \(i\)’s outcome when they receive the treatment (the communication is sent) and by \(Y_i^0\) person \(i\)’s outcome when they receive no treatment (control, no communication). The causal effect \(\tau_i\) of the treatment vis-à-vis no treatment is then given by:
\[ \tau_i = Y_i^1 - Y_i^0 \]
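To make the definition \(\tau_i = Y_i^1 - Y_i^0\) concrete, here is a minimal Python sketch on made-up data. It assumes we somehow know both potential outcomes for each person, which is only possible in a simulation; the `Person` class and the `individual_effect` function are hypothetical names used for illustration. With binary outcomes (1 = purchase), \(\tau_i\) can only be 1, 0 or -1.

```python
from dataclasses import dataclass


@dataclass
class Person:
    y1: int  # hypothetical outcome if e-mailed (treatment): 1 = purchase, 0 = no purchase
    y0: int  # hypothetical outcome if not e-mailed (control)


def individual_effect(p: Person) -> int:
    """Causal effect tau_i = Y_i^1 - Y_i^0.

    Computable here only because the toy data contains both potential outcomes;
    in real data one of them is always missing.
    """
    return p.y1 - p.y0


people = [
    Person(y1=1, y0=0),  # buys only if e-mailed
    Person(y1=1, y0=1),  # buys either way
    Person(y1=0, y0=0),  # never buys
    Person(y1=0, y0=1),  # buys only if NOT e-mailed
]
effects = [individual_effect(p) for p in people]
print(effects)  # [1, 0, 0, -1]
```

In real data each person contributes only one of `y1` and `y0`, which is exactly why \(\tau_i\) cannot be computed directly.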
Researchers are typically interested in estimating the Conditional Average Treatment Effect (CATE), that is, the expected causal effect of the treatment for a subgroup of the population:
\[ \text{CATE}(X_i) = E[Y_i^1 \mid X_i] - E[Y_i^0 \mid X_i] \]
where \(X_i\) is the feature vector describing the \(i\)-th person.
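A minimal sketch of the CATE on a toy population, again assuming both potential outcomes are known (simulated data only). The single categorical feature `x` (an age group) and the helper `cate_by_group` are hypothetical. By linearity of expectation, \(E[Y_i^1 \mid X_i] - E[Y_i^0 \mid X_i] = E[Y_i^1 - Y_i^0 \mid X_i]\), so the CATE of a group is simply the mean of the individual effects inside it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy population: (x, y1, y0), with x a single categorical feature
population = [
    ("young", 1, 0), ("young", 1, 0), ("young", 1, 1), ("young", 0, 0),
    ("old", 0, 0), ("old", 1, 1), ("old", 0, 1), ("old", 1, 0),
]


def cate_by_group(rows):
    """CATE(x) = E[Y^1 - Y^0 | X = x], averaged over people sharing feature value x."""
    effects = defaultdict(list)
    for x, y1, y0 in rows:
        effects[x].append(y1 - y0)
    return {x: mean(v) for x, v in effects.items()}


cate = cate_by_group(population)
print(cate)
```

The "young" group gets a CATE of 0.5 and the "old" group 0, even though individual effects in the "old" group are not all zero: a +1 and a -1 cancel out on average.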
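We can observe neither the causal effect nor the CATE for the \(i\)-th object, and therefore cannot optimize them directly. What we can do is estimate the CATE, also called the uplift, of an object from observed data:
\[ \text{uplift} = \widehat{\text{CATE}} = E[Y_i \mid X_i, W_i = 1] - E[Y_i \mid X_i, W_i = 0] \]
where:
\(W_i \in \{0, 1\}\) - a binary treatment indicator: 1 if person \(i\) is in the treatment group, and 0 if person \(i\) is in the control group (receives no treatment);
\(Y_i\) - person \(i\)’s observed outcome, which equals:
\[ Y_i = W_i Y_i^1 + (1 - W_i) Y_i^0 = \begin{cases} Y_i^1, & \text{if } W_i = 1 \\ Y_i^0, & \text{if } W_i = 0 \end{cases} \]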
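On observed data the uplift is a difference of conditional means. The sketch below assumes a single discrete feature, so each conditional expectation can be replaced by a group average; `observed_outcome` and `uplift_by_group` are hypothetical helpers and the rows are made up. With continuous or high-dimensional \(X_i\) one would instead fit a model for each expectation (for example, one model on treated rows and one on control rows), but the quantity being estimated is the same.

```python
from collections import defaultdict
from statistics import mean


def observed_outcome(w: int, y1: int, y0: int) -> int:
    """Y_i = W_i * Y_i^1 + (1 - W_i) * Y_i^0: only one potential outcome is revealed."""
    return w * y1 + (1 - w) * y0


def uplift_by_group(rows):
    """Estimate E[Y | X=x, W=1] - E[Y | X=x, W=0] for each feature value x.

    rows: iterable of (x, w, y) tuples with the observed outcome y.
    Groups lacking either treated or control rows are skipped.
    """
    treated, control = defaultdict(list), defaultdict(list)
    for x, w, y in rows:
        (treated if w == 1 else control)[x].append(y)
    return {x: mean(treated[x]) - mean(control[x]) for x in treated if x in control}


# A person with y1=1, y0=0 who lands in control shows the control outcome
print(observed_outcome(w=0, y1=1, y0=0))  # 0

# Made-up observations (x, w, y)
data = [
    ("young", 1, 1), ("young", 1, 1), ("young", 0, 0), ("young", 0, 1),
    ("old", 1, 0), ("old", 1, 1), ("old", 0, 1), ("old", 0, 0),
]
print(uplift_by_group(data))
```

For this data the estimate is 0.5 for "young" (treated mean 1, control mean 0.5) and 0.0 for "old".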
This estimate does not identify the CATE unless one is willing to assume that \(W_i\) is independent of \(Y_i^1\) and \(Y_i^0\) conditional on \(X_i\). This is the so-called Unconfoundedness Assumption, or Conditional Independence Assumption (CIA), found in the social sciences and medical literature. It holds when treatment assignment is random conditional on \(X_i\), as in a randomized experiment. Formally:
\[ \{Y_i^1, Y_i^0\} \perp W_i \mid X_i \]
Under this assumption \(E[Y_i \mid X_i, W_i = 1] = E[Y_i^1 \mid X_i]\) and \(E[Y_i \mid X_i, W_i = 0] = E[Y_i^0 \mid X_i]\), so the uplift equals the CATE.
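Why the assumption matters can be seen in a small simulation. The setup below is hypothetical: there are no features at all (a single stratum, so conditioning on \(X_i\) is trivial) and an unobserved trait `u` raises the baseline outcome. Every person's true effect is exactly 1. When treatment is assigned by a coin flip, the difference in group means recovers it; when people with `u` are far more likely to be treated, \(W_i\) is no longer independent of \((Y_i^1, Y_i^0)\) and the same difference is biased. The helper `naive_uplift` is an illustrative name, not a library function.

```python
import random


def naive_uplift(n: int, confounded: bool, seed: int = 0) -> float:
    """Simulate n people and return mean(Y | W=1) - mean(Y | W=0).

    Hypothetical data-generating process:
      u  ~ Bernoulli(0.5), unobserved (e.g. an already loyal customer)
      Y0 = 2*u + N(0, 1)
      Y1 = Y0 + 1          -> the true effect is 1 for everyone
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.random() < 0.5
        y0 = 2.0 * u + rng.gauss(0.0, 1.0)
        y1 = y0 + 1.0
        if confounded:
            # Assignment depends on u, which also drives the outcome: CIA violated
            w = rng.random() < (0.9 if u else 0.1)
        else:
            # Coin-flip assignment, independent of the potential outcomes: CIA holds
            w = rng.random() < 0.5
        if w:
            treated.append(y1)  # only Y^1 is observed for treated people
        else:
            control.append(y0)  # only Y^0 is observed for control people
    return sum(treated) / len(treated) - sum(control) / len(control)


print(round(naive_uplift(100_000, confounded=False), 2))  # close to 1.0
print(round(naive_uplift(100_000, confounded=True), 2))   # close to 2.6, not 1.0
```

The extra 1.6 in the confounded case is 2 × (0.9 - 0.1): about 90% of the treated group has the high baseline versus 10% of the control group, and that baseline gap is mistaken for a treatment effect.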
We also introduce one more useful piece of notation, the propensity score \(p(X_i) = P(W_i = 1 \mid X_i)\), i.e. the probability of receiving the treatment given \(X_i\).
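For a single discrete feature, the propensity score can be estimated as the share of treated people among those with the same feature value. Below is a minimal sketch with made-up data and a hypothetical helper `propensity_by_group`. For continuous features one would typically fit a probabilistic classifier of \(W_i\) on \(X_i\) instead (logistic regression is a common choice). In a randomized experiment with a 50/50 split, the estimate should be close to 0.5 in every group.

```python
from collections import Counter


def propensity_by_group(rows):
    """Empirical p(x) = P(W=1 | X=x): share of treated rows among rows with X = x.

    rows: iterable of (x, w) pairs with w in {0, 1}.
    """
    rows = list(rows)
    total = Counter(x for x, _ in rows)
    treated = Counter(x for x, w in rows if w == 1)
    return {x: treated[x] / total[x] for x in total}


# Made-up assignment log: most "young" customers were e-mailed, most "old" were not
log = [
    ("young", 1), ("young", 1), ("young", 1), ("young", 0),
    ("old", 1), ("old", 0), ("old", 0), ("old", 0),
]
print(propensity_by_group(log))  # {'young': 0.75, 'old': 0.25}
```

Propensities that differ this much across groups signal that assignment was not a simple coin flip, so treated and control outcomes should be compared within groups of \(X_i\) rather than over the whole population.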