# Causal Inference: Basics

In an ideal world, we would compute the difference between a person’s reaction after receiving a communication and the same person’s reaction without it. The problem is that we cannot both send a communication (for example, an e-mail) and not send it to the same person at the same time.

Let \(Y_i^1\) denote person \(i\)’s outcome when they receive the treatment (the communication) and \(Y_i^0\) the same person’s outcome when they receive no treatment (control, no communication). The causal effect \(\tau_i\) of the treatment *vis-à-vis* no treatment is given by:

\[
\tau_i = Y_i^1 - Y_i^0
\]

Researchers are typically interested in estimating the Conditional Average Treatment Effect (CATE), that is, the expected causal effect of the treatment for a subgroup of the population:

\[
CATE = E[Y_i^1 \mid X_i] - E[Y_i^0 \mid X_i]
\]

where \(X_i\) is the feature vector describing the \(i\)-th person.

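Neither the causal effect nor the CATE can be observed for the \(i\)-th person, because only one of the two potential outcomes is ever realized. Accordingly, we cannot optimize either quantity directly.
We can, however, estimate the CATE, also called the *uplift*, from observed data:

\[
\widehat{CATE} = uplift = E[Y_i \mid X_i = x, W_i = 1] - E[Y_i \mid X_i = x, W_i = 0]
\]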

Where:

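\(W_i \in \{0, 1\}\) is a binary variable: 1 if person \(i\) is assigned to the treatment group, and 0 if person \(i\) is assigned to the control group (no treatment);

\(Y_i\) is person \(i\)’s observed outcome, which equals:

\[
Y_i = W_i Y_i^1 + (1 - W_i) Y_i^0 =
\begin{cases}
Y_i^1, & \text{if } W_i = 1 \\
Y_i^0, & \text{if } W_i = 0
\end{cases}
\]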
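As a rough illustration of the observed outcome and the uplift estimate described above, the following sketch simulates a randomized experiment using only the Python standard library. Everything in it is an assumption made for the example: the single binary feature `x`, the response probabilities, and the helper names `simulate_experiment` and `uplift_by_feature` are hypothetical and do not come from any library.

```python
import random
from collections import defaultdict


def simulate_experiment(n=40000, seed=0):
    """Simulate (x, w, y) rows for a randomized experiment (made-up numbers)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)                   # binary feature X_i (assumed)
        y0 = int(rng.random() < 0.10)           # potential outcome without treatment
        effect = 0.15 if x == 1 else 0.02       # assumed true CATE for each x
        y1 = int(rng.random() < 0.10 + effect)  # potential outcome with treatment
        w = rng.randint(0, 1)                   # random treatment assignment W_i
        y = w * y1 + (1 - w) * y0               # only one potential outcome is observed
        rows.append((x, w, y))
    return rows


def uplift_by_feature(rows):
    """Estimate E[Y | X=x, W=1] - E[Y | X=x, W=0] for each value of x."""
    totals = defaultdict(lambda: [0, 0])        # (x, w) -> [sum of y, count]
    for x, w, y in rows:
        totals[(x, w)][0] += y
        totals[(x, w)][1] += 1

    def mean(x, w):
        total, count = totals[(x, w)]
        return total / count

    return {x: mean(x, 1) - mean(x, 0) for x in sorted({x for x, _ in totals})}


rows = simulate_experiment()
estimates = uplift_by_feature(rows)
print({x: round(u, 3) for x, u in estimates.items()})
```

Because assignment is random in this simulation, the per-group differences should land close to the assumed effects of 0.02 and 0.15, up to sampling noise. In real data `y0` and `y1` are never both available; they appear here only because the data are simulated.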

This estimate identifies the CATE only if one is willing to assume that \(W_i\) is independent of \(Y_i^1\) and \(Y_i^0\) conditional on \(X_i\). This assumption is the so-called *Unconfoundedness Assumption* or *Conditional Independence Assumption* (CIA) found in the social sciences and medical literature.
It holds when treatment assignment is random conditional on \(X_i\).
Briefly, this can be written as:

\[
\{Y_i^1, Y_i^0\} \perp\!\!\!\perp W_i \mid X_i
\]
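To see why the conditioning on \(X_i\) matters, here is a minimal simulated sketch in which treatment assignment depends only on a binary feature `x`, so the assumption holds conditional on `x` but not unconditionally. All numbers and the helper names `simulate_confounded` and `diff_in_means` are assumptions chosen for illustration.

```python
import random


def simulate_confounded(n=40000, seed=1):
    """Return (x, w, y) rows where treatment assignment depends only on x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        base = 0.30 if x == 1 else 0.10       # baseline response depends on x
        y0 = int(rng.random() < base)         # potential outcome without treatment
        y1 = int(rng.random() < base + 0.10)  # assumed true effect: 0.10 for every x
        p_treat = 0.8 if x == 1 else 0.2      # assignment probability depends on x
        w = int(rng.random() < p_treat)
        rows.append((x, w, w * y1 + (1 - w) * y0))
    return rows


def diff_in_means(rows):
    """Mean observed outcome of treated rows minus that of control rows."""
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate_confounded()
naive = diff_in_means(rows)  # ignores x, so it mixes the effect with the baseline gap
conditional = {x: diff_in_means([r for r in rows if r[0] == x]) for x in (0, 1)}
print(f"naive: {naive:.3f}")
print({x: round(v, 3) for x, v in conditional.items()})
```

Under these assumed probabilities the pooled difference is about 0.22 in expectation, more than twice the true effect, because people with `x = 1` both respond more often and are treated more often. The differences computed within each value of `x` should stay close to 0.10.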

We also introduce one more piece of useful notation: the propensity score, \(p(X_i) = P(W_i = 1 \mid X_i)\), i.e. the probability of receiving the treatment given \(X_i\).
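When the features are discrete, one simple way to approximate the propensity score is the share of treated people within each feature value. The sketch below assumes a single categorical feature; the segment labels `"new"` and `"loyal"`, the toy data, and the function name `empirical_propensity` are hypothetical. With continuous or high-dimensional features, a fitted classifier would typically be used instead.

```python
from collections import defaultdict


def empirical_propensity(features, treatments):
    """Estimate P(W = 1 | X = x) as the treated share within each value of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, number total]
    for x, w in zip(features, treatments):
        counts[x][0] += w
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


# Toy data (made up): four "new" customers and four "loyal" customers.
features = ["new", "new", "new", "new", "loyal", "loyal", "loyal", "loyal"]
treatments = [1, 0, 0, 0, 1, 1, 1, 0]

propensity = empirical_propensity(features, treatments)
print(propensity)  # {'new': 0.25, 'loyal': 0.75}
```

In a fully randomized experiment with equal-sized groups, the estimated propensity should be close to 0.5 for every feature value.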

## References

1️⃣ Gutierrez, P., & Gérardy, J. Y. (2017). Causal Inference and Uplift Modelling: A Review of the Literature. In International Conference on Predictive Applications and APIs (pp. 1-13).