# Causal Inference: Basics

In a perfect world, we would measure the difference between a person’s reaction to receiving a communication and the same person’s reaction without it. But there is a problem: we cannot both send a communication (for example, an e-mail) and not send it to the same person at the same time. Denoting by $$Y_i^1$$ person $$i$$’s outcome when they receive the treatment (the communication) and by $$Y_i^0$$ person $$i$$’s outcome when they receive no treatment (control, no communication), the causal effect $$\tau_i$$ of the treatment relative to no treatment is given by:

$\tau_i = Y_i^1 - Y_i^0$

Researchers are typically interested in estimating the Conditional Average Treatment Effect (CATE), that is, the expected causal effect of the treatment for a subgroup in the population:

$CATE = E[Y_i^1 \vert X_i] - E[Y_i^0 \vert X_i]$

where $$X_i$$ is the feature vector describing the $$i$$-th person.

We can observe neither the causal effect nor the CATE for the $$i$$-th person, because only one of the two potential outcomes is ever realized; accordingly, we cannot optimize it directly. But we can estimate the CATE, or uplift, of a person:

$\textbf{uplift} = \widehat{CATE} = E[Y_i \vert X_i = x, W_i = 1] - E[Y_i \vert X_i = x, W_i = 0]$

Where:

• $$W_i \in \{0, 1\}$$ - a binary variable: 1 if person $$i$$ is in the treatment group (receives the treatment), and 0 if person $$i$$ is in the control group (receives no treatment);

• $$Y_i$$ - person $$i$$’s observed outcome, which equals:

$Y_i = W_i \cdot Y_i^1 + (1 - W_i) \cdot Y_i^0 = \begin{cases} Y_i^1, & \mbox{if } W_i = 1 \\ Y_i^0, & \mbox{if } W_i = 0 \end{cases}$
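
To make the notation concrete, here is a minimal simulation sketch; it is not taken from any library. It draws both potential outcomes for synthetic people (possible only in a simulation), assigns the treatment at random, builds the observed outcome from the formula above, and estimates the uplift for each subgroup as a difference in observed means. The function names, the single binary feature, and the response probabilities are all illustrative assumptions.

```python
import random


def simulate(n=20000, seed=0):
    """Generate synthetic rows (x, w, y0, y1, y); all numbers are illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)                   # X_i: one binary feature
        y0 = int(rng.random() < 0.10)           # Y_i^0: outcome without treatment
        p1 = 0.10 + (0.15 if x == 1 else 0.02)  # assumed response rate if treated
        y1 = int(rng.random() < p1)             # Y_i^1: outcome with treatment
        w = rng.randint(0, 1)                   # W_i: assigned at random
        y = w * y1 + (1 - w) * y0               # observed outcome Y_i
        rows.append((x, w, y0, y1, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def estimated_uplift(rows, x):
    """E[Y | X = x, W = 1] - E[Y | X = x, W = 0], using observed data only."""
    treated = mean(y for xi, w, _, _, y in rows if xi == x and w == 1)
    control = mean(y for xi, w, _, _, y in rows if xi == x and w == 0)
    return treated - control


def true_cate(rows, x):
    """E[Y^1 - Y^0 | X = x]; available only because the simulation keeps both outcomes."""
    return mean(y1 - y0 for xi, _, y0, y1, _ in rows if xi == x)


rows = simulate()
for x in (0, 1):
    print(f"x={x}: estimated uplift={estimated_uplift(rows, x):.3f}, "
          f"true CATE={true_cate(rows, x):.3f}")
```

Because assignment is random here, the estimated uplift should land close to the simulated CATE in each subgroup. With real data, `true_cate` has no counterpart, since only one potential outcome per person is ever recorded.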

The difference of conditional means above identifies the CATE only if one is willing to assume that $$W_i$$ is independent of $$Y_i^1$$ and $$Y_i^0$$ conditional on $$X_i$$. This is the so-called Unconfoundedness Assumption, also known as the Conditional Independence Assumption (CIA) in the social sciences and medical literature. It holds when treatment assignment is random conditional on $$X_i$$. Briefly, this can be written as:

$CIA : \{Y_i^0, Y_i^1\} \perp \!\!\! \perp W_i \vert X_i$
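
To see why this assumption is enough, the following short derivation sketches the argument using only the definitions above. It additionally assumes that both treated and control people exist at $$X_i = x$$, so that both conditional expectations are defined:

$\begin{aligned} E[Y_i \vert X_i = x, W_i = 1] - E[Y_i \vert X_i = x, W_i = 0] &= E[Y_i^1 \vert X_i = x, W_i = 1] - E[Y_i^0 \vert X_i = x, W_i = 0] \\ &= E[Y_i^1 \vert X_i = x] - E[Y_i^0 \vert X_i = x] \end{aligned}$

The first equality uses the definition of the observed outcome: $$Y_i = Y_i^1$$ when $$W_i = 1$$ and $$Y_i = Y_i^0$$ when $$W_i = 0$$. The second uses the CIA, which allows dropping the conditioning on $$W_i$$. The last line is the CATE evaluated at $$X_i = x$$.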

It is also useful to introduce some additional notation. Let us define the propensity score, $$p(X_i) = P(W_i = 1 \vert X_i)$$, i.e. the probability of receiving the treatment given $$X_i$$.
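
As a rough illustration of the propensity score, the sketch below estimates $$p(x)$$ as the share of treated people within each value of a single discrete feature. The helper `empirical_propensity` and the toy data are hypothetical and serve only to show the definition at work.

```python
from collections import defaultdict


def empirical_propensity(xs, ws):
    """Estimate p(x) = P(W = 1 | X = x) as the treated share within each value of x."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, w in zip(xs, ws):
        total[x] += 1
        treated[x] += w
    return {x: treated[x] / total[x] for x in total}


# Hypothetical toy data: a segment label and a treatment flag for eight people.
xs = ["new", "new", "new", "new", "loyal", "loyal", "loyal", "loyal"]
ws = [1, 0, 1, 0, 1, 1, 1, 0]

scores = empirical_propensity(xs, ws)
print(scores)  # {'new': 0.5, 'loyal': 0.75}
```

Counting within strata only works when $$X_i$$ takes a few discrete values. With continuous or high-dimensional features, one would typically fit a probabilistic classifier of $$W_i$$ on $$X_i$$ instead.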