Kevin Hillstrom, through his MineThatData blog, made available a dataset describing two email campaigns and a control group, and issued a challenge to analyse that dataset and answer various questions.
This paper uses Uplift Modelling to take up (and, in fact, win) Hillstrom's challenge. We look at three different formulations of the problem and use Uplift Models to analyse each campaign under each formulation. Broadly, we conclude that both campaigns had a positive overall impact and that both can be modelled successfully, allowing us to identify subpopulations particularly suitable and unsuitable for these campaigns. We also identified some segments for which the Women's mailing, in particular, appeared to reduce rather than increase spending; such effects were less prominent with the Men's campaign. In addition, while average spend among purchasers increased significantly with the Women's campaign, it declined slightly for recipients of the Men's campaign. Furthermore, an extremely small number of people (of the order of 50) accounted for over half of the incremental spend, making modelling challenging. Indeed, the central problem throughout the analysis was the difficulty of estimating most statistics reliably, given the relatively small samples available and the low purchase rate. Despite these obstacles, because we used fairly simple models and a variety of methods for controlling noise, we believe that the insights we present are fairly robust.
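To make the idea of a segment where a mailing reduces spending concrete, here is a minimal Python sketch. It is not the paper's uplift modelling method: it estimates uplift for each subgroup simply as the difference in mean spend between people who received a given mailing and people in the control group. The column names `segment`, `spend` and `zip_code`, and the group labels `"Womens E-Mail"`, `"Mens E-Mail"` and `"No E-Mail"`, are assumptions modelled on the public release of Hillstrom's dataset. The function name `segment_uplift` and the handful of rows below are hypothetical toy values, not figures from the challenge.

```python
from collections import defaultdict
from statistics import mean


def segment_uplift(rows, treatment, control="No E-Mail", by="zip_code"):
    """Estimate uplift in mean spend for each value of the column `by`.

    Uplift is taken here as mean(spend | treated) - mean(spend | control).
    Subgroups lacking either treated or control rows are skipped.
    (Hypothetical helper; column names are assumptions.)
    """
    spends = defaultdict(lambda: {"treated": [], "control": []})
    for row in rows:
        if row["segment"] == treatment:
            spends[row[by]]["treated"].append(row["spend"])
        elif row["segment"] == control:
            spends[row[by]]["control"].append(row["spend"])
        # Rows from any other campaign are ignored.

    uplift = {}
    for key, groups in spends.items():
        if groups["treated"] and groups["control"]:
            uplift[key] = mean(groups["treated"]) - mean(groups["control"])
    return uplift


# Hypothetical toy data (not taken from Hillstrom's dataset).
rows = [
    {"segment": "Womens E-Mail", "zip_code": "Urban", "spend": 0.0},
    {"segment": "Womens E-Mail", "zip_code": "Urban", "spend": 60.0},
    {"segment": "No E-Mail", "zip_code": "Urban", "spend": 0.0},
    {"segment": "No E-Mail", "zip_code": "Urban", "spend": 20.0},
    {"segment": "Womens E-Mail", "zip_code": "Rural", "spend": 0.0},
    {"segment": "Womens E-Mail", "zip_code": "Rural", "spend": 0.0},
    {"segment": "No E-Mail", "zip_code": "Rural", "spend": 0.0},
    {"segment": "No E-Mail", "zip_code": "Rural", "spend": 40.0},
    {"segment": "Mens E-Mail", "zip_code": "Rural", "spend": 100.0},
]

uplift = segment_uplift(rows, treatment="Womens E-Mail")
negative = [k for k, v in uplift.items() if v < 0]
print(uplift)    # {'Urban': 20.0, 'Rural': -20.0}
print(negative)  # ['Rural']
```

With real data, each per-segment difference may rest on only a few purchasers, so raw estimates like these are noisy; a single large order can flip the sign of a segment's uplift. That sensitivity is why fairly simple models and explicit noise control matter in this kind of analysis.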