Is your data science as good as it could be?
How much of the time do you think your analytical results are even broadly correct?
Data science as if the answers actually mattered?
Why should anyone believe your analytical results?
Test-driven data analysis (TDDA) is an approach to improving the correctness and robustness of analytical processes by transferring the ideas of test-driven development from the arena of software development to the domain of data analysis, extending and adjusting them where appropriate.
TDDA is primarily a methodology that can be implemented in many different ways, but good tool support can facilitate and drive the uptake of TDDA. Stochastic Solutions provides an open-source (MIT-licensed) Python module, tdda, for this purpose.
Reference Tests. Reproducible research emphasises the need to capture executable analytical processes and inputs to allow others to reproduce and verify them. Reference tests build on these ideas by also capturing expected outputs and a verification procedure (a “diff” tool) for validating that the output is as expected. The tdda Python module supports testing using comparisons of complex objects with exclusions and regeneration of verified reference outputs.
Constraint Discovery & Verification. There are often things we know should be true of input, output and intermediate datasets that can be expressed as constraints: allowed ranges of values, uniqueness and existence constraints, allowability of nulls, and so on. The Python tdda module not only verifies constraints, but generates them from example datasets, thus significantly reducing the effort needed to capture and maintain constraints as processes are used and evolve. Constraints can be thought of as (unit) tests for data.
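To make the idea of constraint discovery and verification concrete, here is a minimal sketch in plain Python. It deliberately does not use the tdda module or reproduce its API: the function names discover_constraints and verify_constraints, the constraint keys and the example data are all assumptions made purely for illustration.

```python
from typing import Any


def discover_constraints(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Derive simple per-field constraints from an example dataset."""
    constraints: dict[str, dict[str, Any]] = {}
    fields = sorted({name for row in rows for name in row})
    for field in fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        constraints[field] = {
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "allow_nulls": len(present) < len(values),
            "unique": len(set(present)) == len(present),
        }
    return constraints


def verify_constraints(
    rows: list[dict[str, Any]], constraints: dict[str, dict[str, Any]]
) -> list[str]:
    """Return a list of constraint failures (empty if everything passes)."""
    failures = []
    for field, c in constraints.items():
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        if not c["allow_nulls"] and len(present) < len(values):
            failures.append(f"{field}: nulls not allowed")
        if c["unique"] and len(set(present)) != len(present):
            failures.append(f"{field}: values not unique")
        if present and c["min"] is not None and min(present) < c["min"]:
            failures.append(f"{field}: value below minimum {c['min']}")
        if present and c["max"] is not None and max(present) > c["max"]:
            failures.append(f"{field}: value above maximum {c['max']}")
    return failures


# Hypothetical example dataset from which constraints are discovered.
example = [
    {"customer_id": 1, "age": 34, "spend": 120.0},
    {"customer_id": 2, "age": 51, "spend": None},
    {"customer_id": 3, "age": 27, "spend": 80.5},
]
constraints = discover_constraints(example)

# Hypothetical later batch containing a null age and an implausible age.
new_batch = [
    {"customer_id": 1, "age": 45, "spend": 99.0},
    {"customer_id": 2, "age": 130, "spend": 101.0},
    {"customer_id": 3, "age": None, "spend": 90.0},
]
for failure in verify_constraints(new_batch, constraints):
    print(failure)
```

Constraints discovered this way are only as good as the example data, so in practice they would normally be reviewed and loosened or tightened by hand before being relied upon.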
Getting data analysis right is hard. In addition to all the ordinary problems of software development, with data analysis we often face other challenges, including: poorly specified analytical goals; problematical input data (poorly specified, missing values, incorrect linkage, outliers, data corruption); the possibility of misapplying methods; problems with interpreting input data and results; and changes in distributions of inputs, invalidating previous analytical choices.
Lots of people can build customer behaviour models for you, or audit your analytical marketing, or discuss your customer management strategy. Most of them are bigger and better known than Stochastic Solutions. So why us?
What we're best at is aligning all the maths and stats and technologies that businesses use to deliver effective customer management towards the organization's goals. We can engage across the full spectrum, from setting good marketing goals through accurate measurement of success to segmentation, modelling and optimization. In short, we concentrate on asking the right questions. Often, that leads to a change of goal and problem formulation. When it does, sometimes the same methods suffice to tackle the new formulation, and sometimes new or different methods are needed; if they are, we develop or find those.
We have to find a way of making the important measurable, instead of making the measurable important. — Robert McNamara
It is axiomatic that marketing activity needs to be systematically measured; the trick is to measure and focus on the right things.
A concrete example of this is marketing models, which almost always focus on the wrong thing. We developed and use uplift modelling for cross-selling and retention. Uplift models allow marketers to focus resources on the people whose behaviour is most positively influenced by the activity, rather than wasting resources on people who are either unaffected or, worse, negatively influenced by the intervention.
If this sounds like what ordinary "response models" are supposed to do, that's because the name "response model" is misleading. This is a perfect example of asking a better question, realising that standard methods don't answer it, and then developing new algorithms and measures to allow the better formulation to be tackled.
We feel justified in claiming that, almost uniquely, in the context of direct marketing, we can actually address John Wanamaker's lament:
I know half of my spend on advertising is wasted; the trouble is, I don't know which half.
— John Wanamaker
Most retention activity is implicitly based on the idea that the best people to target are those most likely to leave. This is rather like trying to improve an exam pass rate by directing most attention to the lowest achievers: it may be heroically worthwhile, but it probably isn't the easiest way to achieve the stated goal.
Churn and attrition models prioritize customers whose probability of leaving is highest. Such customers tend to be dissatisfied, so are usually hard to retain. To make matters worse, in many cases, the only thing currently keeping them is inertia, and interventions run a serious risk of back-firing, triggering the very defections they seek to avoid.
It is more profitable to focus retention activity on those people who are easiest to save—those most receptive to our retention programmes. Like focusing effort on students who are otherwise likely narrowly to fail the exam, this is generally the most efficient strategy for improving the measured outcome.
The customers who generate a positive return on investment from retention activity are those in red: the people who will leave without an intervention, but who can be persuaded to stay. Uplift models allow you to target them, and them alone. At all costs, you want to avoid targeting the group in black (so-called Sleeping Dogs), whose defection you are likely to trigger by your intervention. Again, uplift models can direct you away from those customers.
In contrast, standard approaches based on churn or attrition scores tend to direct attention towards the wrong groups, including, in many cases, the Sleeping Dogs. Targeting them is a disaster, as the organization actually spends money to drive away business. Even where this is avoided, traditional targeting inevitably focuses attention on customers who are hard to save, while overlooking those who are more receptive.
Stochastic Solutions has unparalleled experience in helping companies to build uplift models that predict the incremental impact on retention of targeting each customer. Standard stats packages and methods simply cannot build uplift models, so you need a specialist approach. By using such incremental models, you align your targeting with the outcome that you measure (the net increase in retention achieved by your campaign) and the very metric that determines the value of the retention activity.
Contact Stochastic Solutions on +44 7713 787 602 or at info@StochasticSolutions.com, and let us help you increase sales by targeting the people whose behaviour is actually positively influenced by your marketing.
You probably already use a control group to measure the net impact of your marketing. You do this because you know that some of the people who buy after being exposed to your marketing would have bought anyway. The control group allows you to measure the incremental impact or uplift.
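The measurement described above amounts to a simple subtraction, sketched below in Python. The function names and the campaign figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def response_rate(buyers: int, total: int) -> float:
    """Proportion of a group that bought."""
    return buyers / total


def uplift(treated_buyers: int, treated_total: int,
           control_buyers: int, control_total: int) -> float:
    """Incremental purchase rate: treated rate minus control rate."""
    return (response_rate(treated_buyers, treated_total)
            - response_rate(control_buyers, control_total))


# Hypothetical campaign: 100,000 people mailed, 10,000 held back as controls.
treated_total, treated_buyers = 100_000, 3_200
control_total, control_buyers = 10_000, 250

campaign_uplift = uplift(treated_buyers, treated_total,
                         control_buyers, control_total)
# Scale the rate difference up to the size of the treated group.
incremental_sales = campaign_uplift * treated_total

print(f"Uplift: {campaign_uplift:.2%}")
print(f"Estimated incremental sales: {incremental_sales:.0f}")
```

With these made-up figures, 3.2% of the treated group bought against 2.5% of the controls, so the uplift is 0.7 percentage points, or roughly 700 of the 3,200 sales. The remaining sales would have happened anyway.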
But unless you're very unusual, when choosing who to target, you don't use an incremental approach: you just use a response model, or a propensity model, to try to find people who are likely to buy, with no regard to incrementality.
The only prospects that generate a return on marketing investment are those in red—the people who buy only when they receive your marketing. Uplift models allow you to target them, and them alone.
In contrast, standard approaches based on response or propensity models direct the bulk of their effort at those shown in white (people unaffected by the marketing), and possibly even at the group shown in black (people negatively affected by your marketing), while sometimes missing some of the persuadable reds. This is doubly bad: it results in wasted spend (targeting people who would have bought anyway) and in missed opportunities (failing to target people who may not be very likely to buy even if you do target them, but are almost certain not to if you don't).
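The difference between the two ways of ranking prospects can be illustrated with a small Python sketch. The segment names and purchase rates below are invented for the illustration; they are not measurements, and real uplift models work at the level of individuals rather than four tidy groups.

```python
# Hypothetical segments: (purchase rate if targeted, purchase rate if not).
segments = {
    "unaffected buyers (white)": (0.30, 0.30),
    "persuadable (red)": (0.12, 0.02),
    "unaffected non-buyers (white)": (0.01, 0.01),
    "negatively affected (black)": (0.15, 0.20),
}


def response(name: str) -> float:
    """What a response or propensity model ranks on: purchase rate when targeted."""
    return segments[name][0]


def segment_uplift(name: str) -> float:
    """What an uplift model ranks on: the change in purchase rate caused by targeting."""
    targeted, untargeted = segments[name]
    return targeted - untargeted


by_response = sorted(segments, key=response, reverse=True)
by_uplift = sorted(segments, key=segment_uplift, reverse=True)

print("Ranked by response:", by_response)
print("Ranked by uplift:  ", by_uplift)
```

Ranking by response puts the unaffected buyers first and the negatively affected group second, with the persuadable group only third. Ranking by uplift puts the persuadable group first and the negatively affected group last.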
Stochastic Solutions has unparalleled experience in helping companies to build uplift models that predict the incremental impact on sales of targeting each person in your prospect pool. Standard stats packages and methods simply cannot build uplift models, so you need a specialist approach. By using such incremental models, you align your targeting with the outcome that you measure (the lift of your cross-sales campaign) and the very metric that determines the volume of sales you make.
The first thing to understand about randomized (stochastic) search is that it is not the same thing as random search. Not even close.
It is this fundamental confusion that is behind many people's difficulty with the idea that evolution could possibly have produced the richness and sophistication of life we see on Earth. They focus on the "random" nature of mutation and reason that just changing things randomly can't possibly produce a brain, a butterfly, an oak tree or even a single-cell organism. And they're right. It's selection that does the heavy lifting. The random nature of mutation simply provides variation for selection (survival of the fittest) to winnow down. Most mutations are harmful, destroying useful features that have been built up, and most of those that aren't harmful are neutral, neither improving nor harming the organism. It's the rare few that actually make something better, and it's the role of selection to favour those few. Even then, the process isn't automatic: an organism with an advantageous mutation axiomatically has a better chance of surviving and reproducing than the same organism without it (because that's how we define selective advantage). But that organism can be unlucky and die young or fail to reproduce. So selection too has a strong random element. However, even a small and probabilistic selective advantage is multiplied exponentially through the generations, with the consequence that improving mutations build up.
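The contrast between purely random search and random variation combined with selection can be seen in a toy Python experiment. This is a sketch under stated assumptions, not a model of biology: the "organism" is a string of 40 bits, fitness is simply the number of 1s, and the function names and evaluation budget are chosen for illustration.

```python
import random


def fitness(bits: list[int]) -> int:
    """Number of 1s: a stand-in for how good a candidate is."""
    return sum(bits)


def random_search(n_bits: int, evaluations: int, rng: random.Random) -> int:
    """Draw independent random candidates and remember the best fitness seen."""
    best = 0
    for _ in range(evaluations):
        candidate = [rng.randint(0, 1) for _ in range(n_bits)]
        best = max(best, fitness(candidate))
    return best


def mutate_and_select(n_bits: int, evaluations: int, rng: random.Random) -> int:
    """Mutate the current candidate and keep the child only if it is no worse."""
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    current_fit = fitness(current)
    for _ in range(evaluations - 1):
        # Random variation: flip each bit independently with probability 1/n_bits.
        child = [1 - b if rng.random() < 1 / n_bits else b for b in current]
        child_fit = fitness(child)
        # Selection: discard harmful mutations, keep neutral and helpful ones.
        if child_fit >= current_fit:
            current, current_fit = child, child_fit
    return current_fit


n_bits, budget = 40, 2000
random_best = random_search(n_bits, budget, random.Random(0))
selected_best = mutate_and_select(n_bits, budget, random.Random(0))
print(f"Best of {budget} random candidates:  {random_best} / {n_bits}")
print(f"Mutation plus selection, same budget: {selected_best} / {n_bits}")
```

Both procedures use the same number of fitness evaluations and the same source of randomness. The only difference is that the second retains what it has already gained, which is what lets small improvements accumulate.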
Some of the stochastic search methods we use at Stochastic Solutions are directly modelled on natural evolution — techniques such as genetic algorithms, evolution strategies and genetic programming. Others, like simulated annealing, take their inspiration from other natural stochastic processes, such as the way a metal cools.
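As an illustration of one of these techniques, here is a minimal simulated annealing sketch in Python. The objective function, step size, starting temperature and cooling schedule are arbitrary assumptions chosen to keep the example short; they are not the settings used in any of our systems.

```python
import math
import random


def objective(x: float) -> float:
    """A bumpy one-dimensional function with several local minima."""
    return x * x + 10 * math.sin(3 * x)


def simulated_annealing(start: float, steps: int = 5000, t_start: float = 10.0,
                        cooling: float = 0.999, seed: int = 0) -> tuple[float, float]:
    """Minimise objective() from start; return the best (x, value) seen."""
    rng = random.Random(seed)
    x, fx = start, objective(start)
    best_x, best_fx = x, fx
    temperature = t_start
    for _ in range(steps):
        candidate = x + rng.gauss(0, 0.5)  # small random move
        f_candidate = objective(candidate)
        delta = f_candidate - fx
        # Always accept improvements; accept worsening moves with a
        # probability that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, fx = candidate, f_candidate
            if fx < best_fx:
                best_x, best_fx = x, fx
        temperature *= cooling  # geometric cooling schedule
    return best_x, best_fx


start = 8.0
best_x, best_value = simulated_annealing(start)
print(f"Start: x = {start:.2f}, value = {objective(start):.2f}")
print(f"Best:  x = {best_x:.2f}, value = {best_value:.2f}")
```

While the temperature is high the search accepts many worsening moves and can escape local minima; as it cools, the search settles, much as a slowly cooled metal settles into a low-energy state.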
Our approach to search is informed by the insight that three features are dominant in determining the effectiveness of optimization methods. These are domain knowledge, problem representation and choice of move operators.
It all starts with domain knowledge, because without that stochastic methods are reduced to the very aimless wandering that is evolution's caricature.* So our first step is always to capture what is known about the problem from whatever sources of information are available. This can include interviewing domain experts, studying current and previous approaches, reviewing the literature and, where possible, directly probing or studying whatever system is being optimized.
The domain knowledge then has to be encapsulated in a way that makes it available in a useful form to the search algorithm. This is achieved through a combination of the choice of problem representation (logical, rather than physical, normally) and the move operators to be employed during the search.
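A small, generic example may help to show what encoding knowledge in a representation and a move operator means. The Python sketch below uses a textbook routing problem rather than any client problem: the city coordinates, the function names and the choice of segment reversal as the move operator are all assumptions made for illustration. The knowledge being encoded is that a tour visits every city exactly once (so the representation is a permutation) and that good tours differ from nearby tours in only a few links (so the move reverses a segment rather than scrambling the order).

```python
import math
import random

# Hypothetical city coordinates for the illustration.
CITIES = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 5)]


def tour_length(tour: list[int]) -> float:
    """Total length of the closed tour visiting cities in the given order."""
    return sum(
        math.dist(CITIES[tour[k]], CITIES[tour[(k + 1) % len(tour)]])
        for k in range(len(tour))
    )


def reverse_segment(tour: list[int], rng: random.Random) -> list[int]:
    """Move operator: reverse a random contiguous segment of the tour.

    The result is always a valid permutation, so the search never wastes
    effort on candidates that skip a city or visit one twice.
    """
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


rng = random.Random(1)
tour = list(range(len(CITIES)))
neighbour = reverse_segment(tour, rng)
print(tour, round(tour_length(tour), 2))
print(neighbour, round(tour_length(neighbour), 2))
```

The same operator could be used inside any of the stochastic search methods discussed here. The point is that the representation and the move are chosen to respect what is known about the problem.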
Nick Radcliffe, who founded Stochastic Solutions, has worked for many years on the relationship between these three pivotal aspects of search, and has developed, through a series of publications, a solid theory of representation for stochastic search in general, and evolutionary algorithms more particularly, called forma analysis. This is an intensely practical theory that provides a systematic process for moving from specific insights about a problem to suitable problem representations and move operators. These can then be used directly, or modified further, using heuristic insights, to produce a sound and effective approach to the problem at hand.
*The careful reader may wonder where natural evolution's "domain knowledge" comes from. The difference here arises because our goal is to harness the power of evolution to a particular end: usually, to optimize a function. In natural evolution, the goal is implicit: it is survival through the generations. It is in bending evolution to our own ends that the requirement for domain knowledge surfaces.
Staff at Stochastic Solutions have a long history of harnessing and exploiting the power of random variation and using it to solve challenging industrial and commercial problems. We do this by combining strong theoretical and technical knowledge of cutting-edge techniques with ruthlessly practical and pragmatic approaches to exploiting all other information and methods that can help to crack the problem in question. This leads us to favour hybrid approaches, whereby we try to incorporate existing search and optimization approaches into either evaluation functions or move operators. Because stochastic search methods, especially those based on evolutionary paradigms, provide excellent frameworks for this approach, this usually allows us to produce systems that out-perform both the existing approaches and a purer methodology based on a single stochastic search paradigm. We love theory, and admire purity, but in the end we do whatever it takes to get the job done.
Successful applications of this approach by staff at Stochastic Solutions have come in many industrial and commercial settings. One application was optimizing the design of gas pipelines to supply cities. Here, the goal was to minimize the cost of the pipeline while satisfying all engineering and safety constraints. Another was credit scoring, where we produced a hybrid solution that combined best-practice scorecarding with an evolutionary approach that produced a solution better than had previously been believed to be possible. We have also applied these methods successfully in fields as diverse as retail dealership location, oil production scheduling and computational process placement. More recently, we have harnessed the power of stochastic search to optimize the data-preparation phase that typically dominates the time spent in predictive modelling and data mining.
Whatever your requirement for optimization, search, covering, or constraint satisfaction, Stochastic Solutions will work with you to harness modern search methods to solve your problem.