Miró is our integrated analytical tool covering data
extraction, manipulation, exploration, reporting, prediction
(including uplift modelling),
and test-driven data analysis.
It features a web-based interface for mixed text and graphical
output, offline script execution, and a Python API.
Almost every data science project begins with an exploratory phase in
which the analyst learns about the data and tests ideas, usually using
a mixture of ad hoc querying and aggregation, visualization,
filtering, profiling, segmentation, deriving new fields and so
forth. Miró is particularly well-suited to this phase, and enhances
its utility by keeping an executable audit trail of what has been
done, allowing this initial analysis to be efficiently translated into
a more production-ready phase.
Miró implements production-oriented analytics,
meaning that it focuses on allowing analysts to get results
as quickly and painlessly as possible,
from data import through to production-ready or
near-production-ready analysis.
Its Unix-style command-line interface is normally accessed
through a web browser, allowing rich text and graphical output,
but is also fully functional through a plain-text terminal,
locally or on a remote server.
Miró generates high-quality textual and graphical output,
drawing inspiration from Edward Tufte's principles of
graphical excellence,
minimizing chart junk and maximizing meaningful information content.
It also has the ability to produce animated output,
HTML reports, PDF reports (through LaTeX source generation), text files,
Excel spreadsheets and to write directly to database tables.
Miró includes all the functionality from our open-source TDDA library
for test-driven data analysis,
together with various enhancements including
- constraint generation in the presence of bad data
- support for between-field constraints
- relational integrity constraints
- integrated reporting and history tracking
- automatic data content classification, including detection
  of personally identifiable information (PII)
- construction of synthetic data matching many of the key
characteristics of the original data, using a small,
auditable, checkable specification that guards against
leakage of original data into synthetic data
Miró reads and writes the same TDDA files as the open-source version,
allowing the two to be mixed, but gives a more seamless, polished, supported
experience compared with the open-source package.
In addition to standard predictive modelling approaches, Miró
incorporates uplift modelling as a core analytical capability. Uplift
models are used to analyse marketing campaigns in which a randomized
control group has been kept. Uplift trees model the difference in
behaviour between the members of the treated and control populations,
helping marketers to understand which actions are effective for which
segments and (equally importantly) which actions have negative effects
on other segments. This is extremely powerful in the context of
customer retention and sales campaigns such as cross-selling,
up-selling and deep-selling. Miró not only features integrated
significance-based uplift trees, but also a suite of supporting
tools, including
- assessment of uplift models
- variable selection for uplift models
- noise reduction methods particularly suited to uplift modelling
- uplift crosstabs
- split validity assessment
- uplift gains charts and uplift gains tables
- scoring using uplift models.
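The core quantity an uplift model estimates can be illustrated directly: the uplift for a segment is the difference between the response rate of its treated members and that of its control members. This is a generic sketch with invented numbers, not Miró's API:

```python
def uplift(treated_responders, treated_total,
           control_responders, control_total):
    """Uplift = treated response rate minus control response rate."""
    return (treated_responders / treated_total
            - control_responders / control_total)

# A segment where treatment helps: 12% treated vs 8% control response,
# i.e. roughly +4 points of uplift.
print(uplift(120, 1000, 80, 1000))

# A segment where treatment backfires: 5% treated vs 9% control,
# i.e. negative uplift -- the action harms results here.
print(uplift(50, 1000, 90, 1000))
```

An uplift tree extends this idea by searching for splits that maximize the *difference* in this quantity between child segments, rather than splits that best predict the outcome itself.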
Once an analytical process has been developed using Miró, it is
extremely simple to turn it into a web app with an arbitrary user
interface. Miró can present any input parameters to a user, run
analytical processes, and present the output, all through a standard
web browser. There are then layers of customization that can
easily be applied to take more control over the input controls,
the output layout, the styling and so on, through a combination
of custom HTML and CSS.
Miró provides multiple interfaces, including a programmatic
interface (an API), a command-line/scripting interface
and interactive web access.
The API layer makes it a powerful base for
embedded analytical applications.
Miró also includes a very powerful expression
language for data manipulation.
Miró datasets contain an audit trail showing the sequence of
operations that resulted in any final dataset, allowing diagnosis of
problems and tracking of data provenance. It also allows the full
history of datasets to be reliably traced, even when they may have
been worked on across multiple sessions, perhaps on multiple machines,
by multiple people.
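As an illustration of the general idea (not Miró's internal format, which is not documented here), an audit trail can be thought of as an ordered, timestamped log of operations that travels with the dataset:

```python
import datetime

class Dataset:
    """Toy dataset carrying an audit trail of the operations applied to it."""
    def __init__(self, rows, history=None):
        self.rows = rows
        self.history = list(history or [])

    def _record(self, description):
        timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.history.append((timestamp, description))

    def filter(self, predicate, description):
        # Derived datasets inherit the full history of their parent,
        # so provenance survives across chains of operations.
        result = Dataset([r for r in self.rows if predicate(r)],
                         self.history)
        result._record('filter: ' + description)
        return result

ds = Dataset([{'spend': 10}, {'spend': 250}])
big = ds.filter(lambda r: r['spend'] > 100, 'spend > 100')
for timestamp, operation in big.history:
    print(timestamp, operation)
```

Because every derived dataset carries the whole chain, the history remains traceable even when work spans sessions, machines or people.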
Scripting by Doing
Miró automatically generates detailed logs, providing not only
a further audit trail but also the ability to rerun
analysis sessions, either verbatim or with specified modifications.
It logs both command sequences and output (in multiple forms),
meaning that work is never accidentally lost, results can always
be traced, and ad hoc analyses can
always be repeated or turned into re-usable scripts.
Miró is cross-platform (Unix, Linux, macOS and Windows)
with a focus on standards compliance.
Native and Database Back Ends
All Miró functionality is available using its native back end,
in which data is stored in its own column-oriented data store
and all manipulations are performed directly by Miró code.
This is suitable for both interactive and batch use.
A significant subset of Miró's functionality is also available using a
database back end. In this mode, Miró connects to a database and
collects metadata, but does not extract the main data from
tables. Rather, Miró issues SQL (and in some cases calls in-database
functions) to perform equivalent operations. Depending on the relative
power and capacity of the machine running Miró and the database
hardware, as well as data volume and the nature of the operations
being performed, this can sometimes be faster and sometimes slower
than extracting the data into Miró, performing whatever analysis is
required, and writing any results back.
The level of support varies across database systems, but includes
Postgres, Greenplum, MySQL, SQLite and MongoDB.
This approach also allows analytical workflows to be developed in one mode
(most commonly using the native back end) and then deployed, with minimal
or no changes, using a database. This is a popular development-production
split for some clients.
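The idea of performing the same operation either natively or by pushing it down to a database can be illustrated with a toy example. This is a generic sketch using Python's sqlite3 module, not Miró's implementation:

```python
import sqlite3

rows = [('a', 10), ('a', 30), ('b', 5)]

# "Native" mode: compute a grouped aggregation directly in
# application code, over data held in memory.
native = {}
for key, value in rows:
    native[key] = native.get(key, 0) + value

# "Database" mode: leave the data in the database and issue SQL
# expressing the equivalent operation.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (key TEXT, value INTEGER)')
conn.executemany('INSERT INTO t VALUES (?, ?)', rows)
pushed = dict(conn.execute('SELECT key, SUM(value) FROM t GROUP BY key'))

# Both strategies yield the same result; which is faster depends on
# data volume, the operation, and the relative power of each machine.
assert native == pushed == {'a': 40, 'b': 5}
```

Because the two modes produce identical results, a workflow developed against the native back end can later be pointed at a database with minimal change, which is what makes the development-production split described above practical.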