Data science and us - part III
Data science seems to have become a catch-all phrase for what is actually a collection of skills - some related and some disparate. Each skill has value on its own, but combined they can be more than the sum of their parts. Properly applied, with an understanding of the entirety of the problem, data science can help you make decisions to achieve the outcomes you want. In this post, we attempt to break down the field as we see it, define the three broad areas of expertise within it, and explain how we're different from other data science ventures.
This is the final instalment of a three-part series in which we look at the workflow of a typical data science project and its reliance on all three core areas.
Anatomy of a data science project
There are several reasons why a company might want to use data to drive its decisions, summarised neatly in Gartner's analytic value escalator below.
The value added by the project increases as we move from hindsight to foresight, but so does the difficulty.
Descriptive analytics are typically delivered through real-time visualisation or reporting systems, which enable a company to monitor the pulse of its assets near-instantaneously.
Diagnostic analytics are applied after an asset failure has occurred, using data mining and statistical correlations to determine the reasons for the failure after the fact.
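As a toy illustration of this kind of after-the-fact analysis, the sketch below computes a Pearson correlation between one recorded operating condition and a failure flag. All the data and variable names here are invented for the example; a real diagnosis would draw on the actual failure log and many more candidate variables.

```python
import math

# Hypothetical failure log: operating temperature (deg C) recorded at
# sixteen inspections, with a flag marking which ones ended in failure.
temps    = [72, 88, 91, 70, 95, 69, 90, 93, 71, 68, 74, 73, 89, 92, 70, 94]
failures = [0,  1,  1,  0,  1,  0,  1,  1,  0,  0,  0,  0,  1,  1,  0,  1]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A strong positive correlation flags high temperature as a candidate
# cause worth investigating further (correlation alone is not causation).
r = pearson(temps, failures)
print(f"temperature vs failure: r = {r:.2f}")
```

Correlation only narrows the search; domain knowledge is still needed to turn a statistical association into a confirmed root cause.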
Predictive analytics use historical data and machine learning to predict asset performance and likelihood of failure. Predictive models work closely with prescriptive analytics to determine what actions must be taken to maximise performance and/or minimise the chances of failure.
The definitions of both assets and performance/failure are loose - they could stand for a number of things. Assets might include heavy equipment, whole factories, customers, products, or services. The relevant performance/failure indicators could be machinery breakdown, factory efficiency, customer retention, product demand, or service costs.
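The predictive step can be sketched with a small logistic-regression model trained to estimate a machine's probability of failure from sensor readings. Everything below (the feature names, the synthetic data generator, the hand-rolled training loop) is invented for illustration; a real project would train on historical data with a proper machine learning library.

```python
import math
import random

random.seed(0)

def simulate(n=500):
    """Synthetic history: hotter, more vibration-prone machines fail more."""
    data = []
    for _ in range(n):
        temp = random.uniform(40, 100)   # degrees C
        vib = random.uniform(0.0, 5.0)   # vibration, mm/s RMS
        score = 0.08 * (temp - 70) + 0.9 * (vib - 2.5)
        p_fail = 1 / (1 + math.exp(-score))
        data.append((temp, vib, 1 if random.random() < p_fail else 0))
    return data

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train(data, lr=0.01, epochs=200):
    """Logistic regression fitted by batch gradient descent."""
    w_t, w_v, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        g_t = g_v = g_b = 0.0
        for temp, vib, y in data:
            # Normalise features so one learning rate suits both.
            xt, xv = (temp - 70) / 30, (vib - 2.5) / 2.5
            err = sigmoid(w_t * xt + w_v * xv + b) - y
            g_t += err * xt
            g_v += err * xv
            g_b += err
        n = len(data)
        w_t -= lr * g_t / n
        w_v -= lr * g_v / n
        b -= lr * g_b / n
    return w_t, w_v, b

def predict(model, temp, vib):
    """Estimated probability of failure for the given sensor readings."""
    w_t, w_v, b = model
    return sigmoid(w_t * (temp - 70) / 30 + w_v * (vib - 2.5) / 2.5 + b)

model = train(simulate())
```

A prescriptive layer would then sit on top of such predictions, for example scheduling maintenance whenever the estimated failure probability crosses a cost-justified threshold.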
A typical workflow for a data science project that seeks to visualise real-time asset status, diagnose failures, and predict future asset performance and failures is shown below.
The interdependence of the three pillars - data engineering, statistical expertise, and domain knowledge - is clear. Constant liaison and clear communication are required between all three to ensure:
The data available is sufficient to ensure project deliverables are met.
Visualisations show the relevant data processed in the appropriate manner.
Alerts are delivered on the correct triggers and to the correct channels.
Model selection and input features are appropriate.
Model performance meets project requirements.
Model deployment integrates seamlessly into existing business workflows.
In an ideal world, a single person would be capable of every task in the workflow; in practice, such people are rare. A data scientist capable of performing all three roles - data engineer, domain expert, and statistician - is known as a unicorn for precisely this reason. The project team must therefore consist of two or more people in constant communication, ensuring the interfaces between the three pillars are managed smoothly.
Deep Blue AI
At Deep Blue AI, we bring all three pillars of expertise to your engineering, technology, or business problem.
With our combined 35+ years of expertise in Data Science, Core Engineering, and Technology, we have the domain knowledge to understand your problem and translate it into a data engineering and data science problem statement. Our consultants in other fields lend us their expertise when needed. Most of all, we listen to you - the subject matter expert in your own niche.
Our team is adept at managing data streams and stores, at the security level you need and with the service level agreements you want. Whether your data is personal, financial, or open-source, we can tailor our solution to meet your needs.
With our quantitative backgrounds, we can subject your data to rigorous statistical analysis to ensure you don't chase false leads or miss important trends. We can also build sophisticated machine learning models to predict future events - machine failure, consumer demand, or cloud costs - so you can plan for them and stay ahead of the curve.
Equally importantly, we are a small group of people who have known each other for a long time. This means the barriers between data engineering, statistical expertise, and domain knowledge have been broken down and that those interfaces are seamless. Our solutions are therefore completely integrated; each piece of the puzzle has been meticulously designed to work with its neighbours.
If you have a persistent problem with your business and you have data pertaining to it, we would love to hear from you. Contact us at firstname.lastname@example.org