Data Driven Business Solutions

We specialise in Data Science and AI solutions for complex engineering and business problems.

  • Rahul Rao

Detecting asset failure in real-time

Updated: Aug 17, 2020

Maintenance strategies are essential for businesses that depend on machinery. Badly designed strategies lead to machinery failing at unexpected times, making it difficult to plan for downtime or prevent it. Unexpected failure of a machine can lead to lost revenue or, worse, injury or death.

The aircraft industry is a prominent example of rigorous maintenance scheduling. After every flight, aircraft are inspected by a team of engineers and technicians before the all-clear is given to take off again. The cost of this maintenance hovered around 40 billion USD between 2002 and 2010 and is forecast to rise significantly in the 2020s. Despite this comprehensive proactive approach, the annual global cost of unscheduled aircraft maintenance in 2017 was approximately 20 billion USD, and it is forecast to grow to approximately 28 billion USD in 2025 and 40 billion USD in 2035. Clearly, any improvement in maintenance strategy is worth large sums of money.

In this article we explore the types of maintenance strategies available to organisations and look at a case study of bearing failure.

Types of maintenance strategies

There are three broad classes of maintenance strategy: reactive, preventative, and predictive. The relative lost revenue and complexity of the three are shown in Figure 1.

Figure 1. Classes of maintenance strategies.

Reactive maintenance

A reactive maintenance strategy is simple: when it breaks, fix it. The benefits are obvious: you never perform unnecessary maintenance, and the complexity of setting up such a strategy is low. Its drawbacks are equally obvious:

  • Only functional failures are addressed. By the time functional failure has occurred, it may already have damaged other components of the asset, other assets, or people working in the vicinity. Equipment such as aircraft and bridges cannot be allowed to run until functional failure.

  • Unplanned downtime is virtually a certainty. Without some estimate of the remaining life of the asset, failures cannot be planned for, and spare parts or technical resources may be unavailable when they occur. Estimates indicate that unplanned maintenance costs approximately 15 times as much as the same maintenance performed on a schedule. Added to the lost revenue from downtime, this is a disaster for any streamlined process.

Preventative maintenance

Preventative maintenance applies a little more thought to the maintenance strategy. All assets are prescribed a service interval, which could be in terms of time, distance, or cycles. Two ways of managing the service itself are:

  • Inspect the asset at the service interval. If a manual inspection determines replacement or repair is necessary, replace or repair the appropriate part. The shortcoming of this method is its reliance on the manual input of trained personnel, and the associated cost.

  • Replace the part or asset irrespective of its condition. Given manufacturing tolerances as well as variations in usage patterns, there is likely to be a spread in the condition of the part or asset at the service interval, so many parts that are still in good condition will be discarded.

Using either of these methods, some fraction of the total downtime can be planned for and resources allocated accordingly, with a corresponding decrease in cost. Preventative maintenance is a very common strategy in heavy machinery industries.
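The trade-off in the second method can be made concrete with a small simulation. The sketch below is not based on data from any real asset: the lifetime distribution (normal, mean 1,000 operating hours, standard deviation 150 hours), the candidate service intervals, and the function name simulate_fixed_interval are all assumptions chosen purely for illustration.

```python
import random


def simulate_fixed_interval(lifetimes, interval):
    """Return (fraction of parts failing before service,
    mean fraction of life left unused in parts replaced at service)."""
    failed = [life for life in lifetimes if life < interval]
    replaced = [life for life in lifetimes if life >= interval]
    failure_rate = len(failed) / len(lifetimes)
    # Life thrown away when a still-working part is replaced at the interval.
    wasted = [(life - interval) / life for life in replaced]
    mean_wasted = sum(wasted) / len(wasted) if wasted else 0.0
    return failure_rate, mean_wasted


rng = random.Random(0)
# Hypothetical part lifetimes in operating hours (assumed, not measured).
lifetimes = [max(1.0, rng.gauss(1000, 150)) for _ in range(10_000)]

for interval in (600, 800, 1000):
    failure_rate, mean_wasted = simulate_fixed_interval(lifetimes, interval)
    print(f"interval={interval}h  unplanned failures={failure_rate:.1%}  "
          f"unused life of replaced parts={mean_wasted:.1%}")
```

Under these assumptions, shortening the interval lowers the share of unplanned failures but increases the share of useful life that is discarded, and lengthening it does the opposite.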

Predictive maintenance

Many industries are now exploring predictive maintenance to address the shortcomings of preventative maintenance. Predictive maintenance aims to use sensors on the asset to predict when failure is likely to occur. Using this method, companies can more confidently determine upcoming downtime and can allocate resources appropriately, while saving on unnecessary replacement of parts. By decreasing both the frequency of unexpected failures and the amount of unnecessary maintenance, the organisation saves significant amounts of money.

The downsides of this type of maintenance are that it is technically complex and difficult to set up and integrate into systems for spare parts and inventory management. Once running, however, the cost of predictive maintenance is limited to the computing power used, which is typically nominal.

Predictive maintenance operates on the principle of a prediction window: the period of time between the first indications of failure and the complete functional failure of an asset. The concept is depicted in Figure 2. Inside the prediction window, there are signs that the asset is failing, and these can be measured by sensors. Human operators may miss these signs because it is often unclear what the failure mechanism is.

Figure 2. The prediction window concept in predictive maintenance.

Several first-principles techniques can be used to model the failure of an asset, including fatigue analysis and spectral analysis. The drawbacks of first-principles models are twofold:

  • Such techniques are specific to the asset that the models were built for. They do not generalise well to similar but non-identical assets without some tweaking of model parameters.

  • General wear and tear of the asset is difficult to model. Wear and tear shows up as slow drifts in sensor readings, which a first-principles model could mistake for impending failure.

Given these limitations of first-principles models, the current state of the art in predictive maintenance is the use of machine learning models. Drawing on our expertise in artificial intelligence and mechanical engineering, Deep Blue AI has developed Apollo, a predictive maintenance framework that uses machine learning to predict asset failure. The continuous learning functionality of Apollo enables two key features that drive down costs and increase prediction confidence:

  • Ability to generalise: Apollo learns the “signature” of normal operation on an asset. When it is applied to a completely different asset, it relearns a new signature before it starts making predictions.

  • Ability to adapt: Apollo monitors the asset continually to recognise wear and tear and distinguish expected deterioration from indications of failure.
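Apollo's internals are not described in this article, so the following is only a generic sketch of the two ideas above and should not be read as Apollo's algorithm. It assumes a single scalar health indicator per cycle. A warm-up period learns the mean and variance of normal operation (the "signature"), and an exponentially weighted update then lets that baseline follow slow drift while abrupt departures are flagged. The class name AdaptiveBaseline, its parameters, and the synthetic readings are all hypothetical.

```python
import random


class AdaptiveBaseline:
    """Track a slowly drifting 'normal' level and flag abrupt departures."""

    def __init__(self, warmup=100, alpha=0.02, threshold=6.0):
        self.warmup = warmup        # readings used to learn the initial signature
        self.alpha = alpha          # adaptation rate; 0 freezes the baseline
        self.threshold = threshold  # alarm level in standard deviations
        self.samples = []
        self.mean = None
        self.var = None

    def update(self, x):
        """Return True if x looks anomalous relative to the learned baseline."""
        if self.mean is None:
            # Still learning the signature: collect readings, never alarm.
            self.samples.append(x)
            if len(self.samples) == self.warmup:
                n = len(self.samples)
                self.mean = sum(self.samples) / n
                self.var = sum((s - self.mean) ** 2 for s in self.samples) / (n - 1)
            return False
        z = (x - self.mean) / self.var ** 0.5
        if abs(z) > self.threshold:
            return True  # do not absorb anomalous readings into the baseline
        # Normal reading: let mean and variance follow slow drift.
        diff = x - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return False


def first_alarm(detector, readings):
    """Return the index of the first flagged reading, or None."""
    for cycle, x in enumerate(readings):
        if detector.update(x):
            return cycle
    return None


rng = random.Random(1)
readings = []
for cycle in range(700):
    level = 1.0 + 0.001 * cycle   # slow drift standing in for wear and tear
    if cycle >= 600:
        level += 1.0              # abrupt change standing in for a fault
    readings.append(level + rng.gauss(0, 0.05))

adaptive_alarm = first_alarm(AdaptiveBaseline(alpha=0.02), readings)
fixed_alarm = first_alarm(AdaptiveBaseline(alpha=0.0), readings)
print("adaptive baseline first alarm:", adaptive_alarm)
print("fixed baseline first alarm:   ", fixed_alarm)
```

Setting alpha to 0 freezes the baseline after warm-up. On this synthetic series the frozen baseline raises its alarm during the slow drift, before the fault is introduced at cycle 600, whereas the adaptive baseline stays quiet until the abrupt change.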

Case study: bearing failure

A study was performed using Apollo to distinguish between a new and a faulty bearing. The data were taken from the Bearing Data Center of Case Western Reserve University. Faults ranging from 0.18 mm to 0.71 mm were induced on the inner race surface using electro-discharge machining. Each bearing was then installed in a 2 hp electric motor instrumented with accelerometers. Measurements from the accelerometers were used as the input to Apollo.
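To illustrate how raw accelerometer samples might be condensed into inputs for a model, the sketch below computes three widely used vibration statistics for one window of samples: RMS, kurtosis, and crest factor. This is not a description of Apollo's actual inputs, and it does not load the Case Western Reserve data. The signal is synthetic, with periodic impulses as a crude stand-in for a localised race defect, and the function names and signal parameters are assumptions.

```python
import math
import random


def vibration_features(window):
    """Return RMS, kurtosis and crest factor for one window of samples."""
    n = len(window)
    mean = sum(window) / n
    centred = [x - mean for x in window]
    m2 = sum(c ** 2 for c in centred) / n   # second central moment
    m4 = sum(c ** 4 for c in centred) / n   # fourth central moment
    rms = math.sqrt(sum(x ** 2 for x in window) / n)
    return {
        "rms": rms,
        "kurtosis": m4 / m2 ** 2,            # about 3 for Gaussian noise
        "crest_factor": max(abs(x) for x in window) / rms,
    }


def synthetic_signal(n, impulse_every=None, seed=0):
    """Sine plus noise; optionally add a sharp impulse every few samples."""
    rng = random.Random(seed)
    signal = []
    for i in range(n):
        x = math.sin(2 * math.pi * i / 40) + rng.gauss(0, 0.3)
        if impulse_every and i % impulse_every == 0:
            x += 5.0  # hypothetical impact as a rolling element hits the defect
        signal.append(x)
    return signal


healthy = vibration_features(synthetic_signal(4000))
faulty = vibration_features(synthetic_signal(4000, impulse_every=100))
for name in healthy:
    print(f"{name:>12}: healthy={healthy[name]:.2f}  faulty={faulty[name]:.2f}")
```

Kurtosis and crest factor respond strongly to short, sharp impacts, which is why they are common choices for bearing monitoring. RMS mainly tracks overall vibration energy.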

The video below shows the performance of Apollo with these bearings. The top graph indicates Apollo's estimation of bearing state, while the bottom graph shows the ground truth of whether data fed to Apollo is from a good bearing or a faulty bearing.

Initially, the motor runs with a good bearing at various loads. At cycle 650, the input switches to data from the faulty bearings, as indicated by the bottom graph. The sudden jump in values in the top graph shows where asset failure is flagged.
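One simple way to turn a per-cycle estimate like the one in the top graph into an alarm, and to compare that alarm against the ground truth, is sketched below. The scores are synthetic, and the threshold, the persistence rule, and the function name first_persistent_alarm are assumptions for illustration rather than Apollo's actual logic. Only the switch at cycle 650 comes from the study described above.

```python
import random


def first_persistent_alarm(scores, threshold, persistence=3):
    """Return the first cycle at which `scores` has exceeded `threshold`
    for `persistence` consecutive cycles, or None if that never happens."""
    run = 0
    for cycle, score in enumerate(scores):
        run = run + 1 if score > threshold else 0
        if run >= persistence:
            return cycle
    return None


rng = random.Random(42)
switch_cycle = 650  # faulty-bearing data starts here, as in the case study

# Hypothetical bearing-state scores: low for the good bearing, high for the faulty one.
scores = [rng.gauss(0.1, 0.03) if c < switch_cycle else rng.gauss(0.9, 0.05)
          for c in range(800)]
ground_truth = [c >= switch_cycle for c in range(800)]

alarm = first_persistent_alarm(scores, threshold=0.5)
delay = alarm - ground_truth.index(True)
print(f"alarm raised at cycle {alarm}, {delay} cycles after the switch")
```

Requiring several consecutive exceedances guards against a single noisy reading raising a false alarm, at the cost of a short delay: with persistence=3 the alarm cannot be raised until at least two cycles after the switch.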

Coupled with our interactive dashboards, Apollo provides the asset operator and owner with early warning of possible functional failure. The machine is still operable and can be used while plans are made to obtain replacement parts and shift the workload to another machine. Then the machine can be safely shut down and the faulty part replaced, resulting in minimal downtime and increased productivity.

To know if Apollo is right for you, contact us to discuss your specific situation at
