CASE STUDY
10 min read
February 15, 2026

PredictML: Predicting Industrial Equipment Failures 72 Hours in Advance

How we built a predictive maintenance system for heavy rotating equipment that detects mechanical faults 72 hours before failure, reducing unplanned downtime by 34% across a fleet of industrial assets.

A critical industrial facility can burn $500K+ per day in unplanned downtime. At that rate, a model that predicts even one failure pays for itself in four hours.

The Cost of Surprise

Heavy rotating equipment is the backbone of many industrial operations. These systems transfer massive amounts of mechanical energy under extreme loads, and when they fail unexpectedly, the entire operation stops. At high-value operating sites, unplanned downtime costs $500K+/day depending on the asset class and the cost of the supporting infrastructure.

Our client operated a fleet of dozens of industrial assets across multiple operating regions. Their critical rotating equipment experienced an average of 3.2 unplanned failures per asset per year, each requiring 2-5 days of repair time. The annual cost of unplanned downtime across the fleet was staggering.

They asked us to build a system that could predict equipment failures before they occurred, giving the maintenance team enough lead time to schedule repairs during planned downtime windows. Even a modest reduction in unplanned failures would generate enormous value.

Understanding the Mechanical System

Before building any models, we spent three weeks with the client's mechanical engineering team understanding how the equipment fails. This domain immersion phase is non-negotiable in industrial ML. You cannot build useful predictive models if you do not understand the physics of the system you are modeling.

Heavy rotating equipment of this class has five primary failure modes:

  1. Main bearing degradation. The main bearing supports extreme loads while rotating at operational speed. Bearing degradation manifests as increasing vibration amplitude at specific frequencies related to the bearing geometry.

  2. Gearbox wear. The transmission gearbox converts motor output to the required torque and speed. Gear tooth wear produces characteristic vibration signatures in the frequency domain.

  3. Motor winding faults. Electrical faults in the drive motors cause current imbalance, increased heat generation, and characteristic patterns in the motor current signature.

  4. Hydraulic system leaks. Hydraulic actuators in the handling and clamping subsystems develop slow leaks that cause gradual pressure drops and increased cycle times.

  5. Seal degradation. Rotating seals that manage fluid flow through the assembly wear over time, causing leaks and eventual seal failure.

Each failure mode has a different time horizon. Bearing degradation develops over weeks to months. Motor winding faults can progress from detectable to critical in 48-72 hours. Hydraulic leaks develop over days. Understanding these timescales is critical for setting prediction horizons.

Data Landscape

The assets were equipped with condition monitoring systems, but the data was a mess. This is universal in industrial ML. The equipment exists. The sensors exist. The data is technically being collected. But it has never been aggregated, cleaned, or analyzed systematically.

Here is what we had to work with:

Vibration data: Triaxial accelerometers on the main bearing housing and gearbox, sampling at 25.6 kHz. Data was stored locally on each site's condition monitoring system as proprietary binary files, with no standardized export format across the fleet.

Motor electrical data: Current and voltage measurements on all three phases of the drive motors, sampled at 10 kHz. Available from the variable frequency drive controllers via Modbus, but only a fraction of assets had the Modbus interface configured for data export.

Operational parameters: Load, RPM, torque, flow rate, and pressure readings. Available from the site's operational data stream at 1 Hz. Reasonably consistent across assets.

Maintenance records: Work orders, part replacements, failure reports. In a mix of SAP entries, Excel spreadsheets, and PDF field reports. Highly inconsistent. Some sites had detailed failure mode classifications. Others had entries like "fixed equipment" with no further detail.

Environmental data: Ambient temperature, humidity, and site-specific environmental conditions. Available from on-site monitoring systems.

The first four months of the project were spent on data engineering: building ETL pipelines for each data source, standardizing formats, aligning timestamps (which were in different time zones and some had clock drift), and manually labeling historical failures by going through maintenance records with mechanical engineers.
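
As an illustration of the timestamp-alignment step, here is a minimal stdlib-only sketch. The linear-drift model and the `align_timestamp` helper are assumptions for illustration, not the client's production ETL:

```python
from datetime import datetime, timedelta, timezone

def align_timestamp(local_ts: datetime, drift_seconds_per_day: float,
                    reference: datetime) -> datetime:
    """Convert a site-local timestamp to UTC and correct linear clock drift.

    Hypothetical helper: assumes the site clock drifts linearly at
    `drift_seconds_per_day` relative to a known-good `reference` sync time.
    """
    utc_ts = local_ts.astimezone(timezone.utc)
    elapsed_days = (utc_ts - reference).total_seconds() / 86400.0
    correction = timedelta(seconds=drift_seconds_per_day * elapsed_days)
    return utc_ts - correction

# Example: a site running at UTC-6 whose clock gains 2 seconds per day
site_tz = timezone(timedelta(hours=-6))
sync = datetime(2025, 1, 1, tzinfo=timezone.utc)
raw = datetime(2025, 1, 11, 6, 0, 0, tzinfo=site_tz)  # 10.5 days after sync
corrected = align_timestamp(raw, drift_seconds_per_day=2.0, reference=sync)
print(corrected.isoformat())  # 2025-01-11T11:59:39+00:00 (21 s removed)
```

In practice the drift rate itself has to be estimated per site from periodic NTP or GPS sync events; the point is that without this step, features from different sources cannot be joined on time.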

Feature Engineering

Feature engineering for vibration-based predictive maintenance is a well-studied domain, but the details matter enormously. We extracted features at multiple time scales:

Time-domain vibration features (computed every 10 minutes):

  • RMS amplitude on each axis
  • Peak-to-peak amplitude
  • Crest factor (peak/RMS ratio, indicates impulsive events)
  • Kurtosis (indicates bearing damage, healthy bearings have kurtosis near 3, damaged bearings exceed 5)
  • Skewness
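
These time-domain features are cheap to compute per window; a NumPy sketch (the `time_domain_features` helper is illustrative, not the production edge code):

```python
import numpy as np

def time_domain_features(x: np.ndarray) -> dict:
    """Time-domain vibration features over one 10-minute window (one axis)."""
    rms = np.sqrt(np.mean(x ** 2))
    peak_to_peak = float(x.max() - x.min())
    crest = float(np.max(np.abs(x)) / rms)        # impulsiveness indicator
    centered = x - x.mean()
    std = centered.std()
    kurtosis = float(np.mean(centered ** 4) / std ** 4)  # ~3 when healthy
    skewness = float(np.mean(centered ** 3) / std ** 3)
    return {"rms": float(rms), "p2p": peak_to_peak, "crest": crest,
            "kurtosis": kurtosis, "skew": skewness}

# Sanity check on a pure sine: RMS = A/sqrt(2) and crest factor = sqrt(2)
t = np.linspace(0, 1, 25_600, endpoint=False)
feats = time_domain_features(np.sin(2 * np.pi * 50 * t))
```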

Frequency-domain vibration features (computed every 10 minutes):

  • Spectral energy in bands corresponding to bearing fault frequencies (BPFO, BPFI, BSF, FTF, computed from bearing geometry)
  • Gear mesh frequency amplitude and harmonics
  • Spectral kurtosis in 1/3-octave bands
  • Overall vibration velocity in the 10-1000 Hz band (ISO 10816 standard)
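
The band-energy features reduce to masking an FFT power spectrum around each computed fault frequency. A minimal sketch, where the 87 Hz tone stands in for a hypothetical BPFO (the real fault frequencies come from bearing geometry and shaft speed):

```python
import numpy as np

def band_energy_fraction(signal: np.ndarray, fs: float,
                         f_lo: float, f_hi: float) -> float:
    """Fraction of spectral energy in the [f_lo, f_hi] Hz band.

    Used to track energy near bearing defect frequencies
    (BPFO, BPFI, BSF, FTF).
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return float(power[mask].sum() / power.sum())

# A synthetic fault tone at 87 Hz buried in broadband noise
fs = 25_600
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
sig = 0.5 * np.sin(2 * np.pi * 87 * t) + 0.05 * rng.standard_normal(fs)
ratio = band_energy_fraction(sig, fs, 82.0, 92.0)  # energy near the tone
```

A healthy bearing keeps this fraction near the noise floor; a trending increase at a specific defect frequency is the classic early-degradation signature.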

Motor current features (computed every minute on equipped assets):

  • Current imbalance between phases (percent deviation from mean)
  • Total harmonic distortion
  • Motor current signature analysis (MCSA) features: amplitude at eccentricity-related frequencies
  • Power factor
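
The imbalance feature follows the percent-deviation-from-mean definition given above; a sketch with illustrative values (alert thresholds are asset-specific and not shown here):

```python
import numpy as np

def current_imbalance_pct(i_a: float, i_b: float, i_c: float) -> float:
    """Phase current imbalance: worst-phase deviation from the mean, in %.

    A rising imbalance is an early indicator of motor winding faults.
    """
    phases = np.array([i_a, i_b, i_c], dtype=float)
    mean = phases.mean()
    return float(np.max(np.abs(phases - mean)) / mean * 100.0)

print(current_imbalance_pct(100.0, 100.0, 100.0))  # balanced -> 0.0
print(current_imbalance_pct(95.0, 100.0, 105.0))   # -> 5.0 (worst phase)
```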

Operational context features:

  • Current RPM and torque normalized by load
  • Operating mode classification (active operation, transitional states, idle)
  • Cumulative operating hours since last maintenance
  • Cumulative energy throughput (integral of torque times RPM)

Derived trend features:

  • 7-day rolling statistics (mean, std, slope) of all base features
  • Rate of change of vibration features normalized by operating intensity
  • Deviation from asset-specific baseline established during known-healthy operation
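
The rolling-slope trend feature can be sketched as a trailing least-squares fit. The `rolling_slope` helper and window arithmetic are illustrative; at 10-minute intervals a window of 1008 samples corresponds to 7 days:

```python
import numpy as np

def rolling_slope(values: np.ndarray, window: int) -> np.ndarray:
    """Least-squares slope of each trailing `window`-sample segment.

    A steadily positive slope on a vibration feature is a degradation
    signal; the first (window - 1) outputs are NaN.
    """
    x = np.arange(window)
    x_centered = x - x.mean()
    denom = (x_centered ** 2).sum()
    out = np.full(len(values), np.nan)
    for i in range(window - 1, len(values)):
        seg = values[i - window + 1 : i + 1]
        out[i] = (x_centered * (seg - seg.mean())).sum() / denom
    return out

# A linear ramp should produce a constant slope equal to its step size
ramp = 0.01 * np.arange(2000)
slopes = rolling_slope(ramp, window=1008)
```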

The total feature vector was 287 dimensions per 10-minute observation. We computed these features on site-side edge computers and transmitted the feature vectors to a centralized analytics platform via network link, rather than transmitting raw sensor data, which would have been prohibitively expensive given bandwidth constraints.

Model Architecture

We evaluated multiple approaches:

Random Forest baseline. Simple, interpretable, and a strong baseline for tabular data. Achieved 71% recall for 72-hour-ahead failure prediction with a 5% false positive rate on our validation set.

Gradient Boosted Trees (XGBoost). Better performance at 78% recall, 4.2% false positive rate. Still a reasonable model for deployment.

LSTM on time series features. 82% recall, 3.8% false positive rate. Better at capturing temporal degradation patterns. But more complex to deploy and interpret.

Temporal Convolutional Network (TCN). 84% recall, 3.5% false positive rate. Our best performer. The dilated causal convolutions capture multi-scale temporal patterns efficiently, and the model is parallelizable during training (unlike LSTMs).

We deployed the TCN as our primary model with the XGBoost model as a fallback for assets where the TCN's input requirements (continuous feature history) could not be met due to data gaps.

The TCN architecture:

  • Input: 287-dimensional feature vectors at 10-minute intervals, 1008 time steps (7 days of history)
  • 6 residual blocks with dilation factors [1, 2, 4, 8, 16, 32]
  • 128 filters per layer, kernel size 3
  • Output: probability of failure within 24, 48, and 72-hour windows for each of the 5 failure modes
  • Total parameters: 2.1M
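
One useful sanity check on these hyperparameters is the receptive field. Assuming the standard TCN formulation of two dilated convolutions per residual block (an assumption; the article does not state the per-block layout), the arithmetic is:

```python
def tcn_receptive_field(kernel_size: int, dilations: list[int],
                        convs_per_block: int = 2) -> int:
    """Receptive field of a stack of dilated causal conv residual blocks.

    Each dilated conv with dilation d extends the receptive field by
    (kernel_size - 1) * d; the standard TCN uses two convs per block.
    """
    rf = 1
    for d in dilations:
        rf += convs_per_block * (kernel_size - 1) * d
    return rf

rf = tcn_receptive_field(kernel_size=3, dilations=[1, 2, 4, 8, 16, 32])
print(rf)  # 253
```

Under that assumption each output directly mixes about 253 ten-minute steps (roughly 42 hours of raw context); the 7-day input window leaves headroom for the rolling trend features, which already summarize longer horizons.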

We trained separate models for each failure mode rather than a single multi-task model, because the failure modes have different temporal signatures and class imbalance ratios. Bearing failures develop slowly and are relatively common. Motor winding faults develop quickly and are rare. A single model struggles to handle both patterns optimally.
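
Within each per-mode model, one standard tactic for rare-event imbalance is focal loss, which down-weights easy examples so the rare failure class dominates the gradient. A minimal NumPy sketch of the binary form (illustrative; not necessarily the production loss function):

```python
import numpy as np

def binary_focal_loss(p: np.ndarray, y: np.ndarray,
                      gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Mean binary focal loss (Lin et al. 2017).

    p: predicted failure probabilities, y: 0/1 labels. With gamma > 0,
    confident correct predictions contribute almost nothing to the loss.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)            # prob assigned to true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))

# An easy, correct negative (p=0.01) contributes far less than a
# hard positive the model is still unsure about (p=0.30)
easy = binary_focal_loss(np.array([0.01]), np.array([0]))
hard = binary_focal_loss(np.array([0.30]), np.array([1]))
```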

The False Positive Problem

In predictive maintenance, false positives are expensive. A false positive triggers an unnecessary maintenance intervention, which at a remote operating site means mobilizing a service crew, acquiring parts, and potentially stopping operations for inspection. A single false positive can cost tens of thousands to over a hundred thousand dollars.

We tuned our models aggressively for precision at the expense of recall. A missed prediction (false negative) is bad, but it is the status quo. The fleet was already experiencing unplanned failures. A false positive makes the system actively harmful.

Our operating point was calibrated to achieve a precision of 85% at 72-hour lead time, meaning 85% of alerts correspond to genuine developing faults. This required setting the classification threshold higher than the point of maximum F1 score, accepting lower recall in exchange for operational trust.

Trust is the critical factor. If operators receive too many false alarms, they stop responding to alerts entirely. We had seen this pattern at other clients. The system that cries wolf gets its alerts routed to a folder nobody checks.

Deployment Architecture

The deployment architecture spans edge and cloud:

Edge (on each site):

  • Feature extraction pipeline running on a ruggedized industrial PC
  • Data quality validation and gap detection
  • Feature vector transmission every 10 minutes
  • Local alert generation for the XGBoost fallback model (in case of network outage)

Cloud (centralized):

  • Feature storage in TimescaleDB
  • TCN inference service running on CPU (the model is small enough that GPU is unnecessary)
  • Alert management service with deduplication, escalation, and acknowledgment tracking
  • Dashboard for fleet-wide health visualization
  • Model retraining pipeline triggered monthly or on demand
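
The deduplication step in the alert management service can be sketched as a per-(asset, failure mode) cooldown. This is a hypothetical sketch of the idea, not the deployed service; the 24-hour cooldown is an assumed default:

```python
from datetime import datetime, timedelta

class AlertDeduplicator:
    """Suppress repeat alerts for the same (asset, failure_mode) pair
    within a cooldown window."""

    def __init__(self, cooldown_hours: float = 24.0):
        self.cooldown = timedelta(hours=cooldown_hours)
        self._last_emitted: dict[tuple[str, str], datetime] = {}

    def should_emit(self, asset: str, mode: str, ts: datetime) -> bool:
        key = (asset, mode)
        last = self._last_emitted.get(key)
        if last is not None and ts - last < self.cooldown:
            return False  # duplicate within cooldown: suppress
        self._last_emitted[key] = ts
        return True

dedup = AlertDeduplicator(cooldown_hours=24)
t0 = datetime(2025, 6, 1, 8, 0)
print(dedup.should_emit("asset-07", "bearing", t0))                       # True
print(dedup.should_emit("asset-07", "bearing", t0 + timedelta(hours=6)))  # False
```

Escalation and acknowledgment tracking layer on top of this: an acknowledged alert resets differently from a suppressed duplicate, so operators see one ticket per developing fault rather than a stream of repeats.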

Alerts are delivered through the client's existing operational communication channels: email to the centralized maintenance planning team, and a notification on the site's condition monitoring display for the on-site mechanic.

Results After 12 Months

After 12 months of production operation across the majority of the fleet (some assets were excluded due to insufficient sensor infrastructure):

| Metric | Result |
|---|---|
| Total alerts generated | 247 |
| True positive alerts | 198 (80.2% precision) |
| False positive alerts | 49 |
| Detected failures (72h lead time) | 198 of 267 total failures (74.2% recall) |
| Undetected failures | 69 |
| Reduction in unplanned downtime | 34% fleet-wide |
| System operational uptime | 99.4% |

The precision was slightly below our target of 85%, which we attributed to two factors: assets with older sensor installations producing noisier data, and a class of intermittent faults that appear to develop and then self-resolve before reaching failure. We are investigating the latter phenomenon with the client's engineering team.

The 34% reduction in unplanned downtime exceeded the client's target of 25%. The remaining undetected failures were predominantly in failure modes with rapid onset (motor winding faults that developed in under 24 hours) and failures in subsystems without adequate sensor coverage.

What We Would Do Differently

Invest more in sensor standardization upfront. The heterogeneity of sensor configurations across the fleet consumed 40% of the project timeline. If we were starting over, we would negotiate sensor standardization as a prerequisite before beginning model development.

Deploy simpler models first. Our XGBoost baseline was good enough to deliver significant value, and it could have been deployed months earlier rather than waiting for the TCN. Shipping a 71% recall model early and iterating is better than waiting for an 84% recall model.

Build the trust framework before the model. We underestimated how much effort would be needed to get site-based maintenance teams to trust and act on model alerts. The human change management was harder than the machine learning.

Predictive maintenance is not a sexy ML problem. There are no transformers, no generative models, no viral demos. But when a single predicted failure saves hundreds of thousands of dollars in avoided downtime, the ROI conversation is very short.

Discussion (2)

ops_director_mining (Director of Operations · Mining) · 3 weeks ago

We have similar problems with our crushing equipment — extremely rare failure events (<0.5% of data), harsh environment (dust, vibration, temperature extremes), and the cost of unexpected failure is $800K+ per incident. The class imbalance approach you described is exactly what we've been struggling with. Every vendor we've talked to just throws SMOTE at it and calls it a day.

Mostafa Dhouib (Author) · 3 weeks ago

SMOTE on time-series sensor data is almost always the wrong approach — it creates synthetic samples that don't respect the temporal dynamics of your system. For rare failure events in mining equipment, you want a combination of: (1) focal loss to force the model to focus on the minority class, (2) windowed feature engineering that captures the degradation signature BEFORE failure (vibration harmonics, temperature trends, pressure deviations), and (3) an ensemble that separates 'normal operating variation' from 'early failure signature.' The 72-hour advance warning we achieved in oil & gas came from identifying these degradation patterns, not from oversampling.

Mostafa Dhouib · Founder & ML Engineer at Opulion
