CASE STUDY
9 min read
December 5, 2025

From 130 Devices to Real-Time Fleet Intelligence

How we built a real-time fleet monitoring system that processes telemetry from 130 edge devices, predicts maintenance failures 72 hours in advance, and reduced unplanned downtime by 43%. A deep dive into the architecture.

The hard part of fleet intelligence is not the ML. It is getting reliable data from 130 devices that each have their own opinion about timestamps, connectivity, and what constitutes a valid reading.

The Client Problem

A logistics company operating a fleet of 130 refrigerated transport vehicles came to us with a straightforward-sounding request: predict equipment failures before they happen. Their refrigeration units were failing in the field, spoiling temperature-sensitive cargo. Each failure cost between $15,000 and $80,000 depending on the cargo, not counting the downstream supply chain disruption, customer penalties, and emergency repair dispatch costs.

They had sensors on every unit. Temperature probes, compressor current sensors, condenser pressure gauges, ambient temperature readings, door open/close events, GPS, and accelerometer data. Each vehicle was generating roughly 2MB of telemetry per hour across 14 sensor channels. The data was being collected but nobody was doing anything useful with it. It sat in a data lake, aging into irrelevance.

The ask was simple. The execution was not.

What 130 Devices Actually Looks Like

The first thing you learn when working with fleet telemetry is that no two devices behave the same way, even when they are the same make, model, and firmware version.

We spent the first three weeks just understanding the data. Here is what we found:

Clock drift. Despite NTP synchronization, device clocks drifted by up to 45 seconds over a 24-hour period. When connectivity dropped, the drift compounded. We found devices reporting timestamps 12 minutes in the future relative to the ingestion server. Any time-series analysis that assumed synchronized clocks would produce garbage.
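One way to contain that drift on the ingestion side, assuming the edge agent reports its own send time alongside each batch (the field names below are illustrative, not from the production schema), is to shift every event timestamp by the observed device/server offset at batch arrival:

```python
# Sketch of server-side clock-skew correction. A positive offset
# means the device clock runs ahead of the server (the
# 12-minutes-in-the-future case described above).
def correct_timestamps(event_ts_ms, device_send_ms, server_recv_ms):
    """Shift device-local timestamps by the device/server offset
    observed when the batch arrived."""
    offset_ms = device_send_ms - server_recv_ms
    return [ts - offset_ms for ts in event_ts_ms]
```

This only corrects the offset as of upload time; drift accumulated during a long connectivity gap still smears timestamps within the batch, which is why the production fix lived on the device (monotonic clock plus NTP correction, described below).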

Connectivity gaps. The vehicles operated across a region that included rural areas with spotty cellular coverage. On average, each device experienced 3.2 connectivity gaps per day, ranging from 30 seconds to 4 hours. During gaps, devices buffered data locally and uploaded in bursts when connectivity returned. This meant our ingestion pipeline received data out of order, in variable-sized batches, with unpredictable delays.

Sensor calibration variance. Temperature sensors across the fleet showed a calibration spread of plus or minus 1.8 degrees Celsius. Two identical sensors on the same unit could disagree by 1.2 degrees. This is within spec for industrial temperature sensors, but it means raw sensor values are not directly comparable across vehicles without per-device calibration offsets.

Firmware heterogeneity. The fleet had three different firmware versions deployed simultaneously. Each version reported a slightly different telemetry schema. Version 2.1 reported compressor current as a 16-bit integer in milliamps. Version 2.3 reported it as a 32-bit float in amps. Version 2.4 added two new sensor channels that did not exist in previous versions. Our ingestion pipeline had to handle all three.
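The version-aware normalization this forces can be sketched for the compressor current field alone (the function name and version handling here are simplified assumptions):

```python
# Illustrative sketch of per-firmware normalization to a canonical
# unit: v2.1 reports compressor current as an integer in mA,
# v2.3 and later as a float in amps.
def normalize_compressor_current(raw_value, firmware_version: str) -> float:
    if firmware_version.startswith("2.1"):
        return raw_value / 1000.0  # milliamps -> amps
    return float(raw_value)        # already amps
```

Multiply this by 14 sensor channels and three firmware versions and the case for doing it once, at the edge, becomes obvious.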

The Architecture We Built

After the data audit, we designed a four-layer architecture:

Layer 1: Edge Normalization

Rather than trying to handle all the device heterogeneity in the cloud, we pushed normalization logic to the edge. Each device ran a lightweight normalization agent (written in Rust, compiled to ARM, under 2MB binary) that:

  • Applied per-device calibration offsets loaded from a configuration file
  • Normalized all telemetry to a canonical schema regardless of firmware version
  • Timestamped events using a monotonic clock with periodic NTP correction
  • Buffered data during connectivity gaps with LRU eviction when the buffer exceeded 50MB
  • Compressed and batched uploads using Protocol Buffers
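The production agent is Rust, but the eviction policy in the buffering bullet can be sketched in Python. For a write-once upload queue, LRU reduces to oldest-first eviction once the byte budget is exceeded (the 50MB budget is from the design above; everything else is illustrative):

```python
from collections import deque

class BoundedTelemetryBuffer:
    """Sketch of the edge buffer: evict the oldest batches once
    the byte budget is exceeded."""
    def __init__(self, max_bytes: int = 50 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.total = 0
        self.items = deque()

    def push(self, payload: bytes):
        self.items.append(payload)
        self.total += len(payload)
        # Drop oldest batches until we are back under budget.
        while self.total > self.max_bytes:
            evicted = self.items.popleft()
            self.total -= len(evicted)
```

Dropping the oldest data first is a deliberate choice: for predictive maintenance, the most recent readings carry the most signal.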

The edge agent reduced our cloud-side complexity enormously. Instead of handling three firmware schemas and per-device calibration in the ingestion pipeline, we received a single normalized schema from every device.

syntax = "proto3";

message TelemetryBatch {
    string device_id = 1;
    string firmware_version = 2;
    int64 batch_sequence = 3;
    repeated TelemetryEvent events = 4;
}

message TelemetryEvent {
    int64 timestamp_ms = 1;
    float compressor_current_amps = 2;
    float condenser_pressure_psi = 3;
    float cargo_temp_celsius = 4;
    float ambient_temp_celsius = 5;
    float evaporator_temp_celsius = 6;
    bool door_open = 7;
    float latitude = 8;
    float longitude = 9;
    float vibration_g = 10;
}

Layer 2: Ingestion and Deduplication

The ingestion layer received batches from devices over MQTT, deduplicated them using the device_id and batch_sequence number, and wrote them to a time-series database (TimescaleDB). Deduplication was critical because devices would retry uploads on connectivity restoration, sometimes sending the same batch multiple times.

The deduplication logic was simple but effective:

class BatchDeduplicator:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.ttl_seconds = 86400 * 7  # 7 day TTL

    def is_duplicate(self, device_id: str, batch_sequence: int) -> bool:
        key = f"batch:{device_id}:{batch_sequence}"
        result = self.redis.set(key, 1, nx=True, ex=self.ttl_seconds)
        return result is None  # Returns None if key already existed

    def process_batch(self, batch: TelemetryBatch):
        if self.is_duplicate(batch.device_id, batch.batch_sequence):
            metrics.increment("batches_deduplicated")
            return

        events = self.normalize_timestamps(batch)
        self.write_to_timescale(batch.device_id, events)
        metrics.increment("batches_processed")

We processed an average of 18,000 telemetry events per minute across the fleet, with spikes to 45,000 during morning fleet activation when all vehicles came online simultaneously and uploaded overnight buffered data.

Layer 3: Feature Engineering

The raw telemetry was not directly useful for predictive maintenance. A single temperature reading tells you almost nothing. The predictive signal lives in temporal patterns, rate-of-change features, and cross-sensor correlations.

We engineered 47 features across four categories:

Rolling statistics: Mean, standard deviation, min, max, and slope of each sensor channel over 1-hour, 4-hour, 12-hour, and 24-hour windows.

Rate-of-change features: First and second derivatives of temperature and pressure, computed using Savitzky-Golay filtering to smooth sensor noise.
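As a sketch of that computation, assuming a regular 60-second sampling period (the window length and polynomial order here are illustrative, not the production values):

```python
import numpy as np
from scipy.signal import savgol_filter

def temp_rate_of_change(temps, sample_period_s: float = 60.0):
    """Smoothed first derivative of a temperature series via
    Savitzky-Golay filtering, in degrees per second."""
    # deriv=1 returns the derivative per sample; divide by the
    # sampling period to express it per second.
    d = savgol_filter(temps, window_length=11, polyorder=2, deriv=1)
    return d / sample_period_s
```

Compared with a naive finite difference, the polynomial fit suppresses the sample-to-sample sensor noise that would otherwise dominate the derivative.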

Cross-sensor correlations: The ratio of compressor current to cooling rate. In a healthy system, this ratio is stable. When a refrigerant leak develops, the compressor works harder (higher current) while cooling efficiency drops (slower temperature reduction). The divergence of this ratio was our single strongest predictive feature.

Duty cycle features: What percentage of time the compressor was running in each window. How many on/off cycles per hour. How long was the average run cycle. Compressor systems approaching failure often show shorter, more frequent cycling as they struggle to maintain setpoint.
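The duty-cycle features can be sketched with pandas time-based rolling windows, assuming a boolean series that is True while the compressor runs (deriving that series from a current threshold is itself an assumption, not the production logic):

```python
import pandas as pd

def duty_cycle_features(compressor_on: pd.Series,
                        window: str = "4h") -> pd.DataFrame:
    """Fraction of time running and number of start events per
    rolling window, for a boolean, time-indexed series."""
    on = compressor_on.astype(float)
    duty = on.rolling(window).mean()  # fraction of window spent on
    # A 0 -> 1 transition marks the start of a run cycle.
    starts = (on.diff() == 1).astype(float).rolling(window).sum()
    return pd.DataFrame({"duty_cycle": duty, "cycle_starts": starts})
```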

import pandas as pd

def compute_compressor_efficiency_ratio(
    compressor_current: pd.Series,
    cargo_temp: pd.Series,
    ambient_temp: pd.Series,
    window: str = "4h"
) -> pd.Series:
    """
    Ratio of compressor energy input to cooling achieved.
    Rising ratio indicates degrading efficiency, often
    the earliest signal of refrigerant leak or compressor wear.
    """
    # Rolling mean of compressor current (energy input proxy)
    current_rolling = compressor_current.rolling(window).mean()

    # Cooling delta: how much colder is cargo than ambient
    cooling_delta = ambient_temp - cargo_temp

    # Avoid division by zero when temps are equal
    cooling_delta = cooling_delta.clip(lower=0.1)

    # Efficiency ratio: lower is better
    ratio = current_rolling / cooling_delta.rolling(window).mean()

    return ratio

Feature computation ran on a 15-minute schedule using Dagster, processing the latest telemetry window plus a 2-hour lookback for late-arriving data. Features were written to a PostgreSQL feature store with device_id and timestamp as the composite key.
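Because the 2-hour lookback recomputes features that may already be stored, the write path has to be an upsert on the composite key. A sketch, with SQLite standing in for PostgreSQL (the ON CONFLICT clause is the same in both; the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE features (
    device_id TEXT,
    ts_ms INTEGER,
    efficiency_ratio REAL,
    PRIMARY KEY (device_id, ts_ms))""")

def upsert_feature(device_id: str, ts_ms: int, ratio: float):
    # Recomputed rows overwrite stale values instead of duplicating.
    conn.execute(
        """INSERT INTO features (device_id, ts_ms, efficiency_ratio)
           VALUES (?, ?, ?)
           ON CONFLICT (device_id, ts_ms)
           DO UPDATE SET efficiency_ratio = excluded.efficiency_ratio""",
        (device_id, ts_ms, ratio))
```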

Layer 4: Predictive Models

We trained separate models for three failure modes:

Refrigerant leak detection. A gradient-boosted classifier (XGBoost) trained on the compressor efficiency ratio features, condenser pressure trends, and duty cycle patterns. This model achieved 89% precision at 76% recall with a 72-hour prediction horizon: it correctly identified 76% of refrigerant leaks at least 72 hours before they became critical, and only 11% of its alerts were false alarms (the flip side of 89% precision).

Compressor mechanical failure. A 1D convolutional neural network operating on raw vibration spectrograms. Mechanical failures produce distinctive frequency patterns in the vibration data that are difficult to capture with hand-engineered features. The CNN processed 30-second vibration windows and classified them into healthy, degrading, and critical categories. Accuracy was 91% on the test set.

Electrical system faults. A simpler logistic regression model on voltage stability features and compressor startup current patterns. Electrical faults were the rarest failure mode but also the most predictable, showing clear signatures in startup current profiles days before failure.

All three models ran inference every 15 minutes per device, coinciding with the feature computation schedule. Predictions were written to a dashboard accessible to the fleet operations team, with automatic alerts when any device crossed the "likely failure within 72 hours" threshold.
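The alerting rule itself can be sketched as a per-failure-mode threshold check (the threshold values here are illustrative, not the production ones):

```python
# Hypothetical per-model probability thresholds for the
# "likely failure within 72 hours" alert.
THRESHOLDS = {
    "refrigerant_leak": 0.7,
    "compressor_mech": 0.8,
    "electrical": 0.6,
}

def should_alert(scores: dict) -> list:
    """Return the failure modes whose predicted probability
    crossed that mode's alert threshold."""
    return [mode for mode, p in scores.items()
            if p >= THRESHOLDS.get(mode, 1.0)]
```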

The Results

After six months in production:

  • 43% reduction in unplanned downtime. Failures were caught and scheduled for maintenance during planned stops rather than causing field breakdowns.
  • $1.2M estimated annual savings from prevented cargo spoilage, reduced emergency dispatch costs, and optimized maintenance scheduling.
  • 72-hour average lead time on failure predictions, giving the operations team enough runway to schedule maintenance without disrupting delivery routes.
  • 11% of alerts were false alarms, a rate the operations team found acceptable. A false alarm meant an unnecessary inspection, costing roughly $200 in technician time. A missed failure cost $15,000 or more.

Lessons From the Field

Start at the edge. Pushing normalization to the device simplified everything downstream. The cost of deploying and maintaining edge agents was far less than the cost of handling device heterogeneity in the cloud.

Feature engineering beats model complexity. The compressor efficiency ratio, a simple division of two rolling means, was more predictive than any individual sensor channel fed into a complex model. Understanding the physics of the system and encoding that understanding into features produced better results than throwing raw data at a deep learning model.

Connectivity is a spectrum, not a binary. Designing for "sometimes connected" from the start, with proper buffering, deduplication, and late data handling, prevented an entire category of bugs that would have plagued us if we had assumed reliable connectivity.

Per-device baselines matter. A compressor current of 12 amps might be normal for one vehicle and a warning sign for another, depending on the unit's age, the specific compressor installed, and the typical operating conditions. We maintained per-device baseline statistics and computed anomaly scores relative to each device's own history rather than fleet-wide averages.
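A sketch of that per-device scoring: z-score each reading against the device's own rolling baseline rather than a fleet-wide mean (the window length and epsilon are illustrative):

```python
import pandas as pd

def per_device_zscore(values: pd.Series,
                      baseline_window: str = "7d") -> pd.Series:
    """Anomaly score relative to this device's own recent history,
    for a time-indexed sensor series."""
    mean = values.rolling(baseline_window).mean()
    # Clip the std to avoid division by zero on flat stretches.
    std = values.rolling(baseline_window).std().clip(lower=1e-6)
    return (values - mean) / std
```

Run per device, the same 12-amp compressor current can score as routine on one vehicle and anomalous on another, which is exactly the behavior the fleet-wide averages missed.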

The operations team is the real customer. The ML models were useless until the operations team trusted them enough to act on the predictions. We spent significant effort on the dashboard UX, the alerting logic, and the explanation layer that showed operators why a particular vehicle was flagged. "The model says this vehicle will fail" is not actionable. "Compressor efficiency has degraded 34% over the past week, consistent with the pattern seen in 8 previous refrigerant leak failures" is actionable.

Fleet intelligence at this scale is not a single model problem. It is a systems engineering problem that happens to include ML as one component. The telemetry pipeline, the edge normalization, the feature store, the monitoring, the dashboard, and the alerting logic are all load-bearing parts of the system. The ML model is the brain, but without the nervous system to feed it and the muscles to act on its outputs, it produces nothing of value.

Mostafa Dhouib · Founder & ML Engineer at Opulion
