ENGINEERING
12 min read
January 20, 2026

Feature Engineering for Time-Series: What Textbooks Don't Tell You

Lag features, rolling statistics, frequency domain transforms, and change-point detection -- the practical playbook for industrial sensor data that no textbook covers properly.

Time-series feature engineering is where domain knowledge meets statistical rigor. Get it wrong and your model learns to cheat.

Why Time-Series Is a Different Animal

If you have spent any time building ML models on tabular data, you have developed intuitions that will actively hurt you when you move to time-series. I need to say this upfront because I watch senior engineers make this mistake repeatedly: time-series data has an implicit ordering contract that changes everything about how you engineer features, split data, and validate models.

In a standard tabular problem, each row is (approximately) independent. You can shuffle, split randomly, cross-validate with k-fold, and life is good. The moment your data has a time axis, all of that breaks. Every feature you create, every validation split you design, every augmentation you apply must respect the arrow of time. Violate this contract and you get models that look incredible on paper and fail catastrophically in production.

I have built time-series ML systems for drilling rigs, manufacturing lines, and defense platforms. The patterns I am sharing here are battle-tested on industrial sensor data -- not clean academic datasets with neat periodicities and no missing values.

Let me walk you through the feature engineering techniques that actually matter, and more importantly, the traps that will silently destroy your models.

Lag Features Done Right

Lag features are the bread and butter of time-series feature engineering. The idea is simple: use previous values of a signal as features for predicting the current or future value. But the implementation details matter enormously.

The Basics

import pandas as pd

def create_lag_features(df, column, lags):
    """Create lag features respecting temporal ordering."""
    for lag in lags:
        df[f"{column}_lag_{lag}"] = df[column].shift(lag)
    return df

# Example: sensor temperature readings
df = create_lag_features(df, "temperature", lags=[1, 2, 3, 6, 12, 24])

Simple enough. But here is where people go wrong.

Choosing Lag Windows

Do not just pick arbitrary lag values. Your lag selection should be driven by three things:

  1. The physical process you are modeling. If you are monitoring a pump that has a 30-second cycle time, your lags should align with multiples of that cycle. A lag of 7 samples when your sampling rate is 10 Hz captures 700ms of history -- is that physically meaningful?

  2. Autocorrelation analysis. Before guessing, compute the autocorrelation function (ACF) and partial autocorrelation function (PACF). Peaks in the PACF tell you which specific lags carry independent predictive information.

import numpy as np
from statsmodels.tsa.stattools import pacf

# Compute PACF to find informative lags; use the approximate 95% confidence
# bound (1.96 / sqrt(n)) as the significance threshold rather than a magic number
series = df["temperature"].dropna()
pacf_values = pacf(series, nlags=50)
threshold = 1.96 / np.sqrt(len(series))
significant_lags = [i for i, v in enumerate(pacf_values) if abs(v) > threshold and i > 0]

  3. The prediction horizon. If you are predicting 15 minutes ahead, using a 1-sample lag is less useful than a lag matching your prediction window. Think about what information the model needs at inference time.
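To make the horizon point concrete, here is a minimal sketch of horizon-aware target and lag construction. All names and values (HORIZON, the toy frame) are illustrative, not from the article's pipeline:

```python
import pandas as pd

HORIZON = 15  # prediction horizon in samples (illustrative)

frame = pd.DataFrame({"temperature": range(100)})

# Target: the sensor value HORIZON steps in the future
frame["target"] = frame["temperature"].shift(-HORIZON)

# Lags are measured backwards from "now"; lag 0 is the current reading
for lag in [0, 1, HORIZON, 2 * HORIZON]:
    frame[f"temp_lag_{lag}"] = frame["temperature"].shift(lag)

# The last HORIZON rows have no future target and cannot be trained on
train_frame = frame.dropna(subset=["target"])
```

Note that the lags shifted forward (positive `shift`) are always available at inference time, while the target shifted backward only exists once the future arrives -- which is exactly the asymmetry the model must respect.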

The Multi-Entity Trap

If your dataset contains multiple entities (multiple sensors, multiple machines, multiple patients), you must create lag features per entity. I have seen production systems where lag features leaked information across entities because someone forgot a groupby:

# WRONG: leaks across entities
df["temp_lag_1"] = df["temperature"].shift(1)

# RIGHT: respect entity boundaries
df["temp_lag_1"] = df.groupby("machine_id")["temperature"].shift(1)

This seems obvious when I spell it out, but in a pipeline with 50 features and 10 entities, it is easy to miss one.

Rolling Window Strategies

Rolling statistics (mean, std, min, max, quantiles) over sliding windows are powerful features that capture local trends and volatility. But the implementation choices have significant impact.

Fixed vs. Expanding Windows

# Fixed rolling window - captures local behavior
df["temp_rolling_mean_60"] = (
    df.groupby("machine_id")["temperature"]
    .transform(lambda x: x.rolling(window=60, min_periods=1).mean())
)

# Expanding window - captures cumulative behavior
df["temp_expanding_mean"] = (
    df.groupby("machine_id")["temperature"]
    .transform(lambda x: x.expanding(min_periods=1).mean())
)

Use fixed windows when you care about recent behavior (fault detection, anomaly detection). Use expanding windows when you care about long-term baselines (drift detection, degradation monitoring).

Multi-Scale Windows

Do not pick one window size. Use multiple scales to capture different temporal dynamics:

WINDOWS = [10, 30, 60, 300, 900]  # seconds, assuming 1 Hz sampling

for w in WINDOWS:
    df[f"temp_mean_{w}s"] = df.groupby("machine_id")["temperature"].transform(
        lambda x: x.rolling(w, min_periods=1).mean()
    )
    df[f"temp_std_{w}s"] = df.groupby("machine_id")["temperature"].transform(
        lambda x: x.rolling(w, min_periods=1).std()
    )

The ratios between different window scales are often more informative than the raw statistics. A short-term mean that deviates from a long-term mean is a classic indicator of regime change:

df["temp_mean_ratio_10_300"] = df["temp_mean_10s"] / df["temp_mean_300s"].clip(lower=1e-6)

Exponentially Weighted Statistics

Rolling windows treat all samples within the window equally. Exponentially weighted moving averages (EWMAs) give more weight to recent observations, which often matches how physical systems behave:

df["temp_ewm_30"] = df.groupby("machine_id")["temperature"].transform(
    lambda x: x.ewm(span=30).mean()
)

The span parameter controls the effective memory: a span of 30 is roughly comparable to a 30-sample simple moving average, with a center of mass of about 14.5 samples. If you specifically want the weight of a sample to halve after a fixed number of steps, use the halflife parameter instead of span. Either way, this is particularly useful for sensor data where recent readings matter more than readings from an hour ago.
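The halflife behavior is easy to verify with an impulse, which makes the EWMA weights directly visible (values synthetic):

```python
import numpy as np
import pandas as pd

# An impulse at t=0 followed by zeros: the EWMA output traces the weight decay
x = pd.Series(np.r_[1.0, np.zeros(60)])

# With halflife=10 and adjust=False, the weight on the impulse halves every 10 steps
ewm = x.ewm(halflife=10, adjust=False).mean()

ratio = ewm.iloc[20] / ewm.iloc[10]  # decay over 10 steps
```

Here `ewm.iloc[10]` is 0.5 and the 10-step decay ratio is 0.5, confirming the halflife semantics.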

Frequency Domain Features

This is where most practitioners stop, and it is a mistake. Time-domain features (lags, rolling stats) capture trends and local patterns, but many physical phenomena have strong frequency signatures that are invisible in the time domain.

FFT-Based Features

For a rolling window of sensor data, you can extract spectral features using the Fast Fourier Transform:

import numpy as np

def spectral_features(signal, sampling_rate=1.0):
    """Extract frequency-domain features from a signal segment."""
    n = len(signal)
    fft_vals = np.fft.rfft(signal)
    fft_mag = np.abs(fft_vals)
    freqs = np.fft.rfftfreq(n, d=1.0/sampling_rate)

    # Dominant frequency
    dominant_freq = freqs[np.argmax(fft_mag[1:]) + 1]

    # Spectral centroid (center of mass of spectrum)
    spectral_centroid = np.sum(freqs * fft_mag) / (np.sum(fft_mag) + 1e-10)

    # Spectral entropy (randomness of spectrum)
    psd = fft_mag ** 2
    psd_norm = psd / (psd.sum() + 1e-10)
    spectral_entropy = -np.sum(psd_norm * np.log2(psd_norm + 1e-10))

    # Band energies (DC bin excluded so the signal's mean level does not dominate)
    low_band = np.sum(psd[(freqs > 0) & (freqs < sampling_rate/8)])
    mid_band = np.sum(psd[(freqs >= sampling_rate/8) & (freqs < sampling_rate/4)])
    high_band = np.sum(psd[freqs >= sampling_rate/4])

    return {
        "dominant_freq": dominant_freq,
        "spectral_centroid": spectral_centroid,
        "spectral_entropy": spectral_entropy,
        "low_band_energy": low_band,
        "mid_band_energy": mid_band,
        "high_band_energy": high_band,
    }

Vibration analysis in particular benefits enormously from spectral features. A bearing degradation that shows up as a gradual amplitude change in the time domain often manifests as a sharp, detectable peak emerging at a specific frequency in the spectral domain.
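As a sanity check of the dominant-frequency logic on a signal with a known tone (all parameters synthetic and chosen so the tone lands exactly on an FFT bin):

```python
import numpy as np

fs = 10.0                              # sampling rate in Hz (illustrative)
t = np.arange(0, 30, 1 / fs)
signal = np.sin(2 * np.pi * 1.5 * t)   # pure 1.5 Hz tone

def dominant_frequency(window, sampling_rate):
    """Dominant frequency of one window, skipping the DC bin."""
    mag = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sampling_rate)
    return freqs[np.argmax(mag[1:]) + 1]

# Apply over non-overlapping windows, as you would in a rolling pipeline
window_size = 100
doms = [
    dominant_frequency(signal[i:i + window_size], fs)
    for i in range(0, len(signal) - window_size + 1, window_size)
]
```

Every window recovers 1.5 Hz. In practice, apply a taper (e.g. a Hann window) before the FFT when the tone does not align with a bin, or spectral leakage will smear the peak.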

Wavelet Features

When the frequency content changes over time (non-stationary signals), wavelets outperform FFT because they provide time-frequency localization:

import pywt

def wavelet_features(signal, wavelet="db4", level=4):
    """Extract wavelet decomposition features."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    features = {}
    for i, c in enumerate(coeffs):
        energy = np.sum(c ** 2)
        p = c ** 2 / (energy + 1e-10)  # normalized per-coefficient energy
        features[f"wavelet_energy_level_{i}"] = energy
        features[f"wavelet_std_level_{i}"] = np.std(c)
        features[f"wavelet_entropy_level_{i}"] = -np.sum(p * np.log2(p + 1e-10))
    return features

For industrial sensor data, Daubechies wavelets (db4 through db8) typically work well. The decomposition levels map to different frequency bands, giving you a multi-resolution view of the signal.
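The level-to-band mapping is dyadic: detail level j covers roughly [fs/2^(j+1), fs/2^j]. A small helper makes the ranges explicit (pure arithmetic, no pywt needed; the approximation ignores filter roll-off):

```python
def wavelet_band_edges(sampling_rate, level):
    """Approximate frequency bands (low_hz, high_hz) for wavedec detail
    levels 1..level; the final approximation band covers what remains
    below the deepest detail level."""
    bands = []
    high = sampling_rate / 2.0  # Nyquist frequency
    for _ in range(level):
        low = high / 2.0
        bands.append((low, high))
        high = low
    return bands

# Example: 100 Hz sampling, 4 decomposition levels
bands = wavelet_band_edges(100.0, 4)
# detail 1 covers (25.0, 50.0), detail 2 covers (12.5, 25.0), and so on
```

This is handy when choosing the decomposition level: keep going until the deepest detail band sits below the lowest physically meaningful frequency in your process.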

Change-Point Detection Features

Many industrial ML problems boil down to detecting when a system transitions between operating regimes. Encoding change-point information as features can dramatically improve downstream models.

CUSUM-Based Features

The cumulative sum (CUSUM) algorithm detects shifts in the mean of a process:

def cusum_feature(signal, drift=0.0):
    """Compute one-sided CUSUM statistics for change-point detection.

    `drift` is the allowable slack per step; compare the returned
    statistics against a threshold downstream to flag change points.
    """
    signal = np.asarray(signal, dtype=float)
    mean = np.mean(signal[:min(50, len(signal))])  # baseline from initial data
    s_pos = np.zeros(len(signal))
    s_neg = np.zeros(len(signal))

    for i in range(1, len(signal)):
        s_pos[i] = max(0.0, s_pos[i-1] + (signal[i] - mean) - drift)
        s_neg[i] = max(0.0, s_neg[i-1] - (signal[i] - mean) - drift)

    return s_pos, s_neg

The CUSUM value itself is a powerful feature. When it exceeds the threshold, a change point has occurred. The rate of increase in the CUSUM value indicates how severe the shift is.
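A quick usage sketch on a synthetic step change. The recursion mirrors cusum_feature above; the signal, drift, and threshold values are all arbitrary illustrations:

```python
import numpy as np

# Flat signal that steps up by 1.0 at sample 100
signal = np.r_[np.zeros(100), np.ones(100)]

# Same one-sided recursion as cusum_feature, with drift as slack per step
mean = np.mean(signal[:50])
s_pos = np.zeros(len(signal))
for i in range(1, len(signal)):
    s_pos[i] = max(0.0, s_pos[i - 1] + (signal[i] - mean) - 0.5)

# First index where the statistic crosses an (arbitrary) threshold
threshold = 5.0
detected = int(np.argmax(s_pos > threshold))
```

With a drift of 0.5 the statistic accumulates 0.5 per post-shift sample, so detection fires 10 samples after the true change point -- the classic CUSUM trade-off between detection delay and false alarms.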

Ratio and Difference Features

Simple but effective: the ratio of short-term to long-term statistics flags regime changes:

# Rate of change features
df["temp_diff_1"] = df.groupby("machine_id")["temperature"].diff(1)
df["temp_diff_10"] = df.groupby("machine_id")["temperature"].diff(10)

# Acceleration (second derivative)
df["temp_accel"] = df.groupby("machine_id")["temp_diff_1"].diff(1)

# Z-score relative to rolling baseline
rolling_mean = df.groupby("machine_id")["temperature"].transform(
    lambda x: x.rolling(300).mean()
)
rolling_std = df.groupby("machine_id")["temperature"].transform(
    lambda x: x.rolling(300).std()
)
df["temp_zscore"] = (df["temperature"] - rolling_mean) / (rolling_std + 1e-6)

The z-score feature is particularly valuable because it normalizes the signal relative to its recent behavior. A z-score of 3.0 means the current reading is 3 standard deviations away from its recent norm -- regardless of the absolute scale.
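A tiny demonstration of the z-score as an anomaly flag, with one injected spike (all values synthetic; min_periods chosen arbitrarily):

```python
import numpy as np
import pandas as pd

# Mostly-flat signal with one spike at index 350
temp = pd.Series(np.full(400, 10.0))
temp.iloc[350] = 20.0

rolling_mean = temp.rolling(300, min_periods=30).mean()
rolling_std = temp.rolling(300, min_periods=30).std()
zscore = (temp - rolling_mean) / (rolling_std + 1e-6)

# The spike stands out regardless of the signal's absolute scale
spike_z = zscore.iloc[350]
```

The same code flags the spike whether the baseline is 10 or 10,000, which is exactly the scale-invariance you want from this feature.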

Temporal Leakage: The Silent Killer

This is the section I wish someone had written for me five years ago. Temporal leakage is when your model inadvertently gets access to information from the future during training, resulting in artificially inflated performance that vanishes in production.

Common Leakage Vectors

1. Global normalization before splitting.

# WRONG: normalizes using future data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[features] = scaler.fit_transform(df[features])
train = df[df["timestamp"] < cutoff]
test = df[df["timestamp"] >= cutoff]

# RIGHT: fit only on training data
train = df[df["timestamp"] < cutoff].copy()  # .copy() avoids chained-assignment surprises
test = df[df["timestamp"] >= cutoff].copy()
scaler = StandardScaler()
train[features] = scaler.fit_transform(train[features])
test[features] = scaler.transform(test[features])

2. Rolling features computed on the full dataset.

If you compute rolling statistics on the full dataset with centered or forward-looking windows (e.g. pandas center=True), the window at the train/test boundary includes future data. Backward-looking windows (the pandas default) only see the past, but any centered window, resampling step, or smoothing filter must be computed within the correct temporal context.
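To see the boundary leak concretely, compare a centered window with the causal default on a toy series:

```python
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])

# Causal (default): the window ends at the current row -- safe
causal = s.rolling(3, min_periods=1).mean()

# Centered: the window spans past AND future rows -- leaks across a split
centered = s.rolling(3, min_periods=1, center=True).mean()
```

At index 1 the causal mean is 0.5 (rows 0-1), while the centered mean is 1.0 because it has already consumed row 2 -- a value that would be in the future at inference time.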

3. Target encoding with future data.

If you are doing target encoding (replacing a categorical feature with the mean of the target), you must use only past data for each encoding:

# WRONG: uses global mean including future
df["machine_mean_target"] = df.groupby("machine_id")["target"].transform("mean")

# RIGHT: use expanding mean that only sees past data
df["machine_mean_target"] = (
    df.groupby("machine_id")["target"]
    .transform(lambda x: x.expanding().mean().shift(1))
)

4. Cross-validation with random splits.

Never use k-fold cross-validation on time-series. Use time-series-aware splitting:

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(df):
    assert df.iloc[train_idx]["timestamp"].max() < df.iloc[val_idx]["timestamp"].min()

How to Detect Leakage

If your model performs suspiciously well, test for leakage:

  1. Shuffle the time ordering of your raw test data and recompute the features. If performance stays the same, your features are not using temporal structure at all (which is suspicious for a time-series problem).
  2. Add increasing lag to your prediction horizon. Performance should degrade smoothly. If it drops off a cliff at a specific lag, you likely have leakage at that offset.
  3. Check feature importances. If a lag feature at exactly your prediction horizon is the top feature, something is wrong.

Sensor Data Specific Tips

Industrial sensor data has its own special challenges that you will not find in Kaggle competitions or textbooks.

Missing Data Is Not Random

Sensor data goes missing for physical reasons: a sensor fails, a communication link drops, or a machine is shut down. The fact that data is missing is often informative. Encode missingness as a feature:

df["temp_is_missing"] = df["temperature"].isna().astype(int)
df["temp_missing_streak"] = (
    df.groupby("machine_id")["temp_is_missing"]
    .transform(lambda x: x.groupby((x != x.shift()).cumsum()).cumcount() + 1)
    * df["temp_is_missing"]
)

Sampling Rate Matters

If you have multiple sensors sampled at different rates, you need to handle alignment carefully. Do not just forward-fill everything -- that hides the actual temporal relationship. Instead, create features that are aware of the sampling characteristics:

# Track time since last valid reading for each sensor
df["temp_time_since_last"] = (
    df.groupby("machine_id")["timestamp"]
    .diff()
    .dt.total_seconds()
    .where(~df["temperature"].isna())
    .ffill()
)

Sensor Drift and Calibration

Sensors drift over time. A temperature sensor that reads 100C today might read 102C in three months for the same actual temperature. If you do not account for this, your model will learn calibration artifacts instead of real patterns.

Use relative features (differences, ratios between sensors) rather than absolute values when possible. Cross-sensor relationships are often more stable than individual sensor readings.
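A minimal illustration of why relative features survive drift: two sensors observing related signals through a shared calibration drift (all numbers synthetic):

```python
import numpy as np

t = np.arange(1000)
true_signal = np.sin(2 * np.pi * t / 100)
drift = 0.005 * t                      # shared calibration drift, +5 units over the run

sensor_a = true_signal + drift
sensor_b = 0.9 * true_signal + drift   # same drift, slightly different gain

# Absolute readings wander with the drift; the difference does not
diff = sensor_a - sensor_b             # = 0.1 * true_signal, drift cancels
```

The absolute readings swing by more than 5 units over the run while the cross-sensor difference stays within 0.2 -- the drift term cancels exactly, leaving only the physical relationship between the sensors.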

Operational Context Features

The operating mode of the machine matters. A pump vibration of 5mm/s might be normal during startup but dangerous during steady-state operation. Include operational context:

# Operating mode (if available)
df["is_startup"] = (df["rpm"] > 0) & (df["rpm"] < df["rpm_nominal"] * 0.8)
df["is_steady_state"] = df["rpm"] >= df["rpm_nominal"] * 0.8
df["is_shutdown"] = df["rpm"].diff() < -threshold  # threshold: site-specific rpm drop per sample

# Time since last mode change (count within contiguous runs of a mode;
# a plain cumcount over ["machine_id", "operating_mode"] would keep
# counting across separate visits to the same mode)
mode_run = df.groupby("machine_id")["operating_mode"].transform(
    lambda x: (x != x.shift()).cumsum()
)
df["time_in_current_mode"] = df.groupby(["machine_id", mode_run]).cumcount()

Putting It All Together

Here is the feature engineering pipeline I use as a starting template for industrial time-series projects:

def build_feature_set(df, target_col, entity_col="machine_id", time_col="timestamp"):
    """Complete feature engineering pipeline for industrial sensor data."""

    sensor_cols = [c for c in df.columns if c not in [target_col, entity_col, time_col]]

    for col in sensor_cols:
        # 1. Lag features (informed by PACF analysis)
        for lag in [1, 2, 3, 5, 10, 30, 60]:
            df[f"{col}_lag_{lag}"] = df.groupby(entity_col)[col].shift(lag)

        # 2. Rolling statistics at multiple scales
        for window in [10, 30, 60, 300]:
            grp = df.groupby(entity_col)[col]
            df[f"{col}_mean_{window}"] = grp.transform(
                lambda x: x.rolling(window, min_periods=1).mean()
            )
            df[f"{col}_std_{window}"] = grp.transform(
                lambda x: x.rolling(window, min_periods=1).std()
            )

        # 3. Derivative features
        df[f"{col}_diff_1"] = df.groupby(entity_col)[col].diff(1)
        df[f"{col}_diff_10"] = df.groupby(entity_col)[col].diff(10)

        # 4. Z-score relative to local baseline
        local_mean = df.groupby(entity_col)[col].transform(
            lambda x: x.rolling(300, min_periods=1).mean()
        )
        local_std = df.groupby(entity_col)[col].transform(
            lambda x: x.rolling(300, min_periods=1).std()
        )
        df[f"{col}_zscore"] = (df[col] - local_mean) / (local_std + 1e-6)

        # 5. Missingness features
        df[f"{col}_is_missing"] = df[col].isna().astype(int)

    # 6. Cross-sensor features (ratios, differences)
    if len(sensor_cols) >= 2:
        for i in range(len(sensor_cols)):
            for j in range(i+1, len(sensor_cols)):
                c1, c2 = sensor_cols[i], sensor_cols[j]
                df[f"{c1}_{c2}_ratio"] = df[c1] / (df[c2] + 1e-6)
                df[f"{c1}_{c2}_diff"] = df[c1] - df[c2]

    return df

The key takeaway: time-series feature engineering is not about applying every technique in the book. It is about understanding the physical process you are modeling, respecting temporal ordering, and systematically encoding the information that matters for your prediction task. Get the fundamentals right -- lag selection, leakage prevention, multi-scale statistics -- and you will outperform most models that rely on fancier architectures but sloppier features.

Every feature you create should answer the question: "Would this information be available in production at inference time, and does it encode something physically meaningful about the system?" If the answer to either question is no, delete it.

Mostafa Dhouib -- Founder & ML Engineer at Opulion

Facing a similar challenge?

Tell us about your problem. We'll respond with an honest technical assessment within 24 hours.