xStryk™

Decision Intelligence for AI in production — guardrails, traceability & evaluation.

xTheus

Feature Engineering in Production: From Notebooks to Pipelines

The Training-Serving Skew Problem

Training-serving skew is one of the most common causes of silent model degradation in production. It occurs when the features the model saw during training differ from those it receives at inference time. Typical causes include transformations implemented differently in training (Python/pandas) and serving (Java/SQL), features computed with future data during training (temporal data leakage), and bugs in feature engineering logic that go undetected until production.
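
One way to surface this kind of skew is to compute the same features through both paths for a sample of entities and compare the results. The sketch below assumes each path yields a dictionary of feature values keyed by entity ID. The function `find_skew`, the feature names, and the tolerance are illustrative assumptions, not part of any particular library.

```python
import math


def _values_match(train_val, serve_val, tolerance):
    """Return True when two feature values agree within the tolerance."""
    if train_val is None or serve_val is None:
        return train_val is serve_val
    numeric = (int, float)
    if isinstance(train_val, numeric) and isinstance(serve_val, numeric):
        return math.isclose(train_val, serve_val,
                            rel_tol=tolerance, abs_tol=tolerance)
    return train_val == serve_val


def find_skew(training_features, serving_features, tolerance=1e-6):
    """List (entity_id, feature, training_value, serving_value) mismatches.

    Both arguments map entity_id -> {feature_name: value}. A feature that is
    missing on the serving side is treated as None.
    """
    mismatches = []
    for entity_id, train_row in training_features.items():
        serve_row = serving_features.get(entity_id)
        if serve_row is None:
            continue  # entity was not sampled on the serving side
        for name, train_val in train_row.items():
            serve_val = serve_row.get(name)
            if not _values_match(train_val, serve_val, tolerance):
                mismatches.append((entity_id, name, train_val, serve_val))
    return mismatches


# Hypothetical sample: the serving path defaults a missing average to 0.0.
training = {"u1": {"avg_spend": 12.5, "country": "US"},
            "u2": {"avg_spend": 3.0, "country": "DE"}}
serving = {"u1": {"avg_spend": 12.5, "country": "US"},
           "u2": {"avg_spend": 0.0, "country": "DE"}}

skew = find_skew(training, serving)
print(skew)  # [('u2', 'avg_spend', 3.0, 0.0)]
```

A check like this only catches skew on the entities that are sampled, so it complements a shared transformation rather than replacing it.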

Feature Store: The Skew Solution

A Feature Store centralizes feature definition, computation, and serving. Because the same transformation feeds both training and inference, skew caused by divergent implementations is removed by design. Offline (batch) features are materialized in BigQuery for training; online (low-latency) features are served from Bigtable or Memorystore for inference. Point-in-time correctness ensures that each training example uses only the feature values that were available at the time of its observation, which prevents temporal data leakage.
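
Point-in-time correctness can be illustrated with a small "as of" lookup: for each labeled observation, take the most recent feature value recorded at or before the observation timestamp. This is a minimal sketch under stated assumptions, not how any specific feature store implements it. The names `point_in_time_value` and `build_training_rows`, the `avg_spend` feature, and the choice to treat a value stamped exactly at the observation time as available are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_value(history, observed_at):
    """Return the latest value whose timestamp is <= observed_at, else None.

    history is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, observed_at)
    return history[idx - 1][1] if idx > 0 else None


def build_training_rows(observations, history_by_entity):
    """Attach the point-in-time feature value to each (entity, time, label)."""
    rows = []
    for entity_id, observed_at, label in observations:
        history = history_by_entity.get(entity_id, [])
        rows.append({
            "entity_id": entity_id,
            "observed_at": observed_at,
            "avg_spend": point_in_time_value(history, observed_at),
            "label": label,
        })
    return rows


# Hypothetical feature history: avg_spend was updated on Jan 1 and Jan 10.
history_by_entity = {
    "u1": [(datetime(2024, 1, 1), 10.0), (datetime(2024, 1, 10), 25.0)],
}
observations = [
    ("u1", datetime(2024, 1, 5), 0),   # must see 10.0, not the later 25.0
    ("u1", datetime(2024, 1, 12), 1),  # sees 25.0
    ("u2", datetime(2024, 1, 5), 0),   # no history yet, so None
]

rows = build_training_rows(observations, history_by_entity)
print([row["avg_spend"] for row in rows])  # [10.0, 25.0, None]
```

A naive join on entity ID alone would have attached 25.0 to the Jan 5 observation, which is exactly the leakage that the point-in-time rule is meant to prevent.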

Online vs Offline Features

  • Offline (Batch): latency of minutes to hours; stored in BigQuery and Cloud Storage; used for training and analytics.
  • Online (Low-latency): latency < 10 ms; served from Bigtable and Memorystore; used for real-time inference.
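
The relationship between the two stores can be sketched with an in-memory stand-in: one transformation writes to an append-only history (the offline role) and to a latest-value map (the online role). Everything here is an illustrative assumption, including the class `InMemoryFeatureStore`, the `transform` function, and the feature names. It does not reflect the Vertex AI Feature Store API.

```python
import math


def transform(raw_event):
    """Single feature definition shared by the batch and online paths."""
    return {
        "amount_log": math.log1p(raw_event["amount"]),
        "is_weekend": raw_event["day_of_week"] >= 5,  # 0 = Monday
    }


class InMemoryFeatureStore:
    """Toy store: one transform, two materializations."""

    def __init__(self, transform_fn):
        self._transform = transform_fn
        self.offline = []  # full history, stand-in for BigQuery
        self.online = {}   # latest row per entity, stand-in for Bigtable

    def ingest(self, entity_id, event_time, raw_event):
        features = self._transform(raw_event)
        # Offline keeps every version so training can look back in time.
        self.offline.append((entity_id, event_time, features))
        # Online keeps only the newest version for low-latency reads.
        current = self.online.get(entity_id)
        if current is None or event_time >= current[0]:
            self.online[entity_id] = (event_time, features)

    def get_online(self, entity_id):
        entry = self.online.get(entity_id)
        return entry[1] if entry else None


store = InMemoryFeatureStore(transform)
store.ingest("u1", 1, {"amount": 20.0, "day_of_week": 2})
store.ingest("u1", 2, {"amount": 99.0, "day_of_week": 6})
store.ingest("u1", 0, {"amount": 5.0, "day_of_week": 1})  # late-arriving event

print(len(store.offline))       # 3
print(store.get_online("u1"))
```

Because both materializations come from the same `transform` call, the values a model reads online are the same values that would appear in its training history for that event.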

Feature Monitoring and Governance

Features are critical assets: they require versioning, ownership, documentation, lineage, and quality monitoring. Feature monitoring detects distribution anomalies, unexpected nulls, and degraded freshness. Feature governance assigns owners, defines freshness SLAs, and maintains a searchable catalog of all features in the ecosystem. Without governance, features proliferate unchecked and the feature store becomes a swamp.
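
The three monitoring signals named above can each be reduced to a small check. The sketch below is one possible formulation, not a prescribed one: it uses the population stability index (PSI) as the distribution-drift score, which is a choice made here for illustration, and the function names, bin count, and sample data are assumptions. Alert thresholds are left out because they need to be tuned per feature.

```python
import math
from datetime import datetime, timedelta


def null_rate(values):
    """Fraction of values that are None (0.0 for an empty sample)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def is_stale(last_updated, now, freshness_sla):
    """True when the feature is older than its freshness SLA allows."""
    return now - last_updated > freshness_sla


def population_stability_index(baseline, current, bins=10):
    """PSI between two non-empty numeric samples that contain no None values.

    Bin edges come from the baseline range; out-of-range values are clamped
    into the first or last bin.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def fractions(sample):
        counts = [0] * bins
        for v in sample:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(sample), 1e-6) for c in counts]

    expected, actual = fractions(baseline), fractions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical samples: the current window is shifted upward by 3.
baseline = [float(i % 10) for i in range(100)]
shifted = [v + 3.0 for v in baseline]

print(null_rate([1.0, None, 3.0, None]))               # 0.5
print(is_stale(datetime(2024, 1, 1, 0, 0),
               datetime(2024, 1, 1, 2, 0),
               timedelta(hours=1)))                     # True
print(population_stability_index(baseline, baseline))  # 0.0
print(population_stability_index(baseline, shifted) > 0)  # True
```

In practice these checks would run on a schedule against each feature, with the owner and freshness SLA taken from the feature catalog.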

Google Cloud · Feature Engineering Stack

  • Transforms: Dataflow, BigQuery (SQL)
  • Feature Store: Vertex AI Feature Store
  • Offline Store: BigQuery, Cloud Storage
  • Online Store: Bigtable, Memorystore
  • Orchestration: Cloud Composer, Cloud Monitoring

Key Takeaways

  • Training-serving skew is one of the most common causes of silent model degradation. A Feature Store removes implementation skew by design, because training and inference share one transformation.
  • Point-in-time correctness prevents temporal data leakage during training.
  • Online (Bigtable, < 10 ms) and offline (BigQuery, batch) features must be computed from the same feature definition.