Model Monitoring: Detecting Drift Before Disaster

Models Degrade Silently

A production model rarely fails catastrophically; more often it degrades gradually. The input data distribution shifts (data drift), the relationship between features and target evolves (concept drift), and the distribution of the model's output scores shifts (prediction drift). Without active monitoring, these degradations go unnoticed until the business impact is visible, and by then it is too late.
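The distinction matters operationally: data drift shows up in the input statistics alone, while concept drift is invisible there. A minimal, self-contained simulation (the boundary values 0.5 and 0.7 and the 0.3 shift are illustrative, not from any real system) makes the difference concrete:

```python
import random

random.seed(0)

def label_v1(x):
    """Original concept: positive class when x > 0.5."""
    return int(x > 0.5)

def label_v2(x):
    """Drifted concept: the decision boundary has moved to 0.7."""
    return int(x > 0.7)

baseline_x = [random.random() for _ in range(10_000)]

# Data drift: P(X) shifts, and the input statistics alone reveal it.
drifted_x = [min(x + 0.3, 1.0) for x in baseline_x]
mean_shift = abs(
    sum(drifted_x) / len(drifted_x) - sum(baseline_x) / len(baseline_x)
)

# Concept drift: the inputs are unchanged, so input statistics look identical;
# only labelled accuracy exposes the silent decay.
model = label_v1  # the deployed model still encodes the old concept
acc_before = sum(model(x) == label_v1(x) for x in baseline_x) / len(baseline_x)
acc_after = sum(model(x) == label_v2(x) for x in baseline_x) / len(baseline_x)
```

Here `mean_shift` flags the data drift with no labels at all, while the concept drift leaves the inputs untouched and only shows up as the gap between `acc_before` and `acc_after`.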

Three Types of Drift and How to Measure Them

Drift Taxonomy

| Drift type | What changes | How to measure | Labels needed? |
| --- | --- | --- | --- |
| Data drift | P(X) | PSI, KS test, KL divergence | Detectable without labels |
| Concept drift | P(Y\|X) | Performance decay | Requires labels |
| Prediction drift | P(Ŷ) | Score distribution | Proxy without labels |

PSI (Population Stability Index) compares a feature's distribution between the baseline (training) and production. A PSI > 0.2 indicates significant drift. The KS test (Kolmogorov-Smirnov) is more sensitive for continuous distributions. KL divergence measures the information difference between distributions. In practice, we combine all three: PSI for quick alerts, KS for statistical validation, KL to quantify the magnitude of change.
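All three metrics fit in a few lines of pure Python for a single numeric feature. This is a minimal sketch, not a production implementation: the 10-bin quantile scheme and the epsilon floor are common but arbitrary choices. Note that PSI is exactly the symmetrized KL divergence computed over the same bins, which is why the three measures combine so naturally.

```python
import math
from bisect import bisect_right

def _binned_fractions(sample, edges, eps=1e-4):
    """Fraction of the sample in each bin, floored at eps to avoid log(0)."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect_right(edges, x)] += 1
    return [max(c / len(sample), eps) for c in counts]

def kl_divergence(p, q):
    """KL(p || q) over matched histogram bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def psi(baseline, production, bins=10):
    """Population Stability Index over baseline-quantile bins.

    Algebraically, PSI = KL(p || b) + KL(b || p) on the binned fractions."""
    s = sorted(baseline)
    edges = [s[len(s) * i // bins] for i in range(1, bins)]
    b = _binned_fractions(baseline, edges)
    p = _binned_fractions(production, edges)
    return kl_divergence(p, b) + kl_divergence(b, p)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )
```

On identical samples both PSI and the KS statistic are zero; shift the production sample and both climb past the 0.2 alert level immediately. In a real stack you would compute these per feature on a schedule and also run a proper KS significance test rather than eyeballing the raw statistic.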

Alerts, Shadow Scoring, and Auto-Retraining

When drift exceeds a threshold, the system should respond with a staged cascade: (1) alert the team, (2) activate shadow scoring with a candidate model, (3) if the candidate outperforms the production model on business metrics, trigger the retraining pipeline, (4) validate with evaluation gates, (5) deploy as a canary. Everything is automated, with a human in the loop only for critical decisions.
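The cascade can be sketched as a small decision function. Every callback name below is a placeholder to wire into your own alerting, shadow-scoring, retraining, and deployment systems; none of them are GCP APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class DriftCascade:
    """Staged reaction to a drift signal; all callbacks are integration stubs."""
    alert_team: Callable[[float], None]
    shadow_metrics: Callable[[], Dict[str, float]]  # business metric per model
    retrain_and_gate: Callable[[], bool]            # True iff eval gates pass
    canary_deploy: Callable[[], None]
    drift_threshold: float = 0.2                    # e.g. the PSI alert level

    def handle(self, drift_score: float, human_approved: bool) -> str:
        if drift_score <= self.drift_threshold:
            return "no_action"
        self.alert_team(drift_score)                  # (1) alert
        m = self.shadow_metrics()                     # (2) shadow scoring
        if m["candidate"] <= m["production"]:
            return "alerted_only"
        if not self.retrain_and_gate():               # (3)+(4) retrain + gates
            return "blocked_by_gates"
        if not human_approved:                        # human-in-the-loop
            return "awaiting_approval"
        self.canary_deploy()                          # (5) canary deploy
        return "canary_deployed"
```

The key design point is that each stage can stop the cascade: a weak candidate only alerts, failed evaluation gates block retraining from shipping, and the final deploy still waits for explicit approval.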

Google Cloud · Model Monitoring Stack

| Layer | Services |
| --- | --- |
| Monitoring | Vertex AI Model Monitoring, Cloud Monitoring (alerts) |
| Baseline | BigQuery (stats), Cloud Storage (artifacts) |
| Alerting | Cloud Functions, Cloud Logging |
| Auto-Retrain | Vertex AI Pipelines, Vertex AI Training |
| Dashboard | Looker, BigQuery (drift log) |

Key Takeaways

  • Data drift (P(X)) is detectable without labels and should be monitored with PSI, KS test, and KL divergence.
  • Concept drift (P(Y|X)) requires labels and is the most dangerous: the model fails silently.
  • Auto-retraining must be gated by evaluation suites, not blindly triggered by drift.