xStryk™

Decision Intelligence for AI in production — guardrails, traceability & evaluation.

xTheus

MLOps Pipeline Design: From Notebooks to Continuous Production

The Notebook Is Not a Pipeline

An estimated 87% of ML models never make it to production. The primary reason is not model quality but the absence of pipeline engineering that enables reproducible training, automated deployment, and safe rollback. An MLOps pipeline separates concerns: data ingestion, feature engineering, training, evaluation, registry, deployment, and monitoring operate as independent stages with well-defined contracts.
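
The separation of stages with well-defined contracts can be sketched in plain Python. This is a minimal illustration, not a specific framework's API: `RawBatch`, `FeatureSet`, `ingest`, and `engineer_features` are all hypothetical names.

```python
from dataclasses import dataclass

# Illustrative stage contracts. Each stage consumes exactly one typed input
# and produces one typed output, so stages can be developed, versioned, and
# run independently of each other.

@dataclass(frozen=True)
class RawBatch:
    rows: list

@dataclass(frozen=True)
class FeatureSet:
    feature_names: list
    matrix: list

def ingest(source: list) -> RawBatch:
    """Data ingestion: keep only rows that satisfy the input schema."""
    return RawBatch(rows=[r for r in source if "x" in r and "y" in r])

def engineer_features(batch: RawBatch) -> FeatureSet:
    """Feature engineering: turn raw rows into a numeric matrix."""
    return FeatureSet(
        feature_names=["x", "x_squared"],
        matrix=[[r["x"], r["x"] ** 2] for r in batch.rows],
    )

# Stages compose through their contracts, never through shared internals.
features = engineer_features(ingest([{"x": 2.0, "y": 1}, {"bad": True}]))
print(features.matrix)  # [[2.0, 4.0]]
```

Because each contract is explicit, a stage can be rerun or replaced (say, swapping the feature step) without touching its neighbors.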

Training Pipeline vs Serving Pipeline

A common mistake is mixing training and serving in the same pipeline. The training pipeline is batch, latency-tolerant, and throughput-optimized. The serving pipeline is online, latency-sensitive, and p99-optimized. Separation allows each to scale, version, and monitor independently. The connection point is the Model Registry: the training pipeline writes versioned artifacts, the serving pipeline reads the artifact promoted to production.

Training / Serving Separation

  • Training Pipeline: batch/scheduled, throughput-optimized, writes to the Model Registry (Vertex AI Pipelines)
  • Serving Pipeline: online/real-time, latency-optimized (p99), reads from the Model Registry (Vertex AI Endpoints)
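
The registry contract can be sketched in plain Python, with no cloud SDK involved. `ModelVersion`, `ModelRegistry`, and the `gs://` paths below are illustrative stand-ins, not the Vertex AI Model Registry API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    name: str
    version: int
    artifact_uri: str   # written by the training pipeline
    metrics: dict       # evaluation metadata stored alongside the artifact
    stage: str = "staging"

@dataclass
class ModelRegistry:
    """The contract between pipelines: training writes, serving reads."""
    versions: list = field(default_factory=list)

    def register(self, mv: ModelVersion) -> None:
        self.versions.append(mv)

    def promote(self, version: int) -> None:
        # Exactly one version serves production at a time.
        for mv in self.versions:
            if mv.stage == "production":
                mv.stage = "archived"
        next(mv for mv in self.versions if mv.version == version).stage = "production"

    def production_model(self) -> ModelVersion:
        return next(mv for mv in self.versions if mv.stage == "production")

# The training pipeline writes versioned artifacts with evaluation metadata...
reg = ModelRegistry()
reg.register(ModelVersion("churn", 1, "gs://models/churn/1", {"auc": 0.84}, "production"))
reg.register(ModelVersion("churn", 2, "gs://models/churn/2", {"auc": 0.87}))
reg.promote(2)
# ...and the serving pipeline only ever reads the promoted artifact.
print(reg.production_model().artifact_uri)  # gs://models/churn/2
```

The key property is that serving never needs to know how a model was trained, only which artifact is currently promoted.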

Deployment Strategies: Canary, Blue-Green, Shadow

Deploying a new model directly to all traffic is a risk. Canary deployment sends a percentage of traffic to the new model while the previous model handles the rest. If business metrics (not just ML metrics) degrade, rollback is automatic. Blue-green deployment maintains two complete environments; the switch is instantaneous. Shadow deployment runs the new model in parallel without affecting production, comparing outputs to validate before promoting.
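
The canary logic reduces to two decisions: per-request routing and a promote-or-rollback gate on the business metric. A minimal sketch, assuming a relative-drop tolerance; `route`, `evaluate_canary`, and the 5% threshold are illustrative, not any platform's API.

```python
import random

def route(canary_fraction: float) -> str:
    """Per-request routing: a fraction of traffic goes to the canary,
    the rest stays on the stable (previous) model."""
    return "canary" if random.random() < canary_fraction else "stable"

def evaluate_canary(baseline_metric: float, canary_metric: float,
                    max_relative_drop: float = 0.05) -> str:
    """Gate on the business metric (e.g. conversion rate), not only on AUC:
    roll back automatically if the canary degrades it beyond tolerance."""
    if canary_metric < baseline_metric * (1 - max_relative_drop):
        return "rollback"
    return "promote"

print(evaluate_canary(0.120, 0.118))  # promote (within 5% relative tolerance)
print(evaluate_canary(0.120, 0.100))  # rollback (>5% relative drop)
```

Shadow deployment uses the same comparison but with `canary_fraction` effectively at zero: the new model scores every request, yet its outputs are only logged for comparison, never returned.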

CI/CD for Models: Beyond Code

CI/CD for ML is not just running unit tests. The pipeline must validate: (1) input data quality (schema, distributions, nulls), (2) model metrics (AUC, precision, recall per slice), (3) regression tests against a baseline, (4) inference latency, (5) artifact size. Only if all gates pass is the model promoted to staging and then to production. Cloud Build orchestrates CI; Vertex AI Pipelines orchestrates CD with post-deploy evaluations.
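
The five gates can be expressed as a single promotion check. The thresholds below (AUC floor, 1-point regression budget, 100 ms p99, 500 MB artifact) are illustrative placeholders, not recommended values.

```python
def check_gates(report: dict):
    """Evaluate the five promotion gates; the model advances only if all pass."""
    gates = {
        "data_schema_valid": report["schema_valid"],                              # (1) input data quality
        "auc_above_floor": report["auc"] >= 0.80,                                 # (2) model metrics
        "no_regression_vs_baseline": report["auc"] >= report["baseline_auc"] - 0.01,  # (3) regression test
        "p99_latency_ok": report["p99_latency_ms"] <= 100,                        # (4) inference latency
        "artifact_size_ok": report["artifact_mb"] <= 500,                         # (5) artifact size
    }
    failed = [name for name, passed in gates.items() if not passed]
    return len(failed) == 0, failed

ok, failed = check_gates({
    "schema_valid": True, "auc": 0.86, "baseline_auc": 0.85,
    "p99_latency_ms": 42, "artifact_mb": 120,
})
print(ok, failed)  # True []
```

In practice each gate would run as its own pipeline step (data validation before training, latency checks after packaging), with the failure list surfaced in the CI log so the blocking gate is obvious.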

Google Cloud · MLOps Pipeline Stack

  • CI/CD: Cloud Build, Artifact Registry
  • Training: Vertex AI Pipelines, Vertex AI Training
  • Features: Vertex AI Feature Store, BigQuery
  • Registry: Vertex AI Model Registry, Vertex AI Experiments
  • Serving: Vertex AI Endpoints (canary), Cloud Run (pre/post)
  • Monitoring: Cloud Monitoring, Vertex AI Model Monitoring

Key Takeaways

  • Separating the training pipeline (batch) from the serving pipeline (online) is fundamental to scaling MLOps.
  • Canary deployments should roll back automatically based on business metrics, not just ML metrics.
  • CI/CD for ML includes data quality gates, regression tests, and latency validation, not just unit tests.
  • The Model Registry is the contract between training and serving: versioned artifacts with evaluation metadata.