MLOps Pipeline Design: From Notebooks to Continuous Production
The Notebook Is Not a Pipeline
An estimated 87% of ML models never make it to production. The primary reason is not model quality but the absence of pipeline engineering that enables reproducible training, automated deployment, and safe rollback. An MLOps pipeline separates concerns: data ingestion, feature engineering, training, evaluation, registry, deployment, and monitoring operate as independent stages with well-defined contracts.
Training Pipeline vs Serving Pipeline
A common mistake is mixing training and serving in the same pipeline. The training pipeline is batch, latency-tolerant, and throughput-optimized. The serving pipeline is online, latency-sensitive, and p99-optimized. Separation allows each to scale, version, and monitor independently. The connection point is the Model Registry: the training pipeline writes versioned artifacts, the serving pipeline reads the artifact promoted to production.
Deployment Strategies: Canary, Blue-Green, Shadow
Deploying a new model directly to 100% of traffic is risky. A canary deployment sends a small percentage of traffic to the new model while the previous model handles the rest; if business metrics (not just ML metrics) degrade, rollback is automatic. Blue-green deployment maintains two complete environments, so both the switch and the rollback are instantaneous. Shadow deployment runs the new model in parallel without affecting production traffic, comparing its outputs against the live model's to validate it before promotion.
CI/CD for Models: Beyond Code
CI/CD for ML is not just running unit tests. The pipeline must validate: (1) input data quality (schema, distributions, nulls), (2) model metrics (AUC, precision, recall per slice), (3) regression tests against a baseline, (4) inference latency, (5) artifact size. Only if all gates pass is the model promoted to staging and then to production. Cloud Build orchestrates CI; Vertex AI Pipelines orchestrates CD with post-deploy evaluations.
Key Takeaways
- Separating the training pipeline (batch) from the serving pipeline (online) is fundamental to scaling MLOps.
- Canary deployments should trigger automatic rollback based on business metrics, not just ML metrics.
- CI/CD for ML includes data quality gates, regression tests, and latency validation, not just unit tests.
- The Model Registry is the contract between training and serving: versioned artifacts with evaluation metadata.
