MLOps Fundamentals: The Lifecycle of Machine Learning¶

Estimated time to read: 3 minutes

Machine Learning Operations (MLOps) is the application of DevOps principles to the machine learning lifecycle. While traditional DevOps versions and deploys code and binaries, MLOps must handle Code + Data + Models — three artefacts that change independently and must stay in sync.

This guide covers the foundational stages of a production-grade MLOps environment.


1. MLOps vs. Traditional DevOps¶

| Feature | DevOps | MLOps |
| --- | --- | --- |
| Primary Unit | Code / Binary | Code + Data + Model |
| Success Metric | Uptime, Latency, Throughput | Accuracy, Precision, Drift |
| Testing | Unit & Integration tests | Model validation, Data quality tests |
| Monitoring | System health | Prediction drift, Data drift |

2. The MLOps Lifecycle¶

```mermaid
graph LR
    A[Data Prep] --> B[Model Training]
    B --> C[Model Validation]
    C --> D[Model Registry]
    D --> E[Deployment / Serving]
    E --> F[Monitoring]
    F --> G{Drift Detected?}
    G -- Yes --> A
    G -- No --> F
```

Stage 1: Data Preparation & Feature Engineering¶

ML models are only as good as the data they are trained on. This stage involves cleaning, transforming, and tracking data versions.

  • Feature Store: A centralized repository for sharing and managing features across training and serving. See Feature Engineering, DevOps & Observability.
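The key property of a feature store is that training and serving retrieve features through the same lookup path, which prevents training/serving skew. A minimal sketch (the `InMemoryFeatureStore` class and its entity IDs are illustrative, not a real feature-store API):

```python
class InMemoryFeatureStore:
    """Toy feature store: one lookup path shared by training and serving."""

    def __init__(self):
        self._features = {}  # (entity_id, feature_name) -> value

    def write(self, entity_id, feature_name, value):
        self._features[(entity_id, feature_name)] = value

    def get_features(self, entity_id, feature_names):
        # The same retrieval call builds training sets and fetches
        # features at prediction time, avoiding skew between the two.
        return {name: self._features.get((entity_id, name))
                for name in feature_names}

store = InMemoryFeatureStore()
store.write("user_42", "avg_order_value", 31.5)
store.write("user_42", "orders_last_30d", 4)

row = store.get_features("user_42", ["avg_order_value", "orders_last_30d"])
```

A production feature store (e.g., Feast or Vertex AI Feature Store) adds point-in-time correctness and low-latency online serving on top of this idea.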

Stage 2: Model Training & Experiment Tracking¶

Instead of just committing code, you are tracking experiments: hyper-parameters, data versions, and training results.

  • Tools: MLflow, Weights & Biases, Kubeflow.
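The record those tools keep per run boils down to a small structured document. A hand-rolled sketch of what gets logged (the `log_run` helper is hypothetical; real trackers like MLflow add UIs, comparison, and artefact storage):

```python
import json
import time
import uuid
from pathlib import Path

def log_run(params, metrics, data_version, run_dir="runs"):
    """Record one training run: hyper-parameters, data version, results."""
    record = {
        "run_id": uuid.uuid4().hex[:8],
        "timestamp": time.time(),
        "params": params,              # e.g. learning rate, tree depth
        "data_version": data_version,  # which dataset snapshot was used
        "metrics": metrics,            # e.g. validation accuracy
    }
    path = Path(run_dir)
    path.mkdir(exist_ok=True)
    (path / f"{record['run_id']}.json").write_text(json.dumps(record, indent=2))
    return record

run = log_run({"learning_rate": 0.1, "max_depth": 6},
              {"val_accuracy": 0.93},
              data_version="2024-05-01")
```

The point is the triple: parameters + data version + metrics. Without all three, a run cannot be reproduced or fairly compared.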

Stage 3: Model Registry¶

A versioned store for your trained models. You never deploy a "loose" file from a developer's laptop. You deploy a versioned artefact from the registry.
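Conceptually, a registry is a versioned map with a promotion workflow. A minimal sketch (the `ModelRegistry` class and its stage names are illustrative assumptions, not a specific product's API):

```python
class ModelRegistry:
    """Toy registry: versioned artefacts with a promotion stage."""

    def __init__(self):
        self._models = {}  # model name -> list of version entries

    def register(self, name, artefact):
        versions = self._models.setdefault(name, [])
        versions.append({"version": len(versions) + 1,
                         "artefact": artefact,
                         "stage": "staging"})
        return versions[-1]["version"]

    def promote(self, name, version):
        # Exactly one version of a model serves production traffic.
        for entry in self._models[name]:
            if entry["stage"] == "production":
                entry["stage"] = "archived"
        self._models[name][version - 1]["stage"] = "production"

    def production_model(self, name):
        for entry in self._models[name]:
            if entry["stage"] == "production":
                return entry["artefact"]
        return None

registry = ModelRegistry()
registry.register("churn", artefact="churn-v1.bin")
registry.register("churn", artefact="churn-v2.bin")
registry.promote("churn", version=2)
```

Deployment tooling then asks the registry for the current production artefact instead of trusting whatever file happens to be lying around.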

Stage 4: Deployment & Serving¶

Models can be deployed as:

  • Batch Prediction: Scoring millions of rows on a schedule (e.g., in BigQuery ML).
  • Real-time Serving: An API endpoint that returns a prediction in milliseconds (e.g., via Vertex AI or Seldon).
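Both modes wrap the same model behind different entry points. A sketch of the two paths (the stub `predict` rule and feature names are placeholders for a real trained model):

```python
import json

def predict(model, features):
    # Stand-in for a real trained model: any callable scoring rule works.
    return 1 if features["orders_last_30d"] > 2 else 0

def batch_score(model, rows):
    # Batch prediction: score many rows in one scheduled job.
    return [predict(model, row) for row in rows]

def handle_request(model, request_json):
    # Real-time serving: one request in, one low-latency prediction out.
    features = json.loads(request_json)
    return {"prediction": predict(model, features)}

model = None  # the stub ignores it; a real system loads this from the registry
scores = batch_score(model, [{"orders_last_30d": 5}, {"orders_last_30d": 1}])
response = handle_request(model, '{"orders_last_30d": 5}')
```

The design choice is latency versus throughput: batch amortizes model loading over millions of rows, while real-time serving keeps the model resident in memory to answer single requests quickly.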

Stage 5: Monitoring & Retraining¶

You must monitor for Drift:

  • Data Drift: The input data distribution changes (e.g., a new sensor version starts sending different values).
  • Model Drift: The model's performance degrades in production because the real-world relationship between inputs and outputs has changed.
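Data drift can be caught without labels by comparing live feature statistics against the training-time baseline. A simple mean-shift check (the `mean_shift_alert` helper and its 3-standard-error threshold are illustrative; production systems use tests like PSI or Kolmogorov–Smirnov):

```python
import statistics

def mean_shift_alert(baseline, live, threshold=3.0):
    """Flag drift when the live mean sits far from the training-time
    mean, measured in standard errors of the baseline sample."""
    mu = statistics.mean(baseline)
    se = statistics.stdev(baseline) / len(baseline) ** 0.5
    z = abs(statistics.mean(live) - mu) / se
    return z > threshold

baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]  # training data
stable   = [10.1, 9.9, 10.0, 10.2]   # live traffic, same distribution
shifted  = [12.5, 12.8, 12.4, 12.6]  # e.g. a new sensor version
```

Model drift, by contrast, needs ground-truth labels (which often arrive late), which is why data-drift checks are usually the first alerting line.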


3. MLOps Maturity Levels¶

The OpsAtScale Data Maturity Assessment maps these capabilities.

🔴 Level 0: Manual (No MLOps)¶

  • Manual data prep and training.
  • Models passed as files (e.g., .pkl) via email or Slack.
  • No monitoring of performance.

🟡 Level 1: Automated Pipeline¶

  • Code to train is versioned.
  • Automated pipeline triggers training when new data arrives.
  • Model validation gates before deployment.

🔵 Level 2: CI/CD/CT (Continuous Training)¶

  • CT: The system automatically retrains the model when performance drifts or new data is available.
  • A/B Testing: New models are compared against "champion" models before full rollout. See A/B Testing with BigQuery ML.
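The champion/challenger decision at Level 2 reduces to a gated comparison. A sketch (the `choose_champion` helper, its metric name, and the +1-point margin are assumptions for illustration):

```python
def choose_champion(champion_metrics, challenger_metrics, min_gain=0.01):
    """Promote the challenger only if it beats the current champion by a
    margin large enough to matter (here: one point of accuracy)."""
    if challenger_metrics["accuracy"] >= champion_metrics["accuracy"] + min_gain:
        return "challenger"
    return "champion"

winner = choose_champion({"accuracy": 0.91}, {"accuracy": 0.93})
```

In a real CT pipeline this gate runs automatically after each retraining, and a losing challenger is archived rather than deployed.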

4. Summary Checklist¶

  • Versioning: Are you versioning your data samples as well as your code?
  • Registry: Are you using a formal Model Registry for all deployments?
  • Monitoring: Do you have alerts for when prediction accuracy drops below a threshold?
  • Drift: Do you monitor the statistical distribution of your input features?
  • Reproducibility: Can you rebuild a model from 6 months ago with the exact same data and code?