xStryk™

Decision Intelligence for AI in production — guardrails, traceability & evaluation.

DATA OPS

Data quality checklist for production AI

xSingular · 10 min read

Validation rules, data contracts, statistical sampling, gold sets, drift, and lineage for datasets that support critical decisions.


Data as critical infrastructure

An AI model is only as good as the data that feeds it. In production environments, data quality is not a data engineering problem — it is an operational reliability problem. A null field in a credit dataset can mean a wrong decision about a real customer. A silent change in the encoding of a variable can invalidate months of training.

Production data quality requires the same rigor as software quality: automated tests, explicit contracts, continuous monitoring, and alerts.

Data quality dimensions

Every production dataset must be evaluated across six dimensions:

  • Completeness: Percentage of non-null fields. Define minimum thresholds per critical column (e.g., customer email can never be null; industry sector can tolerate 5% nulls).
  • Consistency: Values respect logical relationships (e.g., close date after open date, total amount equals sum of items).
  • Uniqueness: Absence of duplicates in primary keys. Near-duplicate detection in textual records.
  • Validity: Values are within acceptable ranges and expected formats (e.g., valid tax IDs, coordinates within operational territory).
  • Freshness: Data arrives within the expected time window. An alerts dataset with 4 hours of delay is useless for real-time decisions.
  • Precision: The level of granularity is sufficient for the decision. Temperatures rounded to whole degrees may be insufficient for industrial process monitoring.
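The first four dimensions lend themselves to automated checks on every batch. Below is a minimal sketch of a rule-driven validator for completeness and validity; the `RULES` table, column names, and thresholds are hypothetical examples, not a prescribed schema.

```python
import re

# Hypothetical per-column rules: maximum null rate, format regex, numeric range.
RULES = {
    "email":  {"max_null_rate": 0.0, "regex": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    "sector": {"max_null_rate": 0.05},
    "amount": {"max_null_rate": 0.0, "min": 0.0},
}

def check_quality(rows, rules=RULES):
    """Return a list of completeness and validity violations for a batch of dict rows."""
    violations = []
    n = len(rows)
    for col, rule in rules.items():
        values = [r.get(col) for r in rows]
        # Completeness: share of nulls against the column's tolerance.
        null_rate = sum(v is None for v in values) / n
        if null_rate > rule.get("max_null_rate", 1.0):
            violations.append(f"{col}: null rate {null_rate:.1%} exceeds threshold")
        # Validity: format and range checks on non-null values.
        for v in values:
            if v is None:
                continue
            if "regex" in rule and not re.match(rule["regex"], str(v)):
                violations.append(f"{col}: invalid format {v!r}")
            if "min" in rule and v < rule["min"]:
                violations.append(f"{col}: {v} below minimum {rule['min']}")
    return violations
```

An empty return value means the batch passes; any violation can be routed to the same alerting path as a failed software test.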

Data contracts

A data contract is a formal agreement between the producer and consumer of a dataset. It defines schema, types, valid ranges, update frequency, and quality SLOs. If the contract breaks, the pipeline stops — garbage does not propagate.

  • Schema enforcement: Columns, data types, and exact order. Any change requires explicit versioning of the contract.
  • Validation rules: Per-column constraints (not null, unique, range, regex). Executed as tests on every ingestion.
  • Freshness SLOs: Maximum time between data generation and availability. Automatic alerts when the threshold is exceeded.
  • Responsibilities: Producer team commits to maintaining the contract. Consumer team commits to not depending on non-contractual fields.
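As a sketch of how such a contract can be made executable, the snippet below encodes exact schema order, column types, and a freshness SLO in one object; the dataset name, version string, and column set are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DataContract:
    """Minimal executable data contract: exact schema plus a freshness SLO."""
    name: str
    version: str
    columns: list            # (column_name, python_type) pairs, in exact order
    freshness_slo: timedelta

    def validate(self, batch, generated_at):
        errors = []
        expected = [c for c, _ in self.columns]
        # Schema enforcement: column names and order must match exactly.
        for row in batch:
            if list(row.keys()) != expected:
                errors.append(f"schema mismatch: {list(row.keys())} != {expected}")
                break
        # Type checks per column.
        for col, typ in self.columns:
            for row in batch:
                if not isinstance(row.get(col), typ):
                    errors.append(f"{col}: expected {typ.__name__}, got {row.get(col)!r}")
                    break
        # Freshness SLO: time between generation and validation.
        age = datetime.now(timezone.utc) - generated_at
        if age > self.freshness_slo:
            errors.append(f"freshness SLO breached: data is {age} old")
        return errors

# Hypothetical contract for a credit-applications feed.
contract = DataContract(
    name="credit_applications",
    version="1.2.0",
    columns=[("customer_id", str), ("amount", float), ("opened_at", str)],
    freshness_slo=timedelta(hours=4),
)
```

A non-empty error list is the signal to halt ingestion, which implements the "garbage does not propagate" rule directly in the pipeline.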

Statistical sampling and gold sets

Gold sets are subsets of data manually labeled by domain experts, serving as the ground-truth reference for evaluation. They are critical because they make it possible to measure real model performance against high-quality human decisions.

  • Size: Minimum 500-2000 observations per use case, stratified by the most relevant variables.
  • Construction: Labeled by at least 2 independent experts. Inter-annotator agreement measured and documented.
  • Immutability: Once created, a gold set is not modified. New versions are created when errors are detected or coverage is expanded.
  • Continuous expansion: Every false positive or false negative detected in production is added to the gold set for the next version.
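The inter-annotator agreement mentioned above is commonly quantified with Cohen's kappa, which corrects raw agreement for chance. A minimal two-rater implementation, as a sketch:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)
```

Kappa of 1.0 means perfect agreement and 0.0 means agreement no better than chance; the acceptance threshold per use case (e.g., requiring kappa above some value before a gold set is accepted) is a policy decision, not fixed by the metric.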

Data drift detection

Data drift occurs when the distribution of production data diverges from the training distribution. If undetected, the model makes decisions based on patterns that no longer exist.

  • Feature drift: Changes in the distribution of individual variables. Monitor with PSI (Population Stability Index), KS-test, or Jensen-Shannon divergence.
  • Concept drift: Changes in the relationship between features and target. More subtle and dangerous — a model can maintain good distribution metrics but degrade in accuracy.
  • Monitoring windows: Compare weekly distribution vs. training baseline. Automatic alerts when PSI exceeds 0.10 (requires investigation) or 0.25 (unstable).
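The PSI cited above can be computed per feature against the training baseline. A minimal sketch, assuming continuous features and quantile binning on the baseline side:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a production sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Bin edges from the baseline distribution; quantile binning keeps
    # baseline bins evenly populated. Extend outer edges to cover all values.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Small floor avoids log(0) when a production bin is empty.
    eps = 1e-6
    exp_pct = np.clip(exp_pct, eps, None)
    act_pct = np.clip(act_pct, eps, None)

    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))
```

A weekly job can run this per feature and raise an alert when the result crosses the 0.10 or 0.25 thresholds.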

Lineage and traceability

Data lineage documents the origin, transformations, and destination of each data point. In Decision Intelligence, it is essential for audits and for debugging when a decision is questioned.

  • Backward lineage: Given a model output, reconstruct exactly what input data generated it, what pipeline version transformed them, and what model version processed them.
  • Forward lineage: Given a change in a data source, identify all affected models and decisions.
  • Metadata: Each dataset has a generation timestamp, content hash, schema version, and reference to the active contract.
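The metadata record above can be generated at publish time. A sketch with a stable content hash via canonical JSON serialization; the field names and contract reference format are illustrative assumptions:

```python
import hashlib
import json
from datetime import datetime, timezone

def dataset_metadata(rows, schema_version, contract_ref):
    """Build a lineage metadata record: timestamp, content hash, schema
    version, and a reference to the active data contract."""
    # Canonical serialization so the hash is stable regardless of key order.
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "content_hash": hashlib.sha256(payload.encode()).hexdigest(),
        "schema_version": schema_version,
        "contract": contract_ref,
        "row_count": len(rows),
    }
```

Storing this record alongside each dataset version is what makes backward lineage possible: a model output can reference the content hash of the exact inputs that produced it.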

Operational checklist

  • Data contracts formalized for every critical source
  • Validation rules executing on every ingestion
  • Gold sets built with at least 2 experts per use case
  • Drift monitoring configured with automatic alerts
  • Lineage implemented end-to-end (source to decision)
  • Freshness SLOs defined and monitored
  • Gold set expansion process with production errors
  • Schema versioning with documented changelog
  • Data quality dashboard accessible to business team
  • Runbooks for each type of quality alert

Need to implement this?

Let's talk 30 minutes about your use case. No strings attached.

Schedule call