Every claim we make about data quality,
we measure first.

OpenDataLabs publishes research on AI training data quality, consented data collection, and the economics of human data licensing.

What we study

Consent rate benchmarks

What percentage of users consent to share data, across which domains and integration contexts. Real numbers by data type and UX pattern. If you’re deciding whether to build a data-sharing flow, this is the data you need.

Contributor incentives and network sustainability

The economics of user data contribution at scale. Relevant for anyone building on a contributor network.

Portability and interoperability standards

How user data moves between systems without losing provenance. The standards question that AI teams, regulators, and platform operators will all have to answer.

Cross-domain enrichment effects

A contributor’s health data linked to their financial behaviour linked to their communication patterns tells you things none of those datasets tell you individually. We measure the performance uplift from multi-domain linkage and publish the methodology. That’s where ground truth data earns its price premium.

Data quality differentials: ground truth vs annotated vs synthetic

We quantify the gap, with a published methodology. The difference matters most in health, finance, and language modelling.

Research collaborations

We work with AI labs, universities, and enterprise data teams on applied research that needs ground truth data at scale. If that describes your work, and it benefits the field, get in touch.