We build the infrastructure for AI training data people actually agreed to.
Built for the enterprise.
OpenDataLabs is an AI training data infrastructure company. We emerged from MIT’s Media Lab in 2021, where co-founders Anna Kazlauskas and Art Abal were researching how distributed networks could change who owns and benefits from AI training data. What started as a research question became the foundation for the products we build today: a user context API and a marketplace for consented, multi-domain ground truth datasets.
Backed by Paradigm and Coinbase Ventures. Covered by MIT News, Wired, and TechCrunch. The founding team has raised $25M+ across three rounds to build the infrastructure layer for consented data portability at scale.
Built on infrastructure we designed.
The data infrastructure underlying both products was designed and built by the same team. The scale of that infrastructure is what makes the quality of what you buy possible:
1.5M+ people in the consented data network. The source of record for both products.
20M+ permissioned data points across health, financial, behavioural, and conversational domains.
The team.
We combine expertise in AI research, data sourcing, legal infrastructure, and enterprise software. The team’s backgrounds span MIT, Harvard, Appen, Coinbase, Celo, and the World Bank.

Anna Kazlauskas
CEO, OpenDataLabs · Creator, Vana
Anna studied computer science and economics at MIT, where she conducted research at CSAIL, the Federal Reserve, and the World Bank. She founded a YC-backed machine learning company before building Vana, which started as a class project at MIT’s Media Lab and scaled to 1.5M+ contributors. She is the inventor of the Vana protocol and the architect of OpenDataLabs’ product strategy.

Art Abal
Co-founder · Board Advisor
Art holds a master’s in public policy from Harvard Kennedy School, where he also worked as a social science researcher at the Belfer Center for Science and International Affairs. He is a qualified lawyer, previously at DLA Piper. Before Vana, he led data collection innovation at Appen, one of the world’s largest AI training data companies, and advised in the Office of the Prime Minister of Timor-Leste. He brings enterprise data market knowledge and legal depth to the founding team.

Maciej Witowski
Engineering Lead
Maciej is an engineer and engineering leader with a background in AI and blockchain infrastructure. Previously at Protocol Labs and Zendesk, he brings deep experience building distributed systems and production-grade software at scale. He leads the technical development of OpenDataLabs’ core infrastructure.

Jack Spallone
Data & Partnerships
Jack has spent nearly a decade at the intersection of data, media, and emerging technology. He was an early contributor to Ujo Music, the ConsenSys project that introduced the first music rights cases on Ethereum, and went on to help secure the winning technology partner bid for the Mechanical Licensing Collective (MLC) alongside Harry Fox Agency and SESAC. He subsequently built Oscillator, a data sharing protocol for music applications. Jack brings deep experience in data commercialisation, rights infrastructure, and industry partnerships.
The broader team spans applied ML research, data engineering, privacy-preserving computation, and enterprise BD. We have operated at the intersection of AI data supply chains and human-data economics since before it was a well-defined category.
What’s been written about us.
MIT News on the founding story: the MIT Media Lab origins, how the underlying data infrastructure was built, and what it means for AI development.
Read on MIT News
TechCrunch on how OpenDataLabs built the infrastructure for users to contribute platform data to AI model training, and what that means for data supply chains.
Read on TechCrunch
MIT’s School of Architecture and Planning on Vana’s model for distributed data ownership, what it means for AI development, and who benefits.
Read on MIT SA+P
How the data infrastructure works.
Both products run on a consent and portability layer built by the same team. It handles how data moves, what was authorised, and when. Complete audit trail at the record level. For buyers who need to know what’s underneath, the protocol is open source.
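To make "what was authorised, and when" concrete, here is a minimal Python sketch of a record-level consent trail. It is purely illustrative and is not the actual protocol schema or API: the `ConsentEvent` type, its field names, the `"granted"`/`"revoked"` actions, and the `is_authorised` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentEvent:
    """One entry in a hypothetical record-level audit trail."""
    record_id: str   # which data record the event applies to
    action: str      # "granted" or "revoked" (assumed vocabulary)
    purpose: str     # what use was authorised, e.g. "model_training"
    at: datetime     # when the event happened


def is_authorised(trail, record_id, purpose, when):
    """Replay the trail up to `when`; the latest matching event decides."""
    allowed = False
    for event in sorted(trail, key=lambda e: e.at):
        if event.at > when:
            break
        if event.record_id == record_id and event.purpose == purpose:
            allowed = event.action == "granted"
    return allowed


UTC = timezone.utc
trail = [
    ConsentEvent("rec-1", "granted", "model_training", datetime(2024, 1, 1, tzinfo=UTC)),
    ConsentEvent("rec-1", "revoked", "model_training", datetime(2024, 6, 1, tzinfo=UTC)),
]

# Was rec-1 cleared for model training on 1 March 2024? On 1 July 2024?
in_march = is_authorised(trail, "rec-1", "model_training", datetime(2024, 3, 1, tzinfo=UTC))
in_july = is_authorised(trail, "rec-1", "model_training", datetime(2024, 7, 1, tzinfo=UTC))
```

Because the trail is a list of timestamped events rather than a single current flag, a buyer could in principle ask whether a given record was authorised for a given purpose at any point in time, which is the property an audit trail is meant to provide.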
Let’s talk.
If you’re sourcing data for model training, building a product that needs real user context, or want to understand what’s technically possible before a commercial conversation: