Mozilla Common Voice

Mozilla Foundation

International

2025

Project Summary

The dominant modes of data creation are broken. Datasets are created without consent: scraped from the internet, they mirror the gaps, biases and toxicities of that space. Or they’re farmed out as low-wage work, then monetised for huge margins. The antidote is crowdsourced, public participation datasets that give people the power to shape AI as they see fit. Mozilla Common Voice provides global communities with the open infrastructure they need to build diverse, multilingual voice and text datasets for the public good. With 30,000 hours of speech, 750,000 contributors and 140+ languages, it is the largest and most diverse public participation speech dataset in the world.