
Google To Empower Indian Developers With Open-Sourced Speech, Building Datasets

BW Online Bureau Jun 28, 2023

Google on Wednesday announced significant strides in its efforts to foster technological development and accessibility in India.

Last year, in collaboration with the Indian Institute of Science (IISc), the company launched Project Vaani, an initiative aimed at collecting anonymised speech data from individuals across the country. As the project's first milestone, Google India and IISc have now open-sourced over 4,000 hours of diverse, high-quality speech data spanning 38 languages.

This collection of speech data, gathered from more than 10,000 speakers across 80 of India's 773 districts, is expected to help developers create applications that accurately reflect the nuances of local languages. Google Cloud says it hopes that sharing this data will advance the development of speech recognition and natural language processing technologies in India.

Google has also released its Open Buildings datasets for India. Derived from satellite imagery, these datasets identify the locations and outlines of over 200 million buildings throughout the country. Each building in the dataset is assigned a unique Plus Code, an identifier from an open-source system that provides precise addressing information.

The release of Open Buildings datasets is expected to prove invaluable for developers, authorities, and aid organisations working in various sectors, including urban planning, humanitarian response, environmental and climate science, healthcare, and education.