January 31st, 2019
When you listen to Google Maps driving directions in your car, get answers from your Google Home, or hear a spoken translation in Google Translate, you’re using Google’s speech synthesis, or text-to-speech (TTS) technology. Speech interfaces not only allow you to interact naturally and conveniently with digital devices, they’re a crucial technology for making information universally accessible: TTS opens up the internet to millions of users all over the world who may not be able to read, or who have visual impairments.
Over the last few years, there’s been an explosion of new research using neural networks to simulate a human voice. These models, including many developed at Google, can generate increasingly realistic, human-like speech.
While the progress is exciting, we’re keenly aware of the risks this technology can pose if used with the intent to cause harm. Malicious actors may synthesize speech to try to fool voice authentication systems, or they may create forged audio recordings to defame public figures. Perhaps equally concerning, public awareness of “deep fakes” (audio or video clips generated by deep learning models) can be exploited to manipulate trust in media: as it becomes harder to distinguish real from tampered content, bad actors can more credibly claim that authentic data is fake.
We’re taking action. When we launched the Google News Initiative last March, we committed to releasing datasets that would help advance state-of-the-art research on fake audio detection. Today, we’re delivering on that promise: Google AI and Google News Initiative have partnered to create a body of synthetic speech containing thousands of phrases spoken by our deep learning TTS models. These phrases are drawn from English newspaper articles and are spoken by 68 synthetics “voices” covering a variety of regional accents.
We’re making this dataset available to all participants in the independent, externally-run 2019 ASVspoof challenge. This open challenge invites researchers all over the globe to submit countermeasures against fake (or “spoofed”) speech, with the goal of making automatic speaker verification (ASV) systems more secure. By training models on both real and computer-generated speech, ASVspoof participants can develop systems that learn to distinguish between the two. The results will be announced in September at the 2019 Interspeech conference in Graz, Austria.
As we published in our AI Principles last year, we take seriously our responsibility both to engage with the external research community and to apply strong safety practices to avoid unintended results that create risks of harm. We’re also firmly committed to Google News Initiative’s charter to help journalism thrive in the digital age, and our support for the ASVspoof challenge is an important step along the way.