With only 15 seconds of audio, OpenAI's AI can clone any voice

OpenAI plans to limit access to about 10 developers.

Alexander Lewis - Mar 30, 2024

The OpenAI logo. Image: Pixabay

OpenAI has recently introduced Voice Engine, a text-to-voice platform that creates a synthetic voice from a 15-second sample of a person speaking. The generated voice can then read text prompts aloud in the speaker's own language or in various other languages.

For now the platform is available only to a small group of partners. OpenAI explains in a blog post that these small-scale deployments are helping to shape its approach, its safeguards, and its thinking about how the technology could be applied across different sectors.

Several companies have been granted access to the technology, including education technology firm Age of Learning, visual storytelling platform HeyGen, health software developer Dimagi, AI-based communication app Livox, and healthcare provider Lifespan.

Samples shared by OpenAI show how Age of Learning uses Voice Engine to produce pre-scripted voice-overs and, together with GPT-4, to generate "real-time, personalised responses" for students.

The technology, which OpenAI started developing in late 2022, has already been used to create the preset voices for its text-to-speech API and the Read Aloud feature in ChatGPT. Jeff Harris from OpenAI's Voice Engine product team revealed in a TechCrunch interview that the model was trained on a combination of licensed and publicly sourced data. For now, OpenAI plans to limit access to about 10 developers.

The field of AI text-to-audio generation continues to evolve, with companies such as Podcastle and ElevenLabs also working on AI voice cloning technology. These advances raise ethical questions, however, as explored by the Vergecast last year.

Concurrently, the US government is addressing the unethical uses of AI voice technology. For instance, the Federal Communications Commission recently prohibited AI voice-based robocalls, such as those mimicking President Joe Biden.

OpenAI has established strict usage policies for its partners. These prohibit using Voice Engine to impersonate a person without consent, require explicit permission from the person whose voice is sampled, bar partners from letting individual users create their own voices, and require that listeners be told the voices are AI-generated. Additionally, OpenAI embeds a watermark in the audio for traceability and monitors how the technology is used.

To mitigate potential risks, OpenAI suggests measures such as phasing out voice-based authentication for bank accounts, establishing policies to protect the use of individuals' voices in AI, educating the public about AI deepfakes, and creating systems to track the origin of AI-generated content.