Marijan Hassan - Tech Journalist

OpenAI announces voice cloning AI model but with limited access

OpenAI has unveiled its groundbreaking text-to-voice platform, Voice Engine, giving us a glimpse into the future of synthetic voice generation. Built by the company that gave us ChatGPT, Voice Engine can craft a lifelike voice based on just a 15-second snippet of someone's speech.

More impressively, the AI model can read text prompts aloud in multiple languages, including the speaker's native tongue.

In a recent blog post, OpenAI expressed its commitment to responsible deployment of Voice Engine, emphasizing the importance of solid strategies and safeguards to ensure the ethical use of this powerful tool.

At the moment, Voice Engine is only available to select users. Some of the most notable ones include Age of Learning, HeyGen, Dimagi, Livox, and Lifespan. These companies have already started exploring various applications, ranging from the creation of pre-scripted voice-over content to delivering personalized responses in real time by coupling it with GPT-4.

A brief history

Voice Engine's journey began in late 2022, and it has already left its mark by powering preset voices for text-to-speech APIs and enhancing ChatGPT's Read Aloud feature. Jeff Harris from OpenAI's product team revealed that the model underwent training on a diverse dataset, leveraging licensed and publicly available data sources.

While Voice Engine represents a significant leap in AI text-to-audio capabilities, it also raises pertinent ethical considerations.

The technology's potential to replicate human voices has sparked discussions around safeguarding against misuse and has prompted regulatory bodies to take action.

Just recently, the US Federal Communications Commission cracked down on AI-driven robocalls, highlighting the need for responsible AI voice usage.

In response to these concerns, OpenAI has implemented stringent usage policies for Voice Engine's partners. These policies require explicit consent from the original speakers, prohibit impersonation without consent, and require disclosure that the voices are AI-generated. OpenAI has also introduced watermarking to trace audio clips and actively monitors how they are used.

Looking ahead, OpenAI proposes proactive measures to mitigate risks associated with AI voice technologies. These include reevaluating voice-based authentication systems, implementing policies to protect individuals' voices, raising awareness about AI deepfakes, and developing robust tracking mechanisms for AI-generated content.
