Microsoft is introducing a new artificial intelligence (AI) feature in Teams called “Interpreter” that clones users’ voices for real-time speech-to-speech translation. Launching in early 2025, the feature aims to break down language barriers in meetings and positions Microsoft to compete in the growing field of AI translation tools.
AI Translation Tools Battle it Out
The Interpreter was announced at the annual Ignite event alongside other Copilot news. The feature can replicate a speaker’s voice while translating speech into up to nine languages, letting users preserve the quirks and character of their original voice and enhancing the quality of virtual interactions. Initially supported languages include English, Spanish, French, and Mandarin Chinese.
“Imagine being able to sound just like you in a different language,” wrote Jared Spataro, Microsoft’s Chief Marketing Officer. “The Interpreter agent in Teams provides real-time speech-to-speech translation during meetings, and you can opt to have it simulate your speaking voice for a more personal and engaging experience.”
Microsoft’s new offering will join a competitive market alongside Google, Meta, and Apple. Google’s Translate and Live Transcribe services are well known for their extensive language support and real-time capabilities, and Google has also been integrating translation into its video conferencing tools, though without the voice personalization Microsoft is offering. Meta, meanwhile, has been working on real-time voice translation for platforms like Instagram Reels. Apple is keeping pace with its Personal Voice feature, which lets users create a synthetic version of their voice for live text-to-speech situations such as FaceTime calls.
Voice Cloning Challenged by Ethics
Innovative as it is, Microsoft’s entry into this space faces several challenges. AI ethics experts are cautious about the technology’s impact on privacy and security, given its potential for misuse. Deepfake technology has already shown how easily it can be exploited, and incidents of cloned voices being used in scams and disinformation campaigns have become more common.
Victims of deepfake technology in 2024 have included several high-profile individuals. Taylor Swift was targeted with explicit AI-generated images that spread widely across social media, and a private school in Pennsylvania dealt with a similar scandal that led to legal action.
Microsoft has emphasized that the tool will only be enabled with user consent and that it doesn’t store biometric data. Nevertheless, concerns about bad actors using cloned voices for deception persist.
A Microsoft spokesperson reassured users, stating, “Interpreter is designed to replicate the speaker’s message as faithfully as possible without adding assumptions or extraneous information. Voice simulation can only be enabled when users provide consent via a notification during the meeting or by enabling ‘Voice simulation consent’ in settings.”
Microsoft has confirmed that the tool will be available only to Microsoft 365 subscribers.
The competitive landscape of AI translation is rapidly evolving, with each company bringing its unique approach to breaking language barriers. Microsoft’s Interpreter seeks to carve out a niche by combining personalization with translation, but the company must also tackle privacy and security issues that are inherent in AI voice cloning technology.