Sesame’s AI voices are so realistic, they’re freaking people out

Published 3 Mar 2025

Sesame demonstrated breakthrough artificial intelligence (AI) voice technology with unprecedented realism last week. Its new voice AI sounds so human that users report feeling “freaked out” during interactions.

“I was so freaked out by talking to this AI that I had to leave,” PCWorld’s Mark Hachman wrote after testing the system. The AI’s voice reminded him of an old friend, triggering unexpected emotional discomfort.

    The San Francisco-based startup announced its Conversational Speech Model on February 27, 2025. Two AI personas – Maya and Miles – are publicly available to demo on the company’s research blog.

    Sesame, founded by Oculus co-founder Brendan Iribe, designed its model to achieve what it calls “voice presence.” The company trained the system on roughly one million hours of English audio data and developed three model sizes: Tiny (1B backbone, 100M decoder), Small (3B backbone, 250M decoder), and Medium (8B backbone, 300M decoder).

    The team focused on four core components when developing the AI: emotional intelligence, conversational dynamics, contextual awareness, and consistent personality. “We believe in a future where computers are lifelike. They will see, hear, and collaborate with us the way we’re used to. A natural human voice is key to unlocking this future,” Sesame states on its website.

    The company plans to pair this voice technology with lightweight AI glasses. These would provide “convenient access to your companion who can observe the world alongside you,” according to Sesame. The plan raises privacy questions alongside the technological excitement, since the glasses would essentially create an always-present AI observer in users’ daily lives.

    Sesame's early AI glasses prototype

    Source: Sesame

    Sean Hollister at The Verge called it “the first voice assistant I’ve ever wanted to talk to more than once.” Another user, @leeoxiang, reported practicing English with the system for thirty minutes without noticing any delays or artificial patterns.

    Shopify CEO Tobi Lütke publicly endorsed the technology: “Man, sesame’s voice model is absolutely insane.” Multiple industry executives have expressed similar sentiments across social platforms.

    The technology’s potential applications extend beyond consumer products. Call centers could leverage such advancements to enhance customer experiences. Teleperformance SE, the world’s largest call center operator, recently implemented AI to modify accents of English-speaking Indian workers in real time.

    Despite the impressive performance, the system only supports English for now. Its performance also varies in specialized contexts like language switching or singing. Sesame plans to open-source key components under an Apache 2.0 license and expand support to over 20 languages.

    Industry competition in voice AI is intensifying. ElevenLabs, valued at $4 billion, offers text-to-voice features. OpenAI and xAI, the maker of Grok, have developed increasingly human-sounding voice assistants. Sesame’s breakthrough suggests voice-centric interfaces may define the next wave of human-computer interaction – for better or worse.