Publisher’s Note: This column is the latest in a series by Don Lenihan exploring the issues around the use of AI, including the social, economic and governance implications.
Speech sets humans apart from other creatures—or it did, until chatbots came along. At first, they spoke only in text, so “real speech” remained distinctively human. Not anymore. Over the summer chatbots got human voices—and the chattering classes may never be the same. What can we expect?
Voice Capacity Arrives
Voice capacity is a new AI feature that lets chatbots speak like people—and it’s impressive. The AI’s voice is emotionally expressive, with human-like cadences, pauses, and even natural "ums" and "ahs." If this seems eerie at first, stranger still is how quickly it starts to feel normal. Humans are hardwired to respond to voices, and we adjust quickly. The extraordinary soon feels ordinary.
Except, it’s not. Voice changes the relationship between AI and humans, creating a new—and highly disruptive—level of intimacy, especially for professions that rely on the spoken word. There’ll be wins and losses, starting with language instruction.
Language Instructors, Adieu!
For language instructors, the next few years could be a bittersweet farewell. Chatbots like ChatGPT and Gemini are fluent in dozens of languages. They can flit between English, French, Mandarin, or Hindi and use idioms like a native speaker. They are experts in grammar and syntax, can customize lessons, and are available 24/7 at a fraction of the cost of a real tutor.
Language instruction is big business. An estimated 1.5 billion people worldwide are actively learning a second language. The US alone employs a million ESL teachers, many of whom could be supported or even replaced by a chatbot. The question here isn’t whether Voice will sweep the industry, but how fast—and at what cost?
A Pocket Translator
Travelling in foreign countries often means carrying around phrase books. Not anymore. Voice allows chatbots to listen to a foreign speaker, translate their words, and respond in real time. Travellers can roam anywhere, confident that a reliable translator is always within reach.
Virtual Meetings in Everyone’s Language
Voice is also transforming virtual meetings. Meta has unveiled Virtual Reality goggles that create a lifelike 3-D meeting room, complete with avatars and real-time translation. As people engage, the AI translates and syncs their voices with their avatars. Participants may be on different continents, but everyone experiences the meeting as if it were face-to-face and in their own language.
Tools like this promise to eliminate much business travel, as well as many international meetings and even conferences. But what about the impact on industries like travel and hospitality—and the local economies that rely on them?
A Tutor for Every Child
AI Voice could transform education. It takes us a giant step closer to Bill Gates’ vision of providing each student with a personal tutor whose instruction is tailored to their individual learning needs. The Gates Foundation is already experimenting with AI chatbots, like Khanmigo from Khan Academy, that act as tutors, offering feedback on essays or guiding students through math problems.
Voice capacity will greatly accelerate this progress, making high-quality, personalized education potentially accessible to all and giving every student an opportunity to follow their own learning path. This certainly looks like a win.
Podcasts-a-Plenty
Voice is also making waves in digital media. Preparing a professional podcast normally takes days, but Google’s NotebookLM can generate one in minutes. Users simply upload content on any topic, from scientific papers to meeting transcripts. The AI organizes, edits, and consolidates the material, writes the script, and creates the voices that deliver it.
The quality is striking—the AI-generated hosts sound like skilled professionals whose thoughtful, informative exchanges have earned rave reviews.
From Audio to Video
Voice can also be paired with AI’s video capabilities to create avatars that anchor talk shows, participate in debates, or narrate documentaries. Many believe AI-generated Voice and Video herald a new era in educational products.
Customer Service that Actually Serves
Today’s automated service agents are notorious for their monotone voices and inability to understand natural speech, but Voice is already changing this. Well-known agents like Alexa and Google Assistant are undergoing upgrades. The new versions will converse naturally, providing a more human and empathetic experience that raises automated services to a new level.
One Small Step for AI, One Giant Leap for…
We said that Voice changes the relationship between humans and AI, but that doesn’t go far enough. AI speech paves the way for autonomous AI agents—chatbots capable of tackling complex tasks on their own.
Sam Altman, OpenAI’s CEO, predicts these AI agents could arrive by next year. We may be starting with a smarter, more human Alexa, but this could quickly lead to expert AIs reshaping how we plan and manage our affairs, and ultimately, our economy and society.
There’s much to learn about how these agents will impact jobs, social interactions, and even our sense of identity. And what role will AI play in helping us find solutions to the disruptions it brings? Voice provides the interface we need to manage this new relationship—a tool that lets humans collaborate with AI to find answers.
So, yes, we were wrong: speech is not what distinguishes humans from other creatures, but rather what separates intelligence from instinct. And that circle just got a lot bigger.
Don Lenihan, PhD, is an expert in public engagement with a long-standing focus on how digital technologies are transforming societies, governments, and governance. This column appears weekly.