ChatGPT voice mode: talking to AI like a friend
Talking to AI feels strange for about ninety seconds, then it becomes the most natural interface there is. A practical guide to voice mode — what it is great at, what it is bad at, and how to actually use it.
Most people who try ChatGPT's voice mode give up after thirty seconds because they feel slightly silly. This is reasonable: talking out loud to your phone feels strange at first, especially in public. But voice mode is one of the most underused features in modern AI, and once you find the situations where it is actually better than typing, you stop thinking of it as a gimmick.
This article is a quick tour of what voice mode is good for, what it is bad at, and how to use it without feeling like a futurist.
How to find it
In the ChatGPT app — iOS and Android — there is a headphones-style icon near the input box. Tap it. You will see two voice options on most current versions:
- Standard voice: the basic talk-and-respond mode. Reliably available in the mobile apps; on the web, voice support depends on your account and the current product UI.
- Advanced voice (sometimes branded "Realtime" or similar): the conversational version that handles interruptions, picks up on tone, and feels much more natural. Available on Plus and Pro tiers.
Claude, Gemini (as Gemini Live), and Grok all now have similar conversational voice modes, and the voice features in Apple Intelligence are also strong. The advice in this article applies to all of them, with slightly different button layouts.
Why voice changes how you use AI
A keyboard forces you to compose. You think about exactly what you want, type it, read it back, send it. Voice is different. You think out loud. You ramble. You backtrack. You say things like "actually wait, also..."
This turns out to be the right interface for a specific kind of task. When you are still figuring out what you mean, the keyboard slows you down. Voice meets you where you are. Some examples where this matters:
Thinking out loud about a decision. "Okay so I'm trying to decide whether to take the offer. I'm worried about the commute but I think the work is more interesting. Also my partner mentioned... actually, can you ask me questions instead of just listening?" The conversation flows in a way that typing would not.
Working through a problem while walking. Standing up, moving, talking it through: there is research suggesting that walking helps creative thinking more than sitting does. Voice mode lets you do it while still having a thinking partner.
Learning a language. Practicing conversation in Estonian, German, Spanish, Japanese — anything where you need to speak, not just read. The model is happy to roleplay as a native speaker, correct your pronunciation, and slow down when you ask.
Hands-free moments. Cooking, driving, walking the dog, exercising. When the keyboard is not an option, voice is.
Quick capture. "Make a note that I need to follow up with Anna on the proposal next Tuesday, and remind me to ask about the budget question." Faster than typing it into a notes app.
The mental model: talk to it like a smart friend on the phone
If you find voice mode awkward, the fix is almost always to stop thinking of it as a search bar with audio and start thinking of it as a phone call with a smart, well-read friend. You can:
- Interrupt it. ("Hang on, that's not what I meant.")
- Ramble. ("Okay so, the thing is, my situation is kind of complicated...")
- Course-correct mid-sentence. ("Actually, ignore that, let me start again.")
- Ask follow-up questions naturally. ("Wait, what did you mean by that?")
- Pause and think. ("Hmm, give me a second.")
Try this once. Open voice mode and just say something like: "I had a weird interaction with a colleague today and I want to think about how to handle it. Can you listen for a minute and then ask me three questions?" Then talk for a couple of minutes the way you would to a friend. The model will pick up the thread and ask useful questions back. It is genuinely surprising the first time.
What voice is bad at
Honest list:
Anything visual. If the task involves looking at a document, a chart, a screen, voice is the wrong tool. You will be tempted to describe what you are looking at, and the description always loses information. Use the camera or upload mode instead.
Precise written output. If you want the model to draft you an email, a slide, a memo, you can ask in voice — but you will want the output to appear as text you can copy. Most voice modes can do both (speak the answer and show it on screen), but voice is rarely the fastest way to produce written work.
Anything where you need to read along. Complex explanations, lists of options, anything with numbers or steps. Voice is poor at lists. Five bullet points spoken aloud are much harder to retain than five bullet points read.
Public places where you would not be on a phone call. This is more social than technical, but it matters. In a quiet office or on a packed bus, voice mode reads as rude, the same way a phone call would. An earpiece keeps the model's half of the conversation private, but you are still talking out loud, so save voice mode for places where a call would be fine.
Tasks that need precision in your input. Saying a code snippet, an exact name, a long URL, or a complicated address out loud is a recipe for frustration. Type those.
A few useful patterns
Once voice feels natural, a small set of patterns will cover most of your real use:
The walking think-tank. Twenty minutes outside, voice mode on, working through a single hard question. Sometimes a real problem, sometimes a draft of something, sometimes "explain X to me as if I were a curious layperson." Many people who do this find it replaces a meeting with themselves that they used to spend an hour avoiding.
The morning briefing. Open voice mode in the morning and have the model give you a five-minute briefing on the day. "Today is Tuesday. I have three meetings at 10, 2, and 4. My main priorities are X, Y, Z. Talk me through how to prepare for the day in five minutes." Works particularly well in combination with a calendar integration if you have one.
The language-immersion partner. "Have a conversation with me in Estonian. I am at intermediate level. Use simple grammar but vary your vocabulary. Correct me when I say something clearly wrong, but do not interrupt me to do it — wait until I finish each sentence." This is a real skill builder, and unlike a tutor it has infinite patience.
The interview prep partner. "You are interviewing me for a senior product manager job at a Series B startup. Ask me one realistic question at a time. Push back if my answer is too generic. Do not give me feedback until the end of the interview, then tell me which two answers I should improve and how." Hard to overstate how useful this is the night before a real interview.
The journaling prompt. "Ask me three questions about how my week is going. Don't give me advice. Just listen and ask follow-ups." Five minutes a week of this is a remarkably good check-in habit.
A final small note
You do not have to talk like you are dictating a memo. The whole point is that you can ramble, restart, contradict yourself, and ask the model to make sense of it. The model is fine with disfluency. Many users report that the conversations they have with AI in voice mode end up being more useful than the typed ones, because they thought less about crafting the prompt and more about what they actually wanted to figure out.
If voice mode has felt weird to you so far, try this once: tomorrow morning, on the way to wherever you are going, put in your earphones and ask the model to help you think through one thing on your mind. Walk for ten minutes. By the time you arrive, you will have either solved it or sharpened it. Either is a win, and the only thing that changed is the interface.