Browser Talk is the fastest proof. Twilio realtime is the best phone-call demo. Telnyx and Plivo are production carrier alternatives. Mock mode keeps the wiring safe before you spend a dime.
Built from the OpenClaw Browser, Control UI, and Voice Call plugin docs, translated into an operator-friendly decision page.
If we want to test GPT-5.5 properly, we should not start with the hardest phone infrastructure. Start with the fastest voice loop, then graduate to the real carrier path once the conversation quality is worth exposing publicly.
The common mistake is treating every voice path as the same thing. Some are for fast demos, some are for real calls, and some are for production carrier comparison. One exists only so we can test safely.
Browser / Talk mode: Use the Control UI or browser WebRTC path to speak to the agent without carrier setup. Best for testing GPT-5.5's voice reasoning before phone plumbing enters the picture.
Twilio realtime: Use the Voice Call plugin with Twilio Media Streams for real outbound or inbound phone calls. This is the most convincing "AI Employee on the phone" test.
Telnyx: A strong telephony alternative if we want carrier redundancy or a production comparison. It requires connection and public-key setup, which makes it less ideal for the first demo.
Plivo: A practical carrier option for voice flows and XML-style call handling. Useful as a fallback path, though not the first choice for the flashy realtime benchmark.
Mock: Use mock mode when we need to verify command flow, config wiring, and readiness checks without placing a real call or spending money.
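To make the mock-mode idea concrete, here is a minimal Python sketch of a provider interface where the mock provider records intended calls instead of dialing. The names `VoiceProvider`, `MockProvider`, `readiness_check`, `place_call`, and the config keys are hypothetical illustrations. They are not the Voice Call plugin's actual API.

```python
from dataclasses import dataclass, field
from typing import Protocol


class VoiceProvider(Protocol):
    """Hypothetical provider interface; not the real Voice Call plugin API."""

    def readiness_check(self) -> list[str]: ...
    def place_call(self, to: str, message: str) -> str: ...


@dataclass
class MockProvider:
    """Records calls instead of dialing, so wiring can be tested for free."""

    configured: dict[str, str]
    placed: list[tuple[str, str]] = field(default_factory=list)

    def readiness_check(self) -> list[str]:
        # Hypothetical config keys a real carrier path would need; report the
        # missing ones up front instead of failing at call time.
        required = ("from_number", "webhook_url")
        return [key for key in required if not self.configured.get(key)]

    def place_call(self, to: str, message: str) -> str:
        problems = self.readiness_check()
        if problems:
            raise RuntimeError(f"not ready, missing: {', '.join(problems)}")
        # No network, no carrier, no cost: just remember what would have happened.
        self.placed.append((to, message))
        return f"mock-call-{len(self.placed)}"
```

The design point is that the mock exercises the same readiness and command path a real provider would. A config mistake therefore shows up before any carrier is involved.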
OpenClaw can use realtime providers such as OpenAI or Google Gemini Live for the spoken loop, then call `openclaw_agent_consult` when deeper tool work is needed.
Voice is not one model doing magic. It is a chain: audio capture, realtime speech loop, optional agent consult, tools, and a spoken answer.
Browser Talk flow:
1. The user speaks in the Control UI / browser session.
2. OpenAI Realtime or another provider handles the live audio loop.
3. The realtime model can ask the main agent for deeper reasoning.
4. The main agent can use approved docs, web, memory, and files.
5. The user hears a concise voice response.

Best for: latency, interruption handling, answer quality, and tool-consult behavior.
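The chain above can be sketched as a single turn handler. The realtime loop answers directly when it can and hands off to a consult step when the request needs tools. This is a minimal sketch under stated assumptions. The keyword heuristic in `needs_consult` is a hypothetical stand-in for the realtime model's own judgment, and the `consult` callable stands in for `openclaw_agent_consult`. It is not how OpenClaw decides internally.

```python
from typing import Callable

# Hypothetical cue words; a real realtime model makes this decision itself.
TOOL_CUES = ("look up", "check", "search", "find", "latest")


def needs_consult(utterance: str) -> bool:
    """Crude stand-in for the model deciding it needs deeper tool work."""
    text = utterance.lower()
    return any(cue in text for cue in TOOL_CUES)


def handle_turn(
    utterance: str,
    quick_reply: Callable[[str], str],
    consult: Callable[[str], str],
    max_words: int = 40,
) -> str:
    """One pass through the chain: transcript in, concise spoken reply out."""
    if needs_consult(utterance):
        answer = consult(utterance)  # main agent: docs, web, memory, files
    else:
        answer = quick_reply(utterance)  # realtime model answers on its own
    # Trim to a short spoken answer; voice replies should not read like documents.
    return " ".join(answer.split()[:max_words])
```

Keeping the consult step behind a single callable makes it easy to watch the behavior this path is meant to test: when the model escalates, and whether it stays concise afterward.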
Twilio phone flow:
1. A real inbound or outbound call starts through Twilio.
2. Twilio calls the OpenClaw Voice Call webhook.
3. Audio streams into the Gateway-hosted plugin.
4. The voice model can consult the main OpenClaw agent.
5. The caller experiences an actual AI Employee phone call.

Best for: public proof, carrier reliability, real-world audio, and phone-call UX.
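On the Twilio path, the webhook's job at call start is to tell Twilio where to stream the audio. The sketch below builds the TwiML `<Connect><Stream>` response that Twilio documents for bidirectional Media Streams. The WebSocket URL is a placeholder. This illustrates only the Twilio side of the handshake; it is not the Voice Call plugin's actual webhook handler.

```python
from xml.sax.saxutils import quoteattr


def media_stream_twiml(stream_url: str) -> str:
    """Return TwiML that connects live call audio to a WebSocket endpoint."""
    if not stream_url.startswith("wss://"):
        # Twilio Media Streams expects a secure WebSocket URL.
        raise ValueError("stream_url must use wss://")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        f"<Stream url={quoteattr(stream_url)} />"
        "</Connect></Response>"
    )
```

The `wss://` endpoint must be publicly reachable. That is why the table lists a public webhook among Twilio's tradeoffs.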
Pick the path based on the job. A quick GPT-5.5 benchmark does not need the same infrastructure as a production phone agent.
| Option | Best for | Setup friction | What it proves | Risk / tradeoff |
|---|---|---|---|---|
| Browser / Talk mode | Fast internal demo and model-quality test | Low | Voice reasoning, latency, interruptions, tool consult | Not a real phone call |
| Twilio realtime | Public “AI Employee on the phone” demo | Medium / High | Carrier audio, webhook reliability, full voice-call experience | Needs credentials, public webhook, and call costs |
| Telnyx | Production carrier comparison | High | Carrier redundancy and alternate call infrastructure | More setup ceremony for first benchmark |
| Plivo | Fallback voice API option | Medium | Another carrier path for voice automation | Less ideal for the most polished realtime demo |
| Mock | Safe config and command testing | Very low | Plugin wiring and readiness flow | Does not prove real audio or caller experience |
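One way to read the table is as a lookup: pick the lowest-friction option that still proves what the current job needs. The sketch below transcribes the table rows under that assumption. The short proof tags are paraphrases of the "What it proves" column, and the friction ordering follows the table's labels. The selection rule itself is an illustration, not an OpenClaw feature.

```python
from typing import Optional

# Friction labels from the table, in increasing order.
FRICTION = ["Very low", "Low", "Medium", "Medium / High", "High"]

# (option, setup friction, what it proves); tags paraphrase the table.
PATHS = [
    ("Mock", "Very low", {"plugin_wiring", "readiness_flow"}),
    ("Browser / Talk mode", "Low",
     {"voice_reasoning", "latency", "interruptions", "tool_consult"}),
    ("Plivo", "Medium", {"alternate_carrier"}),
    ("Twilio realtime", "Medium / High",
     {"carrier_audio", "webhook_reliability", "full_call_experience"}),
    ("Telnyx", "High", {"carrier_redundancy", "alternate_infrastructure"}),
]


def pick_path(must_prove: set[str]) -> Optional[str]:
    """Return the lowest-friction option whose proof covers every requirement."""
    candidates = [
        (FRICTION.index(friction), name)
        for name, friction, proves in PATHS
        if must_prove <= proves
    ]
    return min(candidates)[1] if candidates else None
```

A `None` result is the useful signal here. It means no single path covers the job, so a quick benchmark and a production phone test need separate runs.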
This is where the model either behaves like a calm operator or like a novelty demo.
1. Mock: Verify the plugin command path and configuration expectations without placing a call.
2. Browser Talk: Test GPT-5.5's live voice behavior and consult judgment in the fastest possible loop.
3. Phone smoke test: Run a dry smoke check, then a short notify call once credentials and webhook exposure are clean.
4. Live call: Run the actual spoken AI Employee call and score it on latency, recovery, and usefulness.
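The plan above is a gated sequence: each stage is cheaper than the next, and a failure should stop the run before money or public exposure is involved. Here is a minimal sketch of that gating plus a scorecard for the final call. The stage keys, the caller-supplied check functions, and the 1 to 5 rating scale are all hypothetical choices for illustration.

```python
from typing import Callable, Optional

# Stage order from the plan; each check is a caller-supplied callable.
STAGES = ("mock", "browser_talk", "phone_smoke", "live_call")


def first_blocked_stage(checks: dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run stages in order and return the first one that fails (None if all pass).

    A stage with no registered check counts as not yet passed, so later,
    costlier stages never run ahead of earlier ones.
    """
    for stage in STAGES:
        check = checks.get(stage)
        if check is None or not check():
            return stage
    return None


def score_call(latency: int, recovery: int, usefulness: int) -> float:
    """Average hypothetical 1-5 ratings on the three criteria in the plan."""
    ratings = (latency, recovery, usefulness)
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return sum(ratings) / len(ratings)
```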
- Browser docs: browser profiles, snapshots, screenshots, and the browser control surface.
- Control UI docs: Talk mode and browser WebRTC realtime voice sessions.
- Voice Call plugin docs: Twilio, Telnyx, Plivo, the mock provider, streaming, realtime, and `openclaw_agent_consult`.
- Voice Call commands: setup, smoke, call, continue, DTMF, status, and end.
The point is not to make a novelty phone bot. The point is to test whether GPT-5.5, inside OpenClaw, can listen, reason, consult tools, respect boundaries, and help a human without needing a keyboard.