The difference becomes much clearer when you put it into a real product. Take ElevenLabs’ voice AI as an example.
𝟏. 𝐓𝐡𝐞 𝐛𝐚𝐬𝐞 𝐥𝐚𝐲𝐞𝐫: 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈 𝐜𝐚𝐩𝐚𝐛𝐢𝐥𝐢𝐭𝐲
At the first layer, ElevenLabs can turn text, scripts, voice references, or multilingual content into natural speech. For many products, this appears as a generative AI feature:
- AI narration in an education platform
- automatic voiceover in a video tool
- multilingual dubbing for content
- natural voice response in a support system
Here, the value is mainly output quality. The system is generating voice, but it is not necessarily running a workflow.
𝟐. 𝐓𝐡𝐞 𝐦𝐢𝐝𝐝𝐥𝐞 𝐥𝐚𝐲𝐞𝐫: 𝐀𝐈 𝐯𝐨𝐢𝐜𝐞 𝐚𝐠𝐞𝐧𝐭
The next layer is when voice becomes interactive. A generated voice is not an agent. But a voice interface that can listen, understand intent, respond in context, ask follow-up questions, and manage a conversation starts to look much closer to one. This is where voice AI becomes more than audio generation. It becomes an interaction layer. The user is not just listening to generated speech. They are talking to a system that can handle a role inside a conversation.
𝟑. 𝐓𝐡𝐞 𝐡𝐢𝐠𝐡𝐞𝐫 𝐥𝐚𝐲𝐞𝐫: 𝐚𝐠𝐞𝐧𝐭𝐢𝐜 𝐀𝐈 𝐬𝐲𝐬𝐭𝐞𝐦
The more interesting layer appears when the voice agent is connected to real company systems. CRM. Support tickets. Calendars. Order databases. Knowledge bases. Payment tools. Internal APIs. Telephony stacks. Workflow automation tools. At that point, the system can do more than speak naturally. It can check an order, update a customer record, create a ticket, schedule a demo, trigger a follow-up, escalate to a human, or write the result of the conversation back into the system.
In short:
Generative AI creates the voice.
An AI agent uses voice to interact.
An agentic system connects that interaction to tools, data, permissions, and workflows.
Explore more here https://lnkd.in/g57BYwHz
*The chart is simplified, but it gives us a useful starting point to map these ideas to an actual product.
Implementing Voice Commerce
-
𝗜𝗳 𝘆𝗼𝘂 𝗯𝘂𝗶𝗹𝗱 𝗔𝗜 𝘃𝗼𝗶𝗰𝗲 𝗮𝗴𝗲𝗻𝘁𝘀, 𝘆𝗼𝘂 𝗡𝗘𝗘𝗗 𝗧𝗢 𝗞𝗡𝗢𝗪 𝘁𝗵𝗶𝘀 𝘀𝗶𝘅-𝗹𝗮𝘆𝗲𝗿 𝘁𝗲𝗰𝗵 𝘀𝘁𝗮𝗰𝗸! 🛠️
AI voice agents are evolving fast, opening up many possibilities for a new paradigm of customer interaction. In today's world, businesses still use scripted IVR menus and static call flows that frustrate customers and waste time. With AI voice agents, we can create natural conversations that adapt in real-time, handling thousands of concurrent calls with low latency. There are many tools and possibilities for AI voice agents today, creating both exciting opportunities and a lot of noise. To cut through the confusion, here's a framework of six key tech stack layers you can leverage to build powerful, production-ready voice automation. Let's break it down: ⬇️
1. 𝗩𝗼𝗶𝗰𝗲 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻 𝗣𝗹𝗮𝘁𝗳𝗼𝗿𝗺 Start with Retell AI as your foundation. No-code builder + developer API, 30+ languages, 99.99% uptime. → Orchestrates STT, LLM, and TTS with sub-800ms latency for human-like conversations.
2. 𝗖𝗵𝗼𝗼𝘀𝗲 𝗬𝗼𝘂𝗿 𝗟𝗟𝗠 𝗕𝗿𝗮𝗶𝗻: Connect GPT-5 for complex reasoning, Gemini for long context, or custom models. The AI decides what to say, how to respond, and when to take action. → Think: the intelligence that powers every decision your agent makes.
3. 𝗔𝗱𝗱 𝗩𝗼𝗶𝗰𝗲 & 𝗣𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘁𝘆: Select TTS providers like ElevenLabs or Cartesia for natural voices. Clone your voice or choose from libraries, control speed, emotion, and tone. → This is what makes your agent sound human, not robotic.
4. 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗲 𝗧𝗼𝗼𝗹𝘀 & 𝗗𝗮𝘁𝗮: Connect calendars, CRMs and databases. Book appointments automatically, pull customer data during calls, update records in real-time. → Like giving your agent hands to actually do things, not just talk.
5. 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗲 𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄𝘀: Use n8n, Make, or Zapier to connect agents to existing systems. Trigger actions during or after calls, send emails, create tickets, build complex automations. → Turns voice agents into full business process automation.
6. 𝗔𝗱𝗱 𝗧𝗲𝗹𝗲𝗽𝗵𝗼𝗻𝘆 & 𝗦𝗰𝗮𝗹𝗲: Connect phone numbers via Twilio, Telnyx, or Retell's built-in telephony. Handle inbound and outbound calls, manage routing, scale to hundreds of concurrent calls. → Most voice agents fail here: this is production deployment, not demos.
Understanding this tech stack can improve deployment speed, reliability, and customer satisfaction, leading to more sophisticated and scalable AI voice automation. [𝗡𝗼𝘁𝗲 𝘁𝗵𝗮𝘁 𝘁𝗵𝗲𝘀𝗲 𝗹𝗮𝘆𝗲𝗿𝘀 𝘄𝗼𝗿𝗸 𝘁𝗼𝗴𝗲𝘁𝗵𝗲𝗿, 𝗻𝗼𝘁 𝗶𝗻 𝗶𝘀𝗼𝗹𝗮𝘁𝗶𝗼𝗻.] 🛠️ This tech stack is adapted from Retell AI's production deployment framework for building AI voice agents that actually work at scale. Save 💾 ➞ React 👍 ➞ Share ♻️ Build your first AI Voice Agent with Retell AI: https://lnkd.in/dgzuQrH5
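To make the STT → LLM → TTS orchestration and the sub-800ms latency target from this post concrete, here is a minimal Python sketch of one conversational turn. It is not Retell AI's API: `run_turn`, `TurnResult`, and the three `fake_*` stages are hypothetical stand-ins, and a real deployment would stream audio rather than run the stages strictly in sequence.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    reply_text: str
    audio: bytes
    latency_ms: float
    within_budget: bool


def run_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    budget_ms: float = 800.0,  # latency target quoted in the post
) -> TurnResult:
    """Run one caller turn through speech-to-text, the LLM, and text-to-speech."""
    start = time.perf_counter()
    transcript = stt(audio_in)   # layer 1: transcribe the caller
    reply = llm(transcript)      # layer 2: decide what to say
    audio_out = tts(reply)       # layer 3: synthesize the reply
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return TurnResult(reply, audio_out, elapsed_ms, elapsed_ms <= budget_ms)


# Hypothetical stand-ins so the sketch runs without any vendor SDK.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = run_turn(b"book a table", fake_stt, fake_llm, fake_tts)
print(result.reply_text, result.within_budget)
```

Passing the stages in as plain callables mirrors the post's point that the LLM and TTS providers are swappable layers; only the orchestration loop and its latency budget stay fixed.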
-
Your B2B customers aren't at a desk. They're in trucks, on job sites, between meetings. They shouldn't fill out order forms. They should call. And they do call your sales rep. Voice still dominates B2B ordering. The problem? Your sales rep takes the order. Manually enters it. Takes 10 minutes. Costs you $15 in labor. Meanwhile, simple questions like "Where's my order?" and "Can you change my shipping address?" burn rep time that should go to creating complex quotes, advising top accounts, and closing new deals. Quick build tutorial based on a recent client need: → Voice agent built with ElevenLabs handles intake → Draft order created in Medusa → Sales rep reviews, approves, and sends a quote → AI agent handles support questions like a change of address Zero rep time until the quote goes out. If your reps are still taking orders by hand, you're paying $15 for a task that costs <$0.10.
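The intake → draft → review → quote flow above can be sketched as a small state machine. This is a Python illustration under stated assumptions, not Medusa's or ElevenLabs' API: `DraftOrder`, `Status`, `intake_from_call`, `approve`, and `send_quote` are hypothetical names, and the order lives in memory instead of in a commerce backend.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"            # created by the voice agent, not yet reviewed
    APPROVED = "approved"      # a sales rep has reviewed it
    QUOTE_SENT = "quote_sent"  # quote has gone out to the customer


@dataclass
class DraftOrder:
    customer: str
    items: dict = field(default_factory=dict)  # SKU or description -> quantity
    status: Status = Status.DRAFT


def intake_from_call(customer: str, requested: dict) -> DraftOrder:
    """What the voice agent produces: a draft only, never a confirmed order."""
    return DraftOrder(customer=customer, items=dict(requested))


def approve(order: DraftOrder) -> None:
    """Human review step; only drafts can be approved."""
    if order.status is not Status.DRAFT:
        raise ValueError("only a draft can be approved")
    order.status = Status.APPROVED


def send_quote(order: DraftOrder) -> None:
    """A quote may only go out after a rep has approved the draft."""
    if order.status is not Status.APPROVED:
        raise ValueError("quote requires rep approval first")
    order.status = Status.QUOTE_SENT


order = intake_from_call("Acme Roofing", {"shingle bundle": 40})
approve(order)
send_quote(order)
print(order.status)
```

Keeping the agent's output in a draft state is the design choice that lets the rep stay in the loop: the agent does the data entry, and the explicit approval step guards what is sent to the customer.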
-
Voice-first ordering just became real. Starbucks and Deepgram built a drive-thru prototype that handles 5+ modifications in pure chaos. Here's why this changes everything for quick-service restaurants: Drive-thru ordering is one of the most technically challenging environments for voice AI. You've got diesel engines running 6 feet from the microphone. Wind gusts hitting the speaker. Passengers shouting modifications from the back seat. These are the conditions that cause traditional voice systems to fail or require multiple repeats. The Starbucks prototype handles what breaks most voice AI: real-time complexity. The system processes every modification correctly. Then when the customer says "Actually, make that hot instead of iced" - it adjusts without restarting the conversation. This on-the-fly modification capability is what makes it revolutionary. Current drive-thru ordering breaks under pressure. During peak hours, staff juggle taking orders, handling payment, and coordinating with kitchen. This leads to order errors requiring remakes, frustrated customers leaving the line, and revenue loss when wait times exceed 5 minutes. Voice AI systems eliminate these bottlenecks through consistent throughput. The system processes orders at the same speed. It doesn't slow down, doesn't make more errors under pressure, and doesn't need breaks when volume spikes. But throughput is only part of the breakthrough. The system integrates with customer history and preferences. It remembers your usual order, suggests modifications based on past purchases, and handles loyalty programs without feeling transactional. Consumers now expect this everywhere. They want their preferences remembered, suggestions based on history, and zero friction at checkout - expectations that human-only drive-thrus struggle to meet consistently. The Starbucks prototype isn't an isolated experiment. 
Every quick-service restaurant faces identical operational challenges: noise interference, order complexity, peak demand pressure, and staff limitations during rush periods. The economics favor rapid adoption. Voice AI systems don't require breaks or call-outs. They maintain the same accuracy whether processing the first order of the day or the 500th. One system handles multiple concurrent orders across locations. The prototype proved this works in production conditions. But moving from prototype to scaled deployment requires infrastructure most companies lack: noise robustness that handles real chaos, real-time processing without latency, and integration that works across POS systems. This is what we built Voice.ai to solve. Our platform provides the noise robustness, cross-system integration, and scalable throughput that turns prototypes into production-ready voice ordering systems. If you're building in food service, retail, or high-throughput ordering environments, we should talk. If you're investing in companies tackling these problems, we should talk.
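The mid-order correction described in this post ("Actually, make that hot instead of iced") comes down to amending one option on the current item without discarding the rest of the order. The Python sketch below shows only that data-structure idea; `Order`, `LineItem`, and `amend_last` are hypothetical names, and it says nothing about how the Starbucks and Deepgram prototype is actually built.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    name: str
    options: dict = field(default_factory=dict)


@dataclass
class Order:
    items: list = field(default_factory=list)

    def add(self, name: str, **options: str) -> LineItem:
        """Append a new item with its modifiers."""
        item = LineItem(name, dict(options))
        self.items.append(item)
        return item

    def amend_last(self, **changes: str) -> LineItem:
        """Apply a correction to the most recent item; other options are kept."""
        if not self.items:
            raise ValueError("nothing to amend yet")
        self.items[-1].options.update(changes)
        return self.items[-1]


order = Order()
order.add("latte", size="grande", milk="oat", temperature="iced")
order.amend_last(temperature="hot")  # "Actually, make that hot instead of iced"
print(order.items[-1].options)
```

Resolving "that" to the most recent item is a simplifying assumption; a production system would need to resolve the reference from the conversation itself.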
-
Swiggy just made typing optional. Speaking is the new scrolling. And India's next billion users are the reason why. On March 25, 2026, Swiggy announced a strategic partnership with Sarvam AI, an Indian sovereign AI company, to launch full voice-led commerce across its platform. Users can now order food, shop for groceries on Instamart, and book restaurant tables on Swiggy Dineout using simple voice commands across 11 Indian languages, including Hindi, Tamil, Telugu, Kannada, Bengali, and Marathi (Yahoo!). Here is what makes this genuinely different: The entire journey, from product discovery to checkout and payment, is handled through conversation. No buttons. No menus. No typing (RT International). Users can also place orders over a regular phone call, with no smartphone app or internet connection required, opening access to millions in low-connectivity and underserved regions (RT International). Swiggy is also the first commerce platform to go live on Indus, Sarvam's AI-native chat application, with Razorpay powering the payment leg of the transaction (Middle East Eye). Swiggy CTO Madhusudhan Rao said it best: "True accessibility means meeting users where they are, in the languages they speak." The business case here is enormous. India has 22 officially recognised languages. Over 900 million internet users, as per TRAI data. Yet most digital commerce still defaults to English. Swiggy's move aims to onboard millions of new users by shifting from screen-based interactions to voice-first experiences, redefining how consumers interact with digital services in India (UNITED24 Media). This is not a feature update. This is a distribution strategy. Whoever wins the voice layer wins the next wave of Indian commerce. Are you building for the user who types, or the one who talks? Follow me for more breakdowns on India's tech and business ecosystem. 
#Swiggy #SarvamAI #VoiceCommerce #AIIndia #IndianStartups #Instamart #Dineout #ConversationalAI #DigitalIndia #LinkedInIndia #TechIndia #FutureOfCommerce
-
In the fast-food industry, speed and adapting to culture and technology are crucial. This week's M7 Unlock explores Taco Bell's use of Voice AI in drive-thrus, enhancing operational efficiency and customer satisfaction. This initiative streamlines ordering, allowing staff to focus on higher-value tasks, and showcasing the power of innovative technologies for growth. ⚙️ Efficiency and Experience: Taco Bell’s Voice AI in drive-thrus automates order-taking, cutting down on repetitive tasks. It speeds up service and ensures accurate orders, boosting customer satisfaction. Smart resource management and ongoing AI refinement are crucial, demonstrating how tech integration drives growth and competitive advantage. 📈 Scalable Innovation: Taco Bell's Voice AI rollout, with global expansion plans, highlights the importance of scalable tech. Early deployments provide data for continuous improvement. This 'learning by doing' method ensures the AI evolves to meet diverse needs, placing Taco Bell at the forefront of digital transformation in fast food. ⚒️ Proprietary Tech Power: Taco Bell's investment in proprietary systems like Voice AI shows a deep understanding of innovation. Collaborating with franchisees ensures benefits for both staff and customers. Integrating AI with digital infrastructure optimizes operations and enhances customer experience, positioning Taco Bell as a fast-food leader. 💡Brand Lessons: Taco Bell’s approach offers key insights: 1. Employee Engagement: Automating routine tasks lets employees focus on value-added activities, boosting job satisfaction and service. 2. Resource Allocation: Reallocate time saved through automation to maximize efficiency. 3. Feedback Integration: Continuous feedback from customers and franchisees ensures technology aligns with needs, driving successful adoption and high service standards. Lawrence Kim
-
𝗦𝗾𝘂𝗮𝗿𝗲 𝗷𝘂𝘀𝘁 𝗴𝗮𝘃𝗲 𝗿𝗲𝘀𝘁𝗮𝘂𝗿𝗮𝗻𝘁𝘀 𝗮 𝟮𝟰/𝟳 𝗽𝗵𝗼𝗻𝗲 𝗵𝗼𝘀𝘁 — 𝗽𝗼𝘄𝗲𝗿𝗲𝗱 𝗯𝘆 𝗔𝗜 🍔📞 Square (Block) rolled out 𝗔𝗜-𝗽𝗼𝘄𝗲𝗿𝗲𝗱 𝘃𝗼𝗶𝗰𝗲 𝗼𝗿𝗱𝗲𝗿𝗶𝗻𝗴 so restaurants don’t miss another call at peak rush. The bot answers the phone, understands menu questions (“𝘞𝘩𝘢𝘵’𝘴 𝘨𝘭𝘶𝘵𝘦𝘯-𝘧𝘳𝘦𝘦?” “𝘌𝘹𝘵𝘳𝘢 𝘴𝘱𝘪𝘤𝘺, 𝘯𝘰 𝘥𝘢𝘪𝘳𝘺”), takes the order, and injects it straight into POS/kitchen – no staff juggling, no sticky notes. Early rollouts land alongside upgrades to Square AI (their conversational assistant) and an 𝗶𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗲𝗱 𝗕𝗶𝘁𝗰𝗼𝗶𝗻 𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗳𝗼𝗿 𝘀𝗲𝗹𝗹𝗲𝗿𝘀. — 𝗪𝗵𝘆 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 • 𝗡𝗲𝘃𝗲𝗿 𝗺𝗶𝘀𝘀 𝗿𝗲𝘃𝗲𝗻𝘂𝗲: Every call gets answered, even during dinner rush or short staffing. • 𝗛𝗶𝗴𝗵𝗲𝗿 𝗼𝗿𝗱𝗲𝗿 𝗾𝘂𝗮𝗹𝗶𝘁𝘆: The system can confirm modifiers, allergens, pricing, and promos before firing to the line. • 𝗖𝗹𝗲𝗮𝗻𝗲𝗿 𝗼𝗽𝘀: Orders land in the Square POS and kitchen display with audit trails; no rekeying = fewer errors. • 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝗲𝗱 𝘁𝗿𝗲𝗻𝗱: Voice AI is spreading across QSR – the winners pair accuracy with tight POS integration. Square’s move brings that to independents and multi-unit locals. 𝗪𝗵𝗮𝘁 𝗲𝗹𝘀𝗲 𝘀𝗵𝗶𝗽𝗽𝗲𝗱 – Square AI gains deeper “𝗻𝗲𝗶𝗴𝗵𝗯𝗼𝗿𝗵𝗼𝗼𝗱 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀” (weather, events, reviews) to help with staffing and menus. – Square Bitcoin lets U.S. sellers 𝗮𝗰𝗰𝗲𝗽𝘁 𝗕𝗧𝗖 𝗮𝗻𝗱 𝗲𝘃𝗲𝗻 𝗰𝗼𝗻𝘃𝗲𝗿𝘁 𝗰𝗮𝗿𝗱 𝘀𝗮𝗹𝗲𝘀 𝘁𝗼 𝗯𝗶𝘁𝗰𝗼𝗶𝗻 inside Square, with fee-free promos at launch. 𝗪𝗵𝗮𝘁 𝗜’𝗹𝗹 𝘄𝗮𝘁𝗰𝗵: accuracy in noisy environments, smart upsells (add sides/drinks without being pushy), and real-world impact on 𝗽𝗵𝗼𝗻𝗲-𝗮𝗻𝘀𝘄𝗲𝗿 𝗿𝗮𝘁𝗲, 𝗯𝗮𝘀𝗸𝗲𝘁 𝘀𝗶𝘇𝗲, 𝗮𝗻𝗱 𝗿𝗲𝗺𝗮𝗸𝗲 𝗰𝗼𝘀𝘁𝘀. If those move, voice AI becomes a no-brainer line item rather than a lab experiment. — Would you let an AI take your restaurant’s phone orders during peak hours? Why or why not? P.S. I’m continuing this theme on my Substack – deep dive here: https://lnkd.in/eG7TbJbJ #Square #Block #Restaurants #VoiceAI #POS #HospitalityTech #OrderAhead #Fintech #Bitcoin #QSR #CustomerExperience #Automation
-
What does it take to build an AI that can take a food order over WhatsApp — correctly, every time, fast enough that customers can't tell it's not a person? That's the core challenge Santi Marchiori and Juan Haedo set out to solve at AITropos, a company building AI employees for the hospitality industry. In this episode of Just Now Possible, Teresa Torres talks with Santi Marchiori (CEO) and Juan Haedo (CTO) of AITropos about how they built an AI order-taking agent that handles the full flow — menu recommendations, modifiers, delivery zones, payment links, and status updates — entirely inside WhatsApp. They went through three product iterations to get there: first a hardware device for waiters, then a waiter-facing app, and finally a customer-facing conversational agent powered by a tools-based architecture designed for speed and reliability. You'll hear how they solved the core technical challenge of translating non-deterministic human conversation into structured POS-compatible order data, why they chose tools over MCP for agent architecture, how they pre-inject product context to cut latency before the agent ever makes a tool call, and why they test with thousands of agent-simulated customer conversations overnight before deploying to any real venue. 
Guests: - Santi Marchiori – CEO, AITropos - Juan Haedo – CTO, AITropos You'll hear how they: - Spent two years exploring hundreds of startup ideas before finding the specific niche of AI-powered order taking in hospitality - Went through three product iterations — hardware for waiters, a waiter app, and finally a customer-facing WhatsApp agent — before landing on the right form factor - Identified order item identification accuracy as their single most important KPI - Chose a tools-based agent architecture over MCP or pipelines to hit real-time response speed requirements - Built a parallelized pipeline that searches for multiple products simultaneously and pre-fetches product context before the agent even calls a tool - Use smaller, fast sub-agents to build an "immediate system prompt" that injects relevant data into each turn without extra tool calls - Test with thousands of agent-simulated customer conversations run overnight before deploying to new venues - Reduced new customer onboarding from three months to a few weeks — and continue to shrink it as they build domain templates Resources & Links: - AITropos: https://buff.ly/gnPl3Ug 00:00 Meet the Founders 00:59 What AITropos Builds 01:51 AI vs Human Touch 06:17 Restaurant Use Cases 08:16 Why Hospitality 10:47 Finding the Wedge 16:00 Early Prototypes 16:46 Hard Parts of Ordering 18:03 Speed and Channels 21:15 Iteration and Model Jumps The rest of the Chapters are in comments. Listen on Spotify, Apple Podcasts, or watch on YouTube. Spotify: https://buff.ly/0mWMydk Apple Podcast: https://buff.ly/vtbilxq YouTube: https://buff.ly/0hcGIJz
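The episode notes describe searching for multiple products simultaneously and pre-fetching product context before the agent calls a tool. A rough Python sketch of that pattern using `asyncio.gather` follows. The `MENU` data, `search_product`, and `prefetch_context` are hypothetical and are not AITropos code; the `asyncio.sleep` call stands in for a real catalog or POS lookup.

```python
import asyncio
from typing import Optional

# Hypothetical in-memory menu; a real system would query a POS or catalog.
MENU = [
    {"name": "Margherita Pizza", "price": 12.5},
    {"name": "Cola", "price": 2.0},
]


async def search_product(query: str) -> Optional[dict]:
    """Look up one product mention; the sleep simulates network latency."""
    await asyncio.sleep(0.01)
    for product in MENU:
        if query.lower() in product["name"].lower():
            return product
    return None


async def prefetch_context(mentions: list) -> str:
    """Search all mentions concurrently and build text to inject into the prompt."""
    results = await asyncio.gather(*(search_product(m) for m in mentions))
    lines = [f"{r['name']}: ${r['price']:.2f}" for r in results if r]
    return "Relevant menu items:\n" + "\n".join(lines)


context = asyncio.run(prefetch_context(["margherita", "cola", "unicorn"]))
print(context)
```

Because the lookups run concurrently, total wait time is close to the slowest single search instead of the sum of all of them, and the resulting text can be placed in the turn's prompt so the agent may not need a separate tool call for those items.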
-
Built an entire voice AI system for a client. They ghosted me but the system still works. So let me show you what I built. It's a fully autonomous phone agent for a restaurant in New York. Not a chatbot. Not "press 1 for orders." A reasoning system that handles a real phone call end to end: -> Fetches the live menu from POS before responding -> Looks up or creates the customer mid-call -> Takes the full order with modifiers, including "actually change that" -> Pushes directly to POS. Zero re-entry. -> Sends a payment link via SMS before the call ends -> Books reservations against live calendar availability One orchestrator. Five sub-workflows. Each one handling a different intent. The orchestrator doesn't know what kind of call is coming. It just routes based on what the caller actually needs. That architecture isn't just for restaurants. Swap the sub-workflows: dental clinic, law firm, salon, HVAC company, real estate agency. The reasoning layer stays identical. The client didn't move forward. But this system is ready for any service business taking calls right now. Getting ghosted after shipping is part of this. What's your worst client story? #BuildInPublic #VoiceAI #AIEngineering #n8n #LLMOrchestration #SystemsDesign #AIAgents
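The "one orchestrator, swappable sub-workflows" design in this post can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the post does not name its five sub-workflows, `Orchestrator` and `keyword_classifier` are hypothetical, and the keyword matcher is a crude stand-in for the LLM reasoning step the author describes.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def keyword_classifier(utterance: str) -> str:
    """Stand-in for an LLM intent classifier; real systems would not use keywords."""
    text = utterance.lower()
    if "reserv" in text or "table" in text:
        return "reservation"
    if "order" in text or "pickup" in text:
        return "order"
    return "unknown"


class Orchestrator:
    """Routes each utterance to a sub-workflow chosen by the classifier."""

    def __init__(
        self,
        classify: Callable[[str], str],
        handlers: Dict[str, Handler],
        fallback: Handler,
    ) -> None:
        self.classify = classify
        self.handlers = handlers
        self.fallback = fallback

    def handle(self, utterance: str) -> str:
        intent = self.classify(utterance)
        handler = self.handlers.get(intent, self.fallback)
        return handler(utterance)


# Restaurant configuration; another vertical would pass different handlers.
restaurant = Orchestrator(
    classify=keyword_classifier,
    handlers={
        "order": lambda u: "order workflow",
        "reservation": lambda u: "reservation workflow",
    },
    fallback=lambda u: "handoff to staff",
)

print(restaurant.handle("Can I reserve a table for four?"))
```

Moving to a dental clinic or salon would mean passing a different `handlers` dictionary, and likely a different classifier, while `Orchestrator.handle` stays unchanged, which is the reuse the post is pointing at.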