A comprehensive guide to the voice agents in the HiyaDrive orchestration system.
HiyaDrive uses a state machine powered by LangGraph. Nine agents (Welcome through Booking Finalizer) each handle one step of the booking workflow, supported by a Goodbye Agent and an Error Handler Agent. Every agent is voice-enabled with dynamic message generation.
flowchart TD
START(["Start session"])
GREET["Welcome Agent\nLLM generates greeting\nTTS speaks to user"]
A["Intent Parser Agent\nLLM extracts booking details\nTTS confirms extracted information\nListens for user yes/no"]
B["Calendar Checker Agent\nGoogle Calendar checks availability\nLLM generates message\nHandles retry if busy"]
C["Service Searcher Agent\nGoogle Places API search\nLLM announces results\nTTS presents options"]
D["Service Selector Agent\nSelects highest-rated option\nLLM generates message\nTTS confirms selection"]
E["Call Scripter Agent\nLLM generates opening script\nTTS speaks script preview\nListens for approval"]
F["Call Initiator Agent\nTwilio makes call\nLLM announces connection\nTTS confirms connected"]
G["Conversationalist Agent\nSimulates conversation\nLLM generates responses\nExtracts confirmation"]
H["Booking Finalizer Agent\nLLM generates summary\nTTS speaks confirmation\nSaves to calendar"]
GOODBYE["Goodbye Agent\nAsks for additional help\nEnds session gracefully"]
ERR["Error Handler Agent\nClassifies error type\nAttempts recovery"]
END_OK(["End\nBooking Complete"])
END_FAIL(["End\nBooking Failed"])
START --> GREET --> A
A -->|user confirms| B --> C --> D --> E --> F --> G --> H --> GOODBYE --> END_OK
A -->|user says no| A
D -->|user rejects selection| C
E -->|user rejects script| D
A -->|error| ERR
B -->|error| ERR
E -->|error| ERR
F -->|error| ERR
G -->|error| ERR
ERR -->|retry| B
ERR -->|fail| END_FAIL
style START fill:#e8f5e9
style GREET fill:#fff9c4
style A fill:#fff9c4
style B fill:#ffe0b2
style C fill:#ffccbc
style D fill:#ffccbc
style E fill:#f8bbd0
style F fill:#e1bee7
style G fill:#e1bee7
style H fill:#c8e6c9
style GOODBYE fill:#b3e5fc
style ERR fill:#ffcdd2
style END_OK fill:#c8e6c9
style END_FAIL fill:#ffcdd2
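The edges in the flowchart can be read as a transition table. The sketch below is plain Python rather than LangGraph code, and the node names and event labels (`ok`, `rejected`, `error`, `retry`, `fail`) are illustrative assumptions, not identifiers from the HiyaDrive codebase. It only mirrors the edges drawn above.

```python
# Transition table transcribed from the flowchart edges.
# Node and event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    ("start", "ok"): "welcome",
    ("welcome", "ok"): "intent_parser",
    ("intent_parser", "ok"): "calendar_checker",
    ("intent_parser", "rejected"): "intent_parser",
    ("calendar_checker", "ok"): "service_searcher",
    ("service_searcher", "ok"): "service_selector",
    ("service_selector", "ok"): "call_scripter",
    ("service_selector", "rejected"): "service_searcher",
    ("call_scripter", "ok"): "call_initiator",
    ("call_scripter", "rejected"): "service_selector",
    ("call_initiator", "ok"): "conversationalist",
    ("conversationalist", "ok"): "booking_finalizer",
    ("booking_finalizer", "ok"): "goodbye",
    ("goodbye", "ok"): "end_ok",
    ("error_handler", "retry"): "calendar_checker",
    ("error_handler", "fail"): "end_fail",
}

# Agents that have an explicit error edge in the flowchart.
ERROR_SOURCES = {
    "intent_parser",
    "calendar_checker",
    "call_scripter",
    "call_initiator",
    "conversationalist",
}


def next_node(current: str, event: str) -> str:
    """Return the node that follows `current` when `event` occurs."""
    if event == "error" and current in ERROR_SOURCES:
        return "error_handler"
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no edge from {current!r} on {event!r}") from None


def run(events: list[str], start: str = "start") -> list[str]:
    """Replay a sequence of events and return the visited nodes."""
    path = [start]
    for event in events:
        path.append(next_node(path[-1], event))
    return path
```

Replaying eleven `ok` events from `start` walks the happy path to `end_ok`. A `rejected` event at the Call Scripter returns to the Service Selector.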
Purpose: Generate a warm greeting and set conversational tone.
What it does:
- Generates unique greeting using Claude LLM
- Synthesizes greeting with ElevenLabs TTS
- Plays through speaker to greet driver
- Sets friendly, professional tone
Input: None (start of session)
Output: Greeting message spoken to user
Example:
System: "Hi there! I'm HiyaDrive, your voice booking assistant.
What service can I help you book today?"
Technology:
- Claude LLM for dynamic greeting generation
- ElevenLabs TTS for natural voice synthesis
- Ensures zero hardcoded strings
Next Agent: Intent Parser Agent
Purpose: Extract booking details from user's spoken request.
What it does:
- Listens to user's booking request
- ElevenLabs STT transcribes speech to text
- Claude LLM extracts structured intent:
  - Service type (haircut, dinner, appointment, etc.)
  - Party size (1-10 people)
  - Requested date/time
  - Location (optional)
- Generates confirmation message
- TTS speaks confirmation
- Listens for user confirmation ("yes" or "no")
Input: User voice input
Output:
- Intent object with extracted details
- User confirmation (yes/no/restart)
Example:
User: "I want to book a table for 2 people at an Italian restaurant next Friday at 7 PM"
Extracted Intent:
{
"service_type": "italian restaurant",
"party_size": 2,
"requested_time": "2024-11-22 19:00",
"location": "current location"
}
System: "So you want to book a table for 2 people at an Italian restaurant
next Friday at 7 PM. Is that correct?"
User: "Yes"
Technology:
- ElevenLabs STT for transcription
- Claude LLM for intent extraction with structured output
- Natural language processing with fallback for unclear requests
Error Handling:
- Transcription fails: Ask user to repeat
- Intent unclear: Ask clarifying questions
- User says "no": Restart from Welcome Agent
Next Agent: Calendar Checker Agent (if confirmed)
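As an illustration of validating the structured intent before it moves on, here is a minimal sketch. It assumes the LLM returns JSON shaped like the example above. The `Intent` class and `parse_intent` function are hypothetical names, not taken from the HiyaDrive source.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Intent:
    service_type: str
    party_size: int
    requested_time: datetime
    location: str = "current location"  # location is optional in the intent


def parse_intent(raw_json: str) -> Intent:
    """Parse and validate the JSON intent produced by the LLM (assumed format)."""
    data = json.loads(raw_json)

    service_type = str(data.get("service_type", "")).strip()
    if not service_type:
        raise ValueError("service_type is missing")

    party_size = int(data.get("party_size", 1))
    if not 1 <= party_size <= 10:  # range taken from the list above
        raise ValueError("party_size must be between 1 and 10")

    try:
        requested_time = datetime.strptime(data["requested_time"], "%Y-%m-%d %H:%M")
    except (KeyError, ValueError) as exc:
        raise ValueError("requested_time is missing or malformed") from exc

    location = data.get("location") or "current location"
    return Intent(service_type, party_size, requested_time, location)
```

A `ValueError` here would correspond to the "Intent unclear" case, where the agent asks a clarifying question instead of proceeding.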
Purpose: Check driver's calendar availability and handle conflicts intelligently.
What it does:
- Checks Google Calendar for conflicts at requested time
- If available: Announces availability and proceeds to Service Searcher
- If busy (key feature):
  - Asks driver for alternative time (up to 3 retries)
  - Example: "You're busy at that time. What time would work better?"
  - Listens for new time
  - Re-checks calendar with new time
  - Repeats until available slot found or max retries reached
Input:
- Requested time from Intent Parser
- Current session state
Output:
- Availability status (free/busy)
- Confirmed alternative time (if initially busy)
Example Flow:
Check 1: Friday 7 PM --> BUSY (already has dinner)
Ask: "You're busy at 7 PM. What time works better?"
User: "8 PM"
Check 2: Friday 8 PM --> FREE
Continue to Service Search
Technology:
- Google Calendar API v3 with service account auth
- RFC 3339 timestamp formatting
- Natural language date parsing with dateparser
- Timezone-aware datetime handling
Configuration:
- Max retry attempts: 3
- Max duration for booking: 120 minutes
Key Features:
- Handles natural language dates ("next Friday", "tomorrow at 3 PM")
- Fallback logic for ambiguous dates
- Graceful failure after max retries
Error Handling:
- Calendar API unavailable: Proceed with assumption driver is available
- Cannot parse new time: Ask for clarification
- Max retries reached: End booking with explanation
Next Agent: Service Searcher Agent (if available)
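The busy-time retry loop can be sketched as follows. This assumes one initial check followed by up to three retries. The calendar lookup and the voice prompt are injected as callables (`is_free` and `ask_alternative`, both hypothetical names), so the sketch makes no Google Calendar calls itself.

```python
from datetime import datetime
from typing import Callable, Optional

MAX_RETRIES = 3  # matches "Max retry attempts: 3" in the configuration


def find_free_slot(
    requested: datetime,
    is_free: Callable[[datetime], bool],
    ask_alternative: Callable[[datetime], datetime],
) -> Optional[datetime]:
    """Return the first free time, or None once the retries are used up."""
    candidate = requested
    for attempt in range(MAX_RETRIES + 1):  # initial check plus up to 3 retries
        if is_free(candidate):
            return candidate
        if attempt < MAX_RETRIES:
            # Ask the driver for another time and re-check it on the next pass.
            candidate = ask_alternative(candidate)
    return None
```

Returning `None` maps to the "Max retries reached" case, where the booking ends with an explanation.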
Purpose: Find matching services/businesses using Google Places API.
What it does:
- Constructs search query from intent
  - Service type + location
  - Example: "Italian restaurant near current location"
- Calls Google Places API
- Receives list of businesses with ratings
- Ranks by rating and distance
- Generates announcement message
- TTS announces results
Input:
- Service type from Intent
- Location (current or specified)
Output:
- Ranked list of top options
- Business details (name, rating, address, phone)
Example:
Search: "Haircut near San Francisco"
Results:
1. StyleCuts (4.8 stars, 0.2 km) - (415) 555-0101
2. Clips & Cuts (4.6 stars, 0.4 km) - (415) 555-0102
3. Hair Studio (4.4 stars, 0.5 km) - (415) 555-0103
System: "I found 3 salons near you. StyleCuts has the highest rating
at 4.8 stars and is just a few blocks away."
Technology:
- Google Places API for business search
- Ranking algorithm (rating weighted more than distance)
- Claude LLM for announcement generation
Features:
- Works with any business type (restaurants, salons, doctors, services)
- Returns comprehensive business info
- Includes ratings and contact details
Error Handling:
- No results found: Ask to search wider area or try different service type
- API unavailable: Explain and offer manual booking alternative
Next Agent: Service Selector Agent
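The document states only that rating is weighted more heavily than distance. One way such a ranking could look is sketched below. The linear score and both weight values are assumptions made for illustration, not the actual HiyaDrive algorithm.

```python
from dataclasses import dataclass


@dataclass
class Business:
    name: str
    rating: float       # star rating, 0-5
    distance_km: float
    phone: str


# Assumed weights: one star counts five times as much as one kilometre.
RATING_WEIGHT = 1.0
DISTANCE_WEIGHT = 0.2


def score(business: Business) -> float:
    """Higher is better: reward rating, penalise distance."""
    return RATING_WEIGHT * business.rating - DISTANCE_WEIGHT * business.distance_km


def rank(businesses: list[Business], top_n: int = 3) -> list[Business]:
    """Return the top_n businesses, best score first."""
    return sorted(businesses, key=score, reverse=True)[:top_n]
```

With the three salons from the example, this ordering puts StyleCuts first, then Clips & Cuts, then Hair Studio.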
Purpose: Let user choose or accept top service option.
What it does:
- Presents top 3 options with details
  - Name, rating, distance, address
- Asks driver for preference
  - "Would you like me to book StyleCuts, or would you prefer one of the others?"
- Listens for response:
  - If preference stated: Select that option
  - If no preference: Select highest-rated (default)
- Confirms selection with TTS
- Hands the selection to the Call Scripter Agent for script generation
Input:
- List of services from Service Searcher
- User preference (if stated)
Output:
- Selected service object with all details
Example:
System: "I found these options:
1. StyleCuts (4.8 stars) - nearest
2. Clips & Cuts (4.6 stars)
3. Hair Studio (4.4 stars)
I'll book StyleCuts since it has the best rating.
Does that work for you?"
User: "Yes" or "No, book the second one"
Technology:
- Claude LLM for response interpretation
- Ranking algorithm for default selection
- Natural language understanding for user preference
Error Handling:
- User preference unclear: Ask clarifying question
- User rejects all options: Go back to Service Searcher
Next Agent: Call Scripter Agent (if approved)
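The selection rule (stated preference wins, otherwise the highest-rated option) might be sketched like this. In HiyaDrive the reply is interpreted by the LLM. The keyword matching below is a simplified stand-in, and `choose_option` is a hypothetical name.

```python
import re
from typing import Optional, Sequence

ORDINALS = {"first": 0, "second": 1, "third": 2}
REJECTIONS = {"no", "none", "neither"}


def choose_option(options: Sequence[str], reply: str) -> Optional[str]:
    """Pick an option from `options` (assumed sorted best-first).

    Returns None when the user rejects every option, which corresponds
    to going back to the Service Searcher.
    """
    if not options:
        raise ValueError("no options to choose from")
    words = re.findall(r"[a-z]+", reply.lower())
    # An explicit ordinal ("the second one") overrides everything else.
    for word, index in ORDINALS.items():
        if word in words and index < len(options):
            return options[index]
    if REJECTIONS & set(words):
        return None
    # No stated preference: default to the highest-rated option.
    return options[0]
```

The ordinal check runs before the rejection check so that a reply such as "No, book the second one" selects the second option instead of being treated as a rejection.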
Purpose: Generate and preview the phone booking script.
What it does:
- Generates natural conversation script using Claude LLM
- Script includes:
  - Professional opening
  - Booking details (date, time, party size)
  - Polite closing
- TTS speaks script preview
- Listens for user approval
  - If approved: Passes script to Call Initiator
  - If rejected: Returns to Service Selector for different option
Input:
- Selected service details
- Booking details (time, date, party size)
Output:
- Conversation script
- User approval status
Example:
Generated Script:
"Hi, I'm calling to book an appointment.
I'd like to schedule a haircut for 2 people
on Friday, November 22nd at 8 PM.
Is that available?"
System: "I've prepared this script. Should I go ahead and call?"
User: "Yes, make the call"
Technology:
- Claude LLM for natural script generation
- Constrains to concise, professional tone
- Real-time STT for user approval
Features:
- Always asks for approval before calling
- Ensures driver knows what system will say
- Can reject and select different service
Error Handling:
- Script generation fails: Offer to proceed with simple request or try different service
Next Agent: Call Initiator Agent (if approved)
Purpose: Initiate phone call and confirm connection.
What it does:
- Only executes if explicitly approved by user
- Calls Twilio Programmable Voice API
- Dials business phone number
- Announces connection to driver
- TTS says: "Calling [Business Name], connecting now..."
- Passes call to Conversationalist Agent
- Monitors call status
Input:
- Selected service with phone number
- Pre-generated script
Output:
- Call initiated
- Connection confirmation
Example:
System: "Calling StyleCuts at (415) 555-0101..."
System: "Connected to StyleCuts. Go ahead."
Technology:
- Twilio Programmable Voice API
- WebSocket streaming for audio
- Call status monitoring
- Error recovery if call drops
Safety Feature:
- Never calls without explicit user approval
- Always announces what it's doing
- Driver can hear conversation
Error Handling:
- Invalid phone number: Return to Service Selector
- Call fails to connect: Offer to retry or try different service
- Call drops: Attempt reconnection or offer manual fallback
Next Agent: Conversationalist Agent
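The approval gate described above can be sketched as a guard that runs before any dialing happens. Here `place_call` stands in for whatever wraps the Twilio API and `announce` for the TTS output. Both are hypothetical injected callables, so the sketch performs no real call.

```python
from typing import Callable


class ApprovalRequired(Exception):
    """Raised when a call is attempted without explicit user approval."""


def initiate_call(
    phone_number: str,
    approved: bool,
    place_call: Callable[[str], str],
    announce: Callable[[str], None],
) -> str:
    """Announce and place a call only if the user approved it.

    Returns whatever identifier `place_call` yields for the new call.
    """
    if not approved:
        raise ApprovalRequired("user has not approved this call")

    # Keep digits only; a 10-digit minimum is an assumption for US-style numbers.
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if len(digits) < 10:
        raise ValueError(f"invalid phone number: {phone_number!r}")

    announce(f"Calling {phone_number}...")
    return place_call(digits)
```

Because the approval check comes first, an unapproved request fails before the number is validated or announced.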
Purpose: Simulate a natural conversation with the business and extract the booking details.
What it does:
- Receives two-way audio stream from Twilio
- For each business response:
  - Transcribes using ElevenLabs STT
  - Sends to Claude LLM for response generation
  - Synthesizes response with ElevenLabs TTS
  - Sends audio back to business
- Conversation continues until:
  - Business confirms availability
  - Business provides confirmation number
  - Booking is confirmed
- Extracts confirmation details from conversation
- Passes to Booking Finalizer
Input:
- Active Twilio call
- Pre-generated opening script
- Business response audio
Output:
- Confirmation number
- Booking details confirmed
- Full conversation transcript
Example Flow:
System: "Hi, I'm calling to book a haircut. I'd like
to schedule 2 people on Friday at 8 PM."
Business (STT): "What time would work for you?"
System (LLM generates): "We'd like 8 PM on Friday."
Business (STT): "Perfect, we have 8 PM available.
Your confirmation number is HC-8247."
System (extraction): Confirmation number HC-8247 captured
Technology:
- Twilio WebSocket for bidirectional audio
- ElevenLabs STT for transcription
- Claude LLM for response generation
- ElevenLabs TTS for synthesis
- NLP for confirmation extraction
Features:
- Truly conversational (not just a script)
- Handles unexpected responses
- Gracefully handles "sorry, we're full" scenarios
- Extracts essential booking info
Error Handling:
- Booking not available: Ask about alternative times
- Connection drops: Return to user with explanation and business contact info
- Cannot extract confirmation: Ask user to confirm details manually
Next Agent: Booking Finalizer Agent
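Confirmation extraction could be as simple as a pattern match over the transcript. The sketch below assumes confirmation numbers look like the `HC-8247` example (a few uppercase letters, a hyphen, then digits). Real businesses use many formats, so this pattern is only an illustration.

```python
import re
from typing import Optional

# Assumed format: 2-4 uppercase letters, a hyphen, 3-8 digits (e.g. HC-8247).
CONFIRMATION_PATTERN = re.compile(r"\b[A-Z]{2,4}-\d{3,8}\b")


def extract_confirmation(transcript: str) -> Optional[str]:
    """Return the first confirmation-like token in the transcript, if any."""
    match = CONFIRMATION_PATTERN.search(transcript)
    return match.group(0) if match else None
```

A `None` result corresponds to the "Cannot extract confirmation" case, where the user is asked to confirm the details manually.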
Purpose: Confirm the booking, save it to the calendar, and hand off to the Goodbye Agent.
What it does:
- Generates booking summary using Claude LLM
- TTS speaks full booking confirmation:
  - Service provider name
  - Date and time
  - Confirmation number
  - What to bring/expect
- Saves to Google Calendar:
  - Creates calendar event
  - Includes confirmation number in description
  - Sets as "busy" time
- Passes to Goodbye Agent
- Logs completion
Input:
- Confirmation details from Conversationalist
- Full booking state
Output:
- Spoken confirmation
- Calendar event created
- Completion logged
Example:
System: "Your booking is confirmed!
Service: StyleCuts Haircut
Date: Friday, November 22nd
Time: 8:00 PM
Party Size: 2 people
Confirmation: HC-8247
I've saved this to your calendar.
See you Friday at 8 PM!"
Calendar Event Created:
- Title: "Haircut at StyleCuts"
- Time: Friday 8 PM - 9 PM
- Location: StyleCuts salon address
- Notes: "Confirmation HC-8247"
Technology:
- Claude LLM for summary generation
- Google Calendar API for event creation
- RFC 3339 timestamp formatting
- Timezone handling
Features:
- Calendar event is immediately available
- Includes all booking details
- Professional, warm tone
- No system delays after confirmation
Error Handling:
- Calendar save fails: Inform the user that the booking is confirmed and that the calendar event will be added once the calendar service is available again
Next Agent: Goodbye Agent
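To illustrate the calendar step, the sketch below builds an event body with RFC 3339 timestamps, using field names from the Google Calendar API v3 event resource (`summary`, `location`, `description`, `start`, `end`, `transparency`). The `build_event` helper and the default one-hour duration are assumptions. Sending the body to the API is left out.

```python
from datetime import datetime, timedelta


def build_event(
    title: str,
    location: str,
    confirmation: str,
    start: datetime,
    duration_minutes: int = 60,  # assumed default, matching the 8 PM - 9 PM example
) -> dict:
    """Build a Calendar API v3 event body for a confirmed booking."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    end = start + timedelta(minutes=duration_minutes)
    return {
        "summary": title,
        "location": location,
        "description": f"Confirmation {confirmation}",
        # isoformat() on an aware datetime yields an RFC 3339 timestamp.
        "start": {"dateTime": start.isoformat()},
        "end": {"dateTime": end.isoformat()},
        "transparency": "opaque",  # "opaque" marks the time as busy
    }
```

Rejecting naive datetimes up front keeps the timezone offset in every timestamp, so the event lands at the intended local time.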
Purpose: End the session gracefully and ask whether additional help is needed.
What it does:
- Generates closing message
- Asks: "Is there anything else I can help you with?"
- Listens for user response:
  - If yes: Restart from Welcome Agent
  - If no: End session gracefully
- Speaks final goodbye
- Closes all connections
- Logs session completion
Input:
- Booking completion status
- Session history
Output:
- Session ended
- All resources cleaned up
Example:
System: "Is there anything else I can help you with?"
User: "No, thank you"
System: "Great! Have a wonderful day. Safe travels!"
[Session ends, resources released]
Technology:
- Claude LLM for natural goodbye
- Session cleanup and resource management
- Logging of session metrics
Features:
- Always polite and professional
- Offers additional help
- Clean shutdown
Error Handling:
- User response unclear: Assume no and end session
Next Agent: None (session ends) or Welcome Agent (if user needs more help)
Purpose: Handle errors and determine if recovery is possible.
What it does:
- Receives error from any agent
- Classifies error type:
  - Temporary (network timeout) - retry operation
  - Service unavailable (API down) - fail and explain
  - User issue (invalid input) - ask for clarification
- Generates user-friendly error message
- TTS speaks explanation
- Determines next action:
  - Retry the operation
  - Get user input for alternative
  - End session with graceful failure message
Input:
- Error object
- Current session state
Output:
- User-friendly error message
- Recovery action or session termination
Example:
Error: Google Places API returned empty results
Handler: "I couldn't find salons at that location.
Would you like me to search a wider area,
or try a different service type?"
User: "Try a wider area"
Action: Retry with broader search parameters
Error Types & Handling:
- Network timeout: Retry up to 2 times
- Service unavailable: End session, explain briefly
- Invalid input: Ask for clarification
- API authentication: End session, inform user of system issue
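The error types above map naturally onto a small decision function. The sketch below uses built-in Python exceptions as stand-ins for the four categories. That mapping, the `Action` enum, and the `decide` function are assumptions, since the document does not say how HiyaDrive represents errors.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    CLARIFY = "clarify"
    FAIL = "fail"


MAX_TIMEOUT_RETRIES = 2  # "Network timeout: Retry up to 2 times"


def decide(error: Exception, retries_so_far: int) -> Action:
    """Choose a recovery action for an error raised by any agent."""
    if isinstance(error, TimeoutError):
        # Temporary failure: retry until the budget is spent.
        if retries_so_far < MAX_TIMEOUT_RETRIES:
            return Action.RETRY
        return Action.FAIL
    if isinstance(error, ValueError):
        # Invalid user input: ask for clarification.
        return Action.CLARIFY
    # Service unavailable, authentication problems, and anything
    # unrecognised end the session with an explanation.
    return Action.FAIL
```

Treating unknown errors as `FAIL` is the conservative choice here, because a retry is only attempted for errors known to be temporary.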
All agents follow this pattern:
Input (from previous agent or user)
|
v
Process (extract info, call API, generate message)
|
v
Voice Output (TTS speaks result)
|
v
Voice Input (Listen for response if needed)
|
v
Output (pass to next agent)
User speaks "hiya"
|
v
Welcome --> Intent Parser --> Calendar Checker
|
Service Searcher --> Service Selector
|
Call Scripter --> Call Initiator
|
Conversationalist --> Booking Finalizer
|
Goodbye Agent --> Session Ends
If the user says "no" or an error occurs, the flow may loop back to an earlier agent.
- Every Message is Spoken: All agents use TTS
- User Confirmation: Critical steps ask for approval
- Graceful Error Handling: Simple, user-friendly error messages
- Service-Agnostic: Works with any business type
- Conversational: Natural language throughout
- Zero Hardcoded Strings: Claude LLM generates all messages
| Agent | Duration | Technology |
|---|---|---|
| Welcome | 2-3s | LLM + TTS |
| Intent Parser | 3-5s | STT + LLM |
| Calendar Checker | 1-2s | Google Calendar API |
| Service Searcher | 2-3s | Google Places API |
| Service Selector | 2-3s | LLM + User input |
| Call Scripter | 2-3s | LLM + TTS |
| Call Initiator | 2-5s | Twilio |
| Conversationalist | 30-60s | STT + LLM + TTS |
| Booking Finalizer | 3-5s | LLM + Calendar + TTS |
| Goodbye | 2-3s | LLM + TTS |
| Total | ~50-90s | Full booking |
- Intent Parser: Validates input before processing
- Calendar Checker: Uses read-only scope (Google Calendar)
- Call Initiator: Requires explicit user approval
- Booking Finalizer: Uses write scope only for confirmed bookings
- All agents: No PII logged to console
- [Voice Data Flow.md](<Voice Data Flow.md>) - Complete data pipeline
- [High-Level Architecture.md](<High-Level Architecture.md>) - System design
- QUICKSTART.md - User guide
- README.md - Project overview
Version: 2.0
Last Updated: November 2024
Status: Production-Ready