Consumer Travel · Seed Round
Multimodal Voice + UI Travel Booking Agent
Users were frustrated by the 'tab fatigue' of booking flights, hotels, and activities across different sites.
Multimodal State Machine · Voice-UI Sync · Vapi Integration · LangGraph
Business Impact
30-percentage-point increase in booking conversion (12% to 42%)
The Problem
Booking a vacation is death by a thousand tabs. Flight comparison sites, hotel aggregators, activity platforms, restaurant reservations—users bounce between 10+ tabs, lose track of prices, and abandon bookings out of frustration. The seed-stage travel startup wanted to collapse this chaos into a single, conversational experience.
The Architecture
flowchart TB
subgraph input [User Input]
Voice[Voice via Vapi]
UI[Web Interface]
end
subgraph state [Shared State Machine]
TripState[Trip State]
Preferences[User Preferences]
Selections[Current Selections]
end
subgraph agents [Booking Agents]
FlightAgent[Flight Agent]
HotelAgent[Hotel Agent]
ActivityAgent[Activity Agent]
end
subgraph apis [External APIs]
Amadeus[Amadeus API]
Hotels[Hotel APIs]
Activities[Activity APIs]
Stripe[Stripe Checkout]
end
subgraph output [Synchronized Output]
VoiceResponse[Voice Response]
MapView[Interactive Map]
Itinerary[Live Itinerary]
end
Voice --> TripState
UI --> TripState
TripState --> FlightAgent
TripState --> HotelAgent
TripState --> ActivityAgent
FlightAgent --> Amadeus
HotelAgent --> Hotels
ActivityAgent --> Activities
Amadeus --> Selections
Hotels --> Selections
Activities --> Selections
Selections --> VoiceResponse
Selections --> MapView
Selections --> Itinerary
Itinerary --> Stripe
Multimodal State Machine
The key innovation: voice and UI share a single source of truth.
- Shared Trip State: Every interaction—voice or click—updates the same state machine. “Show me cheaper hotels” via voice triggers the same state transition as clicking a price filter.
- Specialized Booking Agents: The flight, hotel, and activity agents each handle one domain and own its API integrations.
- Synchronized Output: Each state change propagates to the voice response and the visual UI together: the map zooms, the itinerary updates, and the voice confirms.
Users can seamlessly switch between talking and tapping without losing context.
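To make the single-source-of-truth idea concrete, here is a minimal sketch in plain Python. It is not the project's code: the names `TripState`, `Event`, `Store`, `from_voice`, `from_click`, and `apply_event` are hypothetical, as are the string-matching intent rule and the 25% price reduction. The point it illustrates is that a spoken request and a filter click are both normalized into the same event type, pass through one reducer, and notify every output channel from one place.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class TripState:
    """Single source of truth shared by the voice and UI channels."""
    destination: str
    max_hotel_price: Optional[int] = None  # nightly cap; None means no cap


@dataclass(frozen=True)
class Event:
    """Channel-agnostic event; `source` is kept only for logging."""
    kind: str
    value: int
    source: str  # "voice" or "ui"


def from_voice(utterance: str, state: TripState) -> Event:
    # Hypothetical intent rule. A real system would rely on the voice
    # platform's intent or tool-call output, not string matching.
    if "cheaper hotels" in utterance.lower():
        current = state.max_hotel_price or 400  # assumed default cap
        return Event("SET_MAX_HOTEL_PRICE", int(current * 0.75), "voice")
    raise ValueError(f"unrecognized utterance: {utterance!r}")


def from_click(filter_value: int) -> Event:
    # A price-filter click produces the same event kind as the voice path.
    return Event("SET_MAX_HOTEL_PRICE", filter_value, "ui")


def apply_event(state: TripState, event: Event) -> TripState:
    """Pure reducer: the only place where trip state changes."""
    if event.kind == "SET_MAX_HOTEL_PRICE":
        return replace(state, max_hotel_price=event.value)
    return state


class Store:
    """Holds the state and notifies every output channel after each change."""

    def __init__(self, state: TripState) -> None:
        self.state = state
        self._subscribers: List[Callable[[TripState], None]] = []

    def subscribe(self, fn: Callable[[TripState], None]) -> None:
        self._subscribers.append(fn)

    def dispatch(self, event: Event) -> None:
        self.state = apply_event(self.state, event)
        for fn in self._subscribers:
            fn(self.state)
```

In this sketch the voice responder, the map view, and the itinerary panel would each register through `subscribe`, so none of them needs to know which channel produced the event.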
Tech Stack
- Vapi — Voice AI with real-time streaming
- LangGraph — State machine orchestration
- Amadeus API — Flight and hotel inventory
- Stripe — Secure checkout flow
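As a rough illustration of how an orchestration layer might dispatch one trip request to the flight, hotel, and activity agents in parallel, the sketch below uses only `asyncio` from the Python standard library. It does not call LangGraph, Amadeus, or any real inventory API: the three agent functions are stubs, `gather_selections` is a hypothetical helper, and the result fields are placeholders.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Agent = Callable[[dict], Awaitable[List[dict]]]


async def flight_agent(trip: dict) -> List[dict]:
    # Stub: a real agent would query a flight inventory API here.
    await asyncio.sleep(0)
    return [{"type": "flight", "to": trip["destination"]}]


async def hotel_agent(trip: dict) -> List[dict]:
    # Stub: a real agent would query hotel APIs here.
    await asyncio.sleep(0)
    return [{"type": "hotel", "city": trip["destination"]}]


async def activity_agent(trip: dict) -> List[dict]:
    # Stub: a real agent would query activity APIs here.
    await asyncio.sleep(0)
    return [{"type": "activity", "city": trip["destination"]}]


AGENTS: Dict[str, Agent] = {
    "flights": flight_agent,
    "hotels": hotel_agent,
    "activities": activity_agent,
}


async def gather_selections(trip: dict, agents: Dict[str, Agent]) -> Dict[str, List[dict]]:
    """Run all agents concurrently; a failing agent yields an empty list."""
    results = await asyncio.gather(
        *(agent(trip) for agent in agents.values()),
        return_exceptions=True,
    )
    selections: Dict[str, List[dict]] = {}
    for name, result in zip(agents, results):
        selections[name] = [] if isinstance(result, BaseException) else result
    return selections
```

Collecting exceptions instead of raising them is a design choice assumed here: one provider outage would then leave the other two result sets usable. A caller could run it with `asyncio.run(gather_selections({"destination": "Lisbon"}, AGENTS))`.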
The Impact
| Metric | Before | After |
|---|---|---|
| Booking Conversion | 12% | 42% |
| Time to Complete Booking | 45 min | 4 min |
| Cart Abandonment | 70% | 35% |
| User Sessions to Book | 3.5 avg | 1.2 avg |
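Because a change between two percentages can be quoted either in percentage points or as a relative change, a short calculation helps when reading the table. The helper names below are illustrative; the inputs are the before and after values from the table.

```python
def point_change(before: float, after: float) -> float:
    """Absolute difference; for percentages this is in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change as a fraction of the starting value."""
    return (after - before) / before


# (before, after) pairs taken from the table above.
metrics = {
    "booking_conversion_pct": (12, 42),
    "minutes_to_book": (45, 4),
    "cart_abandonment_pct": (70, 35),
    "sessions_to_book": (3.5, 1.2),
}

for name, (before, after) in metrics.items():
    print(f"{name}: {point_change(before, after):+.1f} absolute, "
          f"{relative_change(before, after):+.0%} relative")
```

For booking conversion this gives a gain of 30 percentage points, which is a 250% relative increase over the 12% baseline; cart abandonment falls by 35 points, a 50% relative drop.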
The voice + UI combination isn’t a gimmick—it’s a genuine UX improvement. Users browse visually and refine conversationally.