Consumer Travel · Seed Round

Multimodal Voice + UI Travel Booking Agent

Users were frustrated by the 'tab fatigue' of booking flights, hotels, and activities across different sites.

Multimodal State Machine · Voice-UI Sync · Vapi Integration · LangGraph
Business Impact
30-point increase in booking conversion (12% to 42%)

The Problem

Booking a vacation is death by a thousand tabs. Flight comparison sites, hotel aggregators, activity platforms, restaurant reservations—users bounce between 10+ tabs, lose track of prices, and abandon bookings out of frustration. The seed-stage travel startup wanted to collapse this chaos into a single, conversational experience.

The Architecture

flowchart TB
  subgraph input [User Input]
      Voice[Voice via Vapi]
      UI[Web Interface]
  end
  
  subgraph state [Shared State Machine]
      TripState[Trip State]
      Preferences[User Preferences]
      Selections[Current Selections]
  end
  
  subgraph agents [Booking Agents]
      FlightAgent[Flight Agent]
      HotelAgent[Hotel Agent]
      ActivityAgent[Activity Agent]
  end
  
  subgraph apis [External APIs]
      Amadeus[Amadeus API]
      Hotels[Hotel APIs]
      Activities[Activity APIs]
      Stripe[Stripe Checkout]
  end
  
  subgraph output [Synchronized Output]
      VoiceResponse[Voice Response]
      MapView[Interactive Map]
      Itinerary[Live Itinerary]
  end
  
  Voice --> TripState
  UI --> TripState
  TripState --> FlightAgent
  TripState --> HotelAgent
  TripState --> ActivityAgent
  
  FlightAgent --> Amadeus
  HotelAgent --> Hotels
  ActivityAgent --> Activities
  
  Amadeus --> Selections
  Hotels --> Selections
  Activities --> Selections
  
  Selections --> VoiceResponse
  Selections --> MapView
  Selections --> Itinerary
  Itinerary --> Stripe
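
The fan-out in the diagram, from trip state to the three agents and back into selections, can be sketched roughly as below. This is a minimal illustration rather than the project's code: the `TripState` fields, the stub agents, and the choice to query them concurrently with `asyncio.gather` are all assumptions, and canned strings stand in for the Amadeus, hotel, and activity API calls.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TripState:
    """Hypothetical shared state; field names are illustrative only."""
    destination: str
    max_hotel_price: int
    selections: dict = field(default_factory=dict)


# Stub agents: a real agent would call Amadeus, a hotel API, or an
# activity API here instead of returning canned data.
async def flight_agent(state: TripState) -> list[str]:
    return [f"Flight to {state.destination}"]


async def hotel_agent(state: TripState) -> list[str]:
    return [f"Hotel in {state.destination} under ${state.max_hotel_price}"]


async def activity_agent(state: TripState) -> list[str]:
    return [f"Walking tour of {state.destination}"]


async def refresh_selections(state: TripState) -> TripState:
    """Fan the trip state out to all agents, then merge into selections."""
    flights, hotels, activities = await asyncio.gather(
        flight_agent(state), hotel_agent(state), activity_agent(state)
    )
    state.selections = {
        "flights": flights,
        "hotels": hotels,
        "activities": activities,
    }
    return state


state = asyncio.run(refresh_selections(TripState("Lisbon", 200)))
print(state.selections["hotels"][0])
```

Because every agent reads from the same state object and writes into one `selections` dict, the voice response, map, and itinerary can all render from a single place.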

Multimodal State Machine

The key innovation: voice and UI share a single source of truth.

  1. Shared Trip State: Every interaction, voice or click, updates the same state machine. “Show me cheaper hotels” via voice triggers the same state transition as clicking a price filter.
  2. Specialized Booking Agents: Flight, hotel, and activity agents each own their domain logic and API integrations.
  3. Synchronized Output: State changes propagate to the voice response and the visual UI together: the map zooms, the itinerary updates, and the voice confirms.

Users can seamlessly switch between talking and tapping without losing context.
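
The single-source-of-truth idea can be illustrated with a small sketch. It assumes a pure transition function plus a store that notifies subscribers; the names `TripState`, `Event`, `transition`, and `TripStore` are hypothetical, and the real project orchestrates this with LangGraph rather than hand-rolled classes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class TripState:
    destination: str = ""
    max_hotel_price: Optional[int] = None


@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "set_max_hotel_price"
    value: object
    source: str    # "voice" or "ui"; recorded, but ignored by the transition


def transition(state: TripState, event: Event) -> TripState:
    """Pure transition: the result depends on kind and value, not source."""
    if event.kind == "set_max_hotel_price":
        return replace(state, max_hotel_price=int(event.value))
    if event.kind == "set_destination":
        return replace(state, destination=str(event.value))
    return state  # unknown events leave the state unchanged


class TripStore:
    """Holds the single shared state and notifies every output surface."""

    def __init__(self) -> None:
        self.state = TripState()
        self._subscribers: list[Callable[[TripState], None]] = []

    def subscribe(self, callback: Callable[[TripState], None]) -> None:
        self._subscribers.append(callback)

    def dispatch(self, event: Event) -> None:
        self.state = transition(self.state, event)
        for notify in self._subscribers:
            notify(self.state)


store = TripStore()
spoken, rendered = [], []
# Two output surfaces subscribe to the same state: voice and visual UI.
store.subscribe(lambda s: spoken.append(f"Showing hotels under ${s.max_hotel_price}"))
store.subscribe(lambda s: rendered.append(s))

# "Show me cheaper hotels" by voice, then a price-filter click in the UI.
store.dispatch(Event("set_max_hotel_price", 150, source="voice"))
voice_result = store.state
store.dispatch(Event("set_max_hotel_price", 150, source="ui"))
assert store.state == voice_result  # same transition, same state
```

Keeping `source` out of the transition logic is what lets a user switch between talking and tapping mid-flow: the state never records which modality it "belongs" to.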

Tech Stack

  • Vapi — Voice AI with real-time streaming
  • LangGraph — State machine orchestration
  • Amadeus API — Flight and hotel inventory
  • Stripe — Secure checkout flow
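
As one illustration of the checkout hand-off, the sketch below maps an itinerary to the `line_items` shape that Stripe Checkout accepts. The `ItineraryItem` type, the sample prices, and the helper name are assumptions made for this example. No Stripe call is made here; the block only builds the payload.

```python
from dataclasses import dataclass


@dataclass
class ItineraryItem:
    name: str
    price_cents: int  # integer minor units avoid float rounding


def to_line_items(items: list[ItineraryItem], currency: str = "usd") -> list[dict]:
    """Map itinerary items to the line_items shape Stripe Checkout accepts."""
    return [
        {
            "price_data": {
                "currency": currency,
                "product_data": {"name": item.name},
                "unit_amount": item.price_cents,
            },
            "quantity": 1,
        }
        for item in items
    ]


# Illustrative data only; not taken from the project.
itinerary = [
    ItineraryItem("Flight to Lisbon", 32000),
    ItineraryItem("Hotel, 3 nights", 45000),
]
line_items = to_line_items(itinerary)
total_cents = sum(
    li["price_data"]["unit_amount"] * li["quantity"] for li in line_items
)
print(total_cents)  # 77000

# With the official stripe package and an API key configured, the payload
# would be passed to stripe.checkout.Session.create(mode="payment",
# line_items=line_items, success_url=..., cancel_url=...).
```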

The Impact

Metric                     Before    After
Booking Conversion         12%       42%
Time to Complete Booking   45 min    4 min
Cart Abandonment           70%       35%
User Sessions to Book      3.5 avg   1.2 avg
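
For readers who want the before/after figures as changes, the arithmetic can be worked through as below. The conversion gain is 30 percentage points, which is a much larger relative lift. The helper names are illustrative; the inputs are the table's own numbers.

```python
def point_change(before: float, after: float) -> float:
    """Absolute difference, e.g. percentage points for rates."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, in percent."""
    return (after - before) / before * 100


# (before, after) pairs copied from the impact table.
metrics = {
    "Booking Conversion (%)": (12, 42),
    "Time to Complete Booking (min)": (45, 4),
    "Cart Abandonment (%)": (70, 35),
    "User Sessions to Book (avg)": (3.5, 1.2),
}

for name, (before, after) in metrics.items():
    print(
        f"{name}: {point_change(before, after):+.1f} absolute, "
        f"{relative_change(before, after):+.1f}% relative"
    )
```

Conversion rises by 30 points, a 250% relative increase, while time to book falls by about 91% and cart abandonment halves.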

The voice + UI combination isn’t a gimmick—it’s a genuine UX improvement. Users browse visually and refine conversationally.