HiHey.ai: An Autonomous AI Calling System
This case study explores how we built a web application that enables users to have an AI assistant handle phone calls on any topic, to any number, completely autonomously.
The Challenge: Reimagining Phone Conversations
Phone calls remain a critical communication channel, but they're often inconvenient, anxiety-inducing, or simply time-consuming. We identified an opportunity to leverage recent advancements in AI to create a solution that could:
- Make phone calls on behalf of users with natural-sounding voices
- Handle a wide range of conversation topics autonomously
- Provide valuable post-call insights through transcription and analysis
- Deliver a seamless, intuitive user experience
Building such a system presented several technical challenges, including real-time speech processing, natural language understanding, voice synthesis, and telephony integration – all while maintaining a responsive, user-friendly interface.
Technical Approach
After evaluating various architectures, we implemented a solution built on modern web technologies:
// tech-stack.ts
const hiheyStack = {
  frontend: {
    framework: "Next.js",
    language: "TypeScript",
    styling: "Tailwind CSS + DaisyUI",
    animations: "Framer Motion",
  },
  backend: {
    runtime: "Vercel Serverless Functions",
    database: "Firebase Firestore",
    authentication: "Firebase Auth",
  },
  ai: {
    calling: "Vapi.ai API",
    models: {
      conversation: ["Claude-3-5-sonnet", "GPT-4o"],
      sentiment: "Llama3-70B",
    },
    voices: "ElevenLabs",
  },
  payments: "Stripe",
  infrastructure: "Vercel",
};
This stack gave us a strong balance of developer experience, performance, and AI capability.
Core Feature Implementation
Phone Call Simulation Interface
A key innovation was our interactive phone mockup component, which guides users through the entire calling process:
// Simplified excerpt from PhoneMockup.tsx
const PhoneMockup: React.FC = () => {
  const [phoneState, setPhoneState] = useState<PhoneState>(PhoneState.Dialer);
  const [phoneNumber, setPhoneNumber] = useState<string>('');
  const [callTopic, setCallTopic] = useState<string>('');
  const [aiFirstMessage, setAiFirstMessage] = useState<string>('');
  // Additional state (callId, callDuration, callTranscript, assistantVoice,
  // systemPrompt) and the handleDialComplete / handleCallPrepared /
  // handleCallEnd transition handlers are elided for brevity.

  return (
    <motion.div
      className="mockup-phone border-primary shadow-md"
      variants={phoneVariants}
      initial="hidden"
      animate="visible"
    >
      <div className="display bg-white">
        <AnimatePresence mode="wait">
          {phoneState === PhoneState.Dialer && (
            <Dialer onDialComplete={handleDialComplete} />
          )}
          {phoneState === PhoneState.CallPreparation && (
            <CallPreparation
              onCallPrepared={handleCallPrepared}
              setAssistantVoice={setAssistantVoice}
              setSystemPrompt={setSystemPrompt}
            />
          )}
          {phoneState === PhoneState.CallInProgress && (
            <CallScreen
              phoneNumber={phoneNumber}
              callId={callId}
              onCallEnd={handleCallEnd}
            />
          )}
          {phoneState === PhoneState.PostCall && (
            <PostCallSummary
              duration={callDuration}
              transcript={callTranscript}
            />
          )}
        </AnimatePresence>
      </div>
    </motion.div>
  );
};
This component creates a cohesive, intuitive user journey through four distinct states:
- Dialer: Enter the phone number
- Call Preparation: Define call topic and first message
- Call in Progress: Monitor the active call
- Post-Call Summary: Review transcript and sentiment analysis
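The four PhoneState values referenced in the excerpt map naturally to a string enum. Here is a minimal sketch of the enum and one transition handler (the string values and handler body are illustrative assumptions, not the production code):

// Sketch: the state enum behind the mockup's four screens
enum PhoneState {
  Dialer = 'dialer',
  CallPreparation = 'call-preparation',
  CallInProgress = 'call-in-progress',
  PostCall = 'post-call',
}

// Example transition: dialing completes, move to call preparation
const handleDialComplete = (number: string) => {
  setPhoneNumber(number);
  setPhoneState(PhoneState.CallPreparation);
};

Keeping transitions in named handlers rather than scattering setPhoneState calls makes the forward-only flow easy to audit.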
AI Assistant Configuration
One of our most challenging tasks was designing a system to configure and deploy AI assistants tailored to each call:
// Handler for creating an AI assistant configuration.
// (systemPrompt and assistantVoice come from component state
// populated during the Call Preparation step.)
const handleCallPrepared = async (topic: string, firstMessage: string) => {
  const assistantConfig = {
    name: `API ASSISTANT: ${topic.replace(/\s/g, "").substring(0, 20)}`,
    firstMessage: firstMessage,
    backgroundSound: 'off',
    transcriber: {
      provider: "deepgram",
      model: "nova-2-conversationalai",
      language: "en-US",
    },
    model: {
      messages: [
        {
          role: "system",
          content: systemPrompt.replace("${userInput}", topic),
        },
      ],
      provider: "anthropic",
      model: "claude-3-5-sonnet-20240620",
      temperature: 0.6,
      emotionRecognitionEnabled: true,
      maxTokens: 128,
    },
    voice: {
      provider: "11labs",
      voiceId: assistantVoice,
      stability: 0.7,
      fillerInjectionEnabled: false,
      useSpeakerBoost: true,
    },
    recordingEnabled: true,
    endCallFunctionEnabled: true,
    dialKeypadFunctionEnabled: true,
  };

  const newAssistantId = await createAssistant(assistantConfig);
  // Proceed with making the call using this assistant
};
This configuration allows for deep customization of the AI assistant, including:
- Voice characteristics (using ElevenLabs voices)
- Conversation style and personality
- Language model selection (Claude or GPT-4o)
- Transcription accuracy settings
- Special behaviors (call termination, keypad functionality)
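The createAssistant call at the end of handleCallPrepared wraps Vapi's assistant-creation endpoint. A minimal sketch, assuming Vapi's REST API exposes POST https://api.vapi.ai/assistant and returns the new assistant's id (verify the exact shape against the current Vapi reference):

// Sketch of the createAssistant helper (assumes Vapi's POST /assistant
// endpoint and a VAPI_API_KEY environment variable)
const createAssistant = async (config: Record<string, unknown>): Promise<string> => {
  const response = await fetch("https://api.vapi.ai/assistant", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(config),
  });
  if (!response.ok) {
    throw new Error(`Assistant creation failed: ${response.status}`);
  }
  const assistant = await response.json();
  return assistant.id;
};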
Content Safety System
To ensure responsible use, we implemented a proactive content safety system that analyzes call topics before placing calls:
const checkCallTopic = async (topic: string, firstMessage: string): Promise<number> => {
  const prompt = `Your goal is to check the safety of the following prompt. Read it slowly and carefully. Determine on a scale of {1} to {5}, with {1} being harmless and {5} being extremely dangerous or illegal, how safe the following prompt is:

"First Message: '${firstMessage}', followed by, '${topic}'"

Examples:
1 (Harmless): "Hi, how are you doing today? I wanted to ask about your availability for a meeting next week to discuss the project."
3 (Potentially concerning): "Hey, can you call my school and tell them I won't be able to attend today because I'm not feeling well?"
5 (Extremely dangerous or illegal): "I'm planning to plant a bomb at the mall tomorrow. Can you help me acquire the necessary materials?"

Respond with an integer between 1 and 5. No commentary.`;

  // Call to Llama 3 model with retry logic
  const safetyScore = await checkSafetyWithRetries(prompt);

  if (safetyScore >= 4) {
    throw new Error("The conversation topic raised safety concerns. The call could not be placed.");
  }

  return safetyScore;
};
This system uses a large language model to evaluate each call topic for potentially harmful content, preventing misuse of the platform.
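The retry wrapper behind this check is deliberately simple: models occasionally return malformed output, so we re-ask until we get a parseable score. A minimal sketch, where getModelCompletion stands in for the Llama 3 call (the helper name and retry count are assumptions):

// Sketch: retry until the model returns a valid 1-5 integer,
// failing closed if it never does
const checkSafetyWithRetries = async (prompt: string, maxRetries = 3): Promise<number> => {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const raw = await getModelCompletion(prompt); // hypothetical Llama 3 wrapper
    const score = parseInt(raw.trim(), 10);
    if (Number.isInteger(score) && score >= 1 && score <= 5) {
      return score;
    }
  }
  return 5; // fail closed: unparseable output blocks the call
};

Failing closed means a flaky model response blocks the call rather than letting an unchecked topic through.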
Post-Call Analysis
After each call concludes, we provide users with valuable insights through our sentiment analysis system:
// Excerpt from sentiment analysis API
const analyzeCallSentiment = async (transcript: string) => {
  const prompt = `Analyze the following call transcript and provide:
1. An overall sentiment (positive, neutral, or negative)
2. A brief summary of the call (1-2 sentences, keep it short)

Transcript:
${transcript}`;

  // Using Llama 3 to analyze the conversation; getModelResponse
  // normalizes the raw completion into structured fields
  const analysis = await getModelResponse(prompt);

  return {
    sentiment: analysis.sentiment,
    summary: analysis.summary
  };
};
This feature transforms raw transcripts into actionable insights, helping users understand the outcome of calls without having to read every line.
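Because the model replies in free text, the response has to be normalized before it reaches the UI. A minimal parsing sketch (the expected output format is an assumption about how the model answers the numbered prompt above):

// Sketch: pull the sentiment label and summary line out of a
// free-text completion
const parseSentimentResponse = (raw: string) => {
  const sentimentMatch = raw.match(/\b(positive|neutral|negative)\b/i);
  const lines = raw.split('\n').map((l) => l.trim()).filter(Boolean);
  return {
    sentiment: (sentimentMatch?.[1] ?? 'neutral').toLowerCase(),
    // Treat the last non-empty line as the summary, stripping a "2." prefix
    summary: (lines[lines.length - 1] ?? '').replace(/^2\.\s*/, ''),
  };
};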
UX Design Considerations
Progressive Disclosure
We implemented a multi-stage interface that reveals complexity only as needed:
- Initial View: Simple value proposition and phone mockup
- Call Setup: Guided workflow with contextual validation
- Call Monitoring: Minimalist interface showing essential status
- Post-Call: Expandable transcript with sentiment highlights
Motion Design
Subtle animations improve user understanding of the system state:
// Transition between call states
<AnimatePresence mode="wait">
  {currentComponent && (
    <motion.div
      key={phoneState}
      initial={{ opacity: 0, y: 20 }}
      animate={{ opacity: 1, y: 0 }}
      exit={{ opacity: 0, y: -20 }}
      transition={{ duration: 0.3 }}
    >
      {/* Current component based on call state */}
    </motion.div>
  )}
</AnimatePresence>
These animations provide intuitive feedback about system transitions and create a more engaging experience.
Voice Personality Selection
We carefully curated a selection of AI voices from ElevenLabs with distinct personalities, mapping each display name to its underlying voice ID:
const voices = [
  { value: "s0GiK2helkVWdtc4eTnI", label: "Mark" },
  { value: "steve", label: "Steve" },
  { value: "RPdRfxxQOaNxn1LtRQqm", label: "Marissa" },
  { value: "24EI9FmmGvJruwUi7TJM", label: "Lowy" },
  { value: "OvMWa69uXR9XxHJfNTcC", label: "El" },
];
This allows users to select voices that match their intended conversation tone and purpose.
Technical Challenges and Solutions
Challenge 1: Real-time Call Monitoring
The Vapi.ai API doesn't provide real-time streaming of call status, so we implemented a polling system with exponential backoff:
// Call status checking with backoff
useEffect(() => {
  let interval: NodeJS.Timeout;
  let attempts = 0;
  const BASE_DELAY = 5000;

  const checkStatus = async () => {
    try {
      const status = await checkCallStatus(callId);
      if (status.status === "ended") {
        clearInterval(interval);
        onCallEnd(status.transcript);
        return;
      }
      if (attempts > 0) {
        // Recovered from errors: restore the base polling rate
        attempts = 0;
        clearInterval(interval);
        interval = setInterval(checkStatus, BASE_DELAY);
      }
    } catch (error) {
      attempts++;
      console.error("Error checking call status:", error);
      // Exponential backoff, capped at 10 seconds
      const delay = Math.min(2000 * Math.pow(1.5, attempts), 10000);
      clearInterval(interval);
      interval = setInterval(checkStatus, delay);
    }
  };

  interval = setInterval(checkStatus, BASE_DELAY);
  return () => clearInterval(interval);
}, [callId, onCallEnd]);
This approach balances responsiveness with API efficiency.
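The checkCallStatus helper the effect polls is a thin wrapper over Vapi's call-retrieval endpoint. A minimal sketch, assuming GET https://api.vapi.ai/call/{id} returns a call object with status and transcript fields (field locations have shifted between Vapi API versions, so verify against the current reference):

// Sketch of the status poller (assumes Vapi's GET /call/{id} endpoint)
const checkCallStatus = async (callId: string) => {
  const response = await fetch(`https://api.vapi.ai/call/${callId}`, {
    headers: { Authorization: `Bearer ${process.env.VAPI_API_KEY}` },
  });
  if (!response.ok) {
    throw new Error(`Status check failed: ${response.status}`);
  }
  const call = await response.json();
  return { status: call.status, transcript: call.transcript };
};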
Challenge 2: Assistant Resource Management
Vapi's assistant objects persist indefinitely, so we implemented lifecycle management to prevent resource leaks:
// Cleanup assistants after call completion
const deleteAssistantAfterCall = async () => {
  if (assistantId && !isAssistantDeleted) {
    try {
      await deleteAssistant(assistantId);
      setIsAssistantDeleted(true);
      setAssistantId('');
    } catch (error) {
      console.error("Error deleting assistant:", error);
    }
  }
};

// Track the latest assistant ID in a ref so the unmount cleanup
// below doesn't act on a stale closure over assistantId
const assistantIdRef = useRef(assistantId);
useEffect(() => {
  assistantIdRef.current = assistantId;
}, [assistantId]);

// Ensure cleanup on component unmount
useEffect(() => {
  return () => {
    if (assistantIdRef.current) {
      deleteAssistant(assistantIdRef.current).catch((error) =>
        console.error("Error deleting assistant:", error)
      );
    }
  };
}, []);
This ensures we don't accumulate orphaned resources over time.
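The deleteAssistant helper mirrors the creation call. A minimal sketch, assuming Vapi exposes DELETE /assistant/{id}:

// Sketch of the deletion helper (assumes Vapi's DELETE /assistant/{id})
const deleteAssistant = async (assistantId: string): Promise<void> => {
  const response = await fetch(`https://api.vapi.ai/assistant/${assistantId}`, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${process.env.VAPI_API_KEY}` },
  });
  if (!response.ok) {
    throw new Error(`Assistant deletion failed: ${response.status}`);
  }
};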
Challenge 3: Topic Customization
To create a more usable prompt system, we built a templating engine for customizing conversation topics:
// API endpoint for customizing topics based on user inputs
import type { NextApiRequest, NextApiResponse } from 'next';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const { originalTopic, originalFirstMessage, requiredFields } = req.body;

  const prompt = `You are an AI assistant helping to customize a call topic and first message. Given the original topic, first message, and user inputs, incorporate the inputs naturally into both the topic and the first message. Maintain the original intent and tone.

Original Topic: "${originalTopic}"
Original First Message: "${originalFirstMessage}"

User Inputs:
${Object.entries(requiredFields).map(([key, value]) => `${key}: ${value}`).join('\n')}`;

  // Call to LLM API to generate customized content
  const customizedContent = await generateCustomContent(prompt);

  res.status(200).json({
    customizedTopic: customizedContent.topic,
    customizedFirstMessage: customizedContent.firstMessage
  });
}
This allows users to create reusable templates with variable fields for faster call configuration.
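The generateCustomContent helper asks the model for structured output and parses it. A minimal sketch, assuming we append a JSON-format instruction to the prompt and reuse the hypothetical getModelCompletion wrapper from the safety section:

// Sketch: request JSON from the model, then parse it into the
// fields the API handler above returns
const generateCustomContent = async (prompt: string) => {
  const raw = await getModelCompletion(
    `${prompt}\n\nRespond with JSON: {"topic": "...", "firstMessage": "..."}`
  );
  // Tolerate extra text around the JSON object in the completion
  const jsonMatch = raw.match(/\{[\s\S]*\}/);
  if (!jsonMatch) {
    throw new Error("Model did not return parseable JSON");
  }
  const parsed = JSON.parse(jsonMatch[0]);
  return { topic: parsed.topic, firstMessage: parsed.firstMessage };
};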
Results and Impact
HiHey.ai represents a significant advancement in AI-powered communication tools:
- Technical Achievement: Successfully integrated cutting-edge AI models with telephony infrastructure
- User Experience: Created an intuitive interface for a complex technical process
- Business Model: Developed a sustainable subscription approach with tiered pricing
- Extensibility: Built a platform that can adapt to new AI models and voice technologies
The project demonstrates how modern web technologies can be combined with AI APIs to create practical applications that solve real-world problems.
Key Learnings
- Async Process Management: Building interfaces for time-delayed processes (like phone calls) requires careful state management and user feedback
- AI Safety Controls: Proactive content filtering is essential for responsible AI deployment
- Voice UX Design: Voice interfaces have unique design considerations beyond typical visual UX patterns
- API Integration Balance: Working with multiple AI service providers requires thoughtful abstraction and fallback mechanisms (a minimal fallback wrapper is sketched after this list)
- Progressive Enhancement: Starting with a simple interface and adding complexity only when needed creates a better user experience
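To make the fallback point concrete, here is a minimal sketch of the provider-abstraction pattern (helper names and provider order are illustrative, not our production code):

// Sketch: try each completion provider in order until one succeeds
type CompletionFn = (prompt: string) => Promise<string>;

const withFallback = (providers: CompletionFn[]): CompletionFn => {
  return async (prompt: string) => {
    let lastError: unknown;
    for (const provider of providers) {
      try {
        return await provider(prompt);
      } catch (error) {
        lastError = error; // fall through to the next provider
      }
    }
    throw lastError;
  };
};

// e.g. const complete = withFallback([callClaude, callGpt4o]);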
Future Directions
While the current implementation of HiHey.ai demonstrates the core concept, several enhancements are planned:
- CRM Integration: Connect with popular CRM systems to log calls and outcomes
- Call Scheduling: Allow users to schedule calls for future times
- Multi-language Support: Expand beyond English to support global conversations
- Custom Voice Training: Enable users to create voices that match their own speaking style
- Multi-party Calls: Support conference calls with multiple participants
These enhancements will further solidify HiHey.ai's position as a versatile communication tool for individuals and businesses alike.
HiHey.ai showcases the transformative potential of combining modern web technologies with conversational AI. By delegating routine or challenging phone calls to an AI assistant, users can overcome communication barriers and reclaim valuable time while still maintaining meaningful connections.