
Emergency Operator Voice Chatbot: Empowering Assistance

May 07, 2025, 09:48 AM

Language models have been evolving rapidly. With Multimodal LLMs now at the forefront of this race, it is important to understand how we can leverage their capabilities. From traditional text-based AI-powered chatbots, we are transitioning to voice-based chatbots that act as personal assistants, available at a moment’s notice to tend to our needs. In this blog, we’ll build an Emergency Operator voice-based chatbot. The idea is pretty straightforward:

  • We speak to the chatbot
  • It listens to and understands what we’ve said
  • It responds with a voice note


Table of contents

  • Our Use-Case
  • Tools We’ll Use
  • Requirements
  • Project Structure
  • Setting Up the Virtual Environment
  • Main Python Scripts
  • Streamlit App
  • Final Output
  • What Improvements can be made?
  • Conclusion

Our Use-Case

Let’s imagine a real-world scenario. We live in a country with over 1.4 billion people, and with such a huge population, emergencies are bound to occur, whether it’s a medical issue, a fire breakout, a case needing police intervention, or even mental health support such as anti-suicide assistance.

In such moments, every second counts, and emergency operators are often too few for the overwhelming number of issues raised. That’s where a voice-based chatbot can make a big difference, offering quick, spoken assistance when people need it the most.

  • Emergency Assistance: Immediate help for health, fire, crime, or disaster-related queries without waiting for a human operator (when not available).
  • Mental Health Helpline: A voice-based emotional support assistant guiding users with compassion.
  • Rural Accessibility: Areas with limited access to mobile apps can benefit from a simple voice interface since people often communicate by speaking in such areas.

That’s exactly what we’re going to build. We will be acting as someone seeking help, and the chatbot will play the role of an emergency responder, powered by a large language model.

Tools We’ll Use

To implement our voice chatbot, we will use the AI models and tools listed below:

  • Whisper (Large) – OpenAI’s speech-to-text model, running via GroqCloud, to convert voice into text.
  • GPT-4.1-mini – Powered by CometAPI (Free LLM Provider), this is the brain of our chatbot that will understand our queries and will generate meaningful responses.
  • Google Text-to-Speech (gTTS) – Converts the chatbot’s responses back into voice so it can talk to us.
  • FFmpeg – A handy library that helps us record and manage audio easily.

Requirements

Before we start coding, we need to set up some things:

  1. GroqCloud API Key: Get it from here: https://console.groq.com/keys
  2. CometAPI Key
    Register and store your API key from: https://api.cometapi.com/
  3. ElevenLabs API Key
    Register and store your API key from: https://elevenlabs.io/app/home
  4. FFmpeg Installation
    If you don’t already have it, follow this guide to install FFmpeg on your system: https://itsfoss.com/ffmpeg/

Confirm the installation by typing “ffmpeg -version” in your terminal.

Once you have these set up, you’re ready to dive into building your very own voice-enabled chatbot!

Project Structure

The project structure is rather simple, and most of the work happens in the app.py and utils.py Python scripts.

VOICE-CHATBOT/
├── venv/               # Virtual environment for dependencies
├── .env                # Environment variables (API keys, etc.)
├── app.py              # Main application script
├── emergency.png       # Emergency-related image asset
├── README.md           # Project documentation (optional)
├── requirements.txt    # Python dependencies
└── utils.py            # Utility/helper functions

There are some necessary files to be modified to ensure that all our dependencies are satisfied:

In the .env file

GROQ_API_KEY = "<your-groq-api-key>"
COMET_API_KEY = "<your-comet-api-key>"
ELEVENLABS_API_KEY = "<your-elevenlabs-api-key>"



In the requirements.txt



ffmpeg-python
pydub
pyttsx3
langchain
langchain-community
langchain-core
langchain-groq
langchain_openai
python-dotenv
streamlit==1.37.0
audio-recorder-streamlit
dotenv
elevenlabs
gtts

Setting Up the Virtual Environment

We will also set up a virtual environment (a good practice). We will do this in the terminal.

  1. Creation of our virtual environment
~/Desktop/Emergency-Voice-Chatbot$ conda create -p venv python==3.12 -y

  2. Activating our virtual environment
~/Desktop/Emergency-Voice-Chatbot$ conda activate venv/

  3. After you finish running the application, you can deactivate the virtual environment too
~/Desktop/Emergency-Voice-Chatbot$ conda deactivate

Main Python Scripts

Let’s first explore the utils.py script.

1. Main Imports

  • time, tempfile, os, re, BytesIO – Handle timing, temporary files, environment variables, regex, and in-memory data.
  • requests – Makes HTTP requests (e.g., calling APIs).
  • gTTS, elevenlabs, pydub – Convert text to speech and play/manipulate audio.
  • groq, langchain_* – Use Groq/OpenAI LLMs with LangChain to process and generate text.
  • streamlit – Build interactive web apps.
  • dotenv – Load environment variables (like API keys) from a .env file.

import time
import requests
import tempfile
import re
from io import BytesIO
from gtts import gTTS
from elevenlabs.client import ElevenLabs
from elevenlabs import play
from pydub import AudioSegment
from groq import Groq
from langchain_groq import ChatGroq
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import streamlit as st
import os
from dotenv import load_dotenv

load_dotenv()

2. Load your API Keys and initialize your models

# Initialize the Groq client (used for Whisper speech-to-text)
client = Groq(api_key=os.getenv('GROQ_API_KEY'))

# Initialize the LLM (GPT-4.1-mini served through CometAPI) for responses
llm = ChatOpenAI(
    model_name="gpt-4.1-mini",
    openai_api_key=os.getenv("COMET_API_KEY"),
    openai_api_base="https://api.cometapi.com/v1"
)

# Set the path to the ffmpeg executable (adjust this to wherever ffmpeg is installed on your system)
AudioSegment.converter = "/bin/ffmpeg"
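The text-to-speech helpers further down also rely on an ElevenLabs client. Its initialization isn’t shown in the snippets above, so here is a minimal sketch, assuming the elevenlabs SDK and the variable name tts_client used later (check the repository for the exact setup):

# Sketch: initialize the ElevenLabs client used by text_to_speech and
# create_welcome_message below (the variable name tts_client is an assumption)
tts_client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))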

3. Converting the Audio file (our voice recording) into .wav format

Here, we take the raw audio bytes, load them with AudioSegment and BytesIO, downsample them to 16 kHz mono, and export the result as a .wav file:

def audio_bytes_to_wav(audio_bytes):
   try:
       with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_wav:
           audio = AudioSegment.from_file(BytesIO(audio_bytes))
           # Downsample to reduce file size if needed
           audio = audio.set_frame_rate(16000).set_channels(1)
           audio.export(temp_wav.name, format="wav")
           return temp_wav.name
   except Exception as e:
       st.error(f"Error during WAV file conversion: {e}")
       return None
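As a quick illustration of how this helper behaves on its own (the file name below is just a hypothetical example):

# Example: convert a recorded clip (raw bytes) into a 16 kHz mono WAV file
with open("sample_recording.mp3", "rb") as f:  # hypothetical input file
    audio_bytes = f.read()

wav_path = audio_bytes_to_wav(audio_bytes)
print(wav_path)  # path to the temporary .wav file, or None if conversion failed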

4. Splitting Audio

We will make a function to split our audio into chunks as per our input parameter (chunk_length_ms). We will also make a function to get rid of any punctuation with the help of regex.

def split_audio(file_path, chunk_length_ms):
   audio = AudioSegment.from_wav(file_path)
   return [audio[i:i + chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]


def remove_punctuation(text):
   return re.sub(r'[^\w\s]', '', text)
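One more helper that app.py calls later, speech_to_text, lives in utils.py but isn’t reproduced in this walkthrough. A minimal sketch of how it could look, assuming Whisper (large-v3) on GroqCloud via the client initialized earlier (the exact implementation is in the linked repository):

def speech_to_text(audio_bytes):
   # Sketch: transcribe the recording with Whisper large-v3 on GroqCloud
   try:
       wav_path = audio_bytes_to_wav(audio_bytes)
       with open(wav_path, "rb") as f:
           transcription = client.audio.transcriptions.create(
               file=(wav_path, f.read()),
               model="whisper-large-v3",
           )
       return transcription.text
   except Exception as e:
       st.error(f"Error during speech-to-text: {e}")
       return None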

5. LLM Response Generation

Now for the main responder functionality, where the LLM generates an apt response to our queries. The prompt template gives the LLM its instructions on how to respond, and we wire everything together using the LangChain Expression Language (LCEL).

def get_llm_response(query, chat_history):
    try:
        template = """
            You are an experienced Emergency Response Phone Operator trained to handle critical situations in India.
            Your role is to guide users calmly and clearly during emergencies involving:

            - Medical crises (injuries, heart attacks, etc.)
            - Fire incidents
            - Police/law enforcement assistance
            - Suicide prevention or mental health crises

            You must:

            1. **Remain calm and assertive**, as if speaking on a phone call.
            2. **Ask for and confirm key details** like location, condition of the person, number of people involved, etc.
            3. **Provide immediate and practical steps** the user can take before help arrives.
            4. **Share accurate, India-based emergency helpline numbers** (e.g., 112, 102, 108, 1091, 1098, 9152987821, etc.).
            5. **Prioritize user safety**, and clearly instruct them what *not* to do as well.
            6. If the situation involves **suicidal thoughts or mental distress**, respond with compassion and direct them to appropriate mental health helplines and safety actions.

            If the user's query is not related to an emergency, respond with:
            "I can only assist with urgent emergency-related issues. Please contact a general support line for non-emergency questions."

            Use an authoritative, supportive tone, short and direct sentences, and tailor your guidance to **urban and rural Indian contexts**.

            **Chat History:** {chat_history}

            **User:** {user_query}
            """

        prompt = ChatPromptTemplate.from_template(template)
        chain = prompt | llm | StrOutputParser()

        response_gen = chain.stream({
            "chat_history": chat_history,
            "user_query": query
        })

        response_text = ''.join(list(response_gen))
        response_text = remove_punctuation(response_text)

        # Remove repeated lines
        response_lines = response_text.split('\n')
        unique_lines = list(dict.fromkeys(response_lines))  # Removing duplicates
        cleaned_response = '\n'.join(unique_lines)
        return cleaned_response
    except Exception as e:
        st.error(f"Error during LLM response generation: {e}")
        return "Error"
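As a quick sanity check, the helper can be invoked directly outside Streamlit (the query below is just an example):

# Illustration: one-off call with an empty chat history
reply = get_llm_response("There is a fire in my building, what should I do?", chat_history=[])
print(reply)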

6. Text to Speech

We will build a function to convert our text to speech with the help of the ElevenLabs TTS client, which returns the audio as a pydub AudioSegment. We could also use other TTS models such as Nari Labs’ Dia or Google’s gTTS. ElevenLabs gives us some free credits to start with and charges for more after that, whereas gTTS is completely free to use.

def text_to_speech(text: str, retries: int = 3, delay: int = 5):
    attempt = 0
    while attempt < retries:
        ...  # retry/fallback logic, sketched in full below
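The retry loop above continues with the actual synthesis call and a fallback. Here is a minimal sketch of how the full function could look, assuming the ElevenLabs tts_client from earlier, a placeholder voice ID, and gTTS as the free fallback mentioned above (the exact version is in the GitHub repository):

def text_to_speech(text: str, retries: int = 3, delay: int = 5):
    # Sketch: try ElevenLabs first, retrying on transient failures,
    # then fall back to gTTS so the bot never goes silent
    attempt = 0
    while attempt < retries:
        try:
            response_stream = tts_client.text_to_speech.convert(
                text=text,
                voice_id="<your-voice-id>",          # placeholder: your ElevenLabs voice ID
                model_id="eleven_multilingual_v2",   # assumed model choice
                output_format="mp3_44100_128",
            )
            audio_bytes = b"".join(chunk for chunk in response_stream)
            return AudioSegment.from_file(BytesIO(audio_bytes), format="mp3")
        except Exception as e:
            attempt += 1
            st.warning(f"TTS attempt {attempt} failed: {e}. Retrying...")
            time.sleep(delay)
    # Fallback: Google Text-to-Speech (free), localized Indian English accent
    try:
        mp3_buffer = BytesIO()
        gTTS(text=text, lang="en", tld="co.in").write_to_fp(mp3_buffer)
        mp3_buffer.seek(0)
        return AudioSegment.from_file(mp3_buffer, format="mp3")
    except Exception as e:
        st.error(f"Error during text-to-speech conversion: {e}")
        return None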



7. Create Introductory Message



We will also create an introductory text and pass it to our TTS model, since a respondent would normally introduce themselves and ask what assistance the user might need. Here we will be returning the path of the mp3 file.



lang="en" -> English



tld="co.in" -> can produce different localized ‘accents’ for a given language. The default is "com".



def create_welcome_message():
   welcome_text = (
       "Hello, you’ve reached the Emergency Help Desk. "
       "Please let me know if it's a medical, fire, police, or mental health emergency—"
       "I'm here to guide you right away."
   )
   try:
       # Request speech synthesis (streaming generator)
       response_stream = tts_client.text_to_speech.convert(
           text=welcome_text,
            voice_id="<your-voice-id>",          # placeholder: your ElevenLabs voice ID
            model_id="eleven_multilingual_v2",   # assumed: a common ElevenLabs TTS model
           output_format="mp3_44100_128",
       )
       # Save streamed bytes to temp file
       with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as f:
           for chunk in response_stream:
               f.write(chunk)
           return f.name
   except requests.ConnectionError:
       st.error("Failed to generate welcome message due to connection error.")
   except Exception as e:
       st.error(f"Error creating welcome message: {e}")
   return None

Streamlit App

Now, let’s jump into the app.py script, where we will be using Streamlit to visualize our chatbot.

Import Libraries and Functions

Import our libraries and the functions we built in utils.py:

import tempfile
import re  # This can be removed if not used
from io import BytesIO
from pydub import AudioSegment
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
import streamlit as st
from audio_recorder_streamlit import audio_recorder
from utils import *

Streamlit Setup

Now, we will set our title and add a nice “Emergency” visual to the sidebar:

st.title(":blue[Emergency Help Bot] ???")
st.sidebar.image('./emergency.jpg', use_column_width=True)

We will set our Session States to keep track of our chats and audio

if "chat_history" not in st.session_state:
   st.session_state.chat_history = []
if "chat_histories" not in st.session_state:
   st.session_state.chat_histories = []
if "played_audios" not in st.session_state:
   st.session_state.played_audios = {}

Invoking our utils functions

We will create our welcome message introduction from the Respondent side. This will be the start of our conversation.

if len(st.session_state.chat_history) == 0:
   welcome_audio_path = create_welcome_message()
   st.session_state.chat_history = [
       AIMessage(content="Hello, you’ve reached the Emergency Help Desk. Please let me know if it's a medical, fire, police, or mental health emergency—I'm here to guide you right away.", audio_file=welcome_audio_path)
   ]
   st.session_state.played_audios[welcome_audio_path] = False

Now, in the sidebar, we will set up our voice recorder along with the speech-to-text, LLM response, and text-to-speech logic, which is the main crux of this project:

with st.sidebar:
   audio_bytes = audio_recorder(
       energy_threshold=0.01,
       pause_threshold=0.8,
       text="Speak on clicking the ICON (Max 5 min) \n",
       recording_color="#e9b61d",   # yellow
       neutral_color="#2abf37",    # green
       icon_name="microphone",
       icon_size="2x"
   )
   if audio_bytes:
       temp_audio_path = audio_bytes_to_wav(audio_bytes)
       if temp_audio_path:
           try:
               user_input = speech_to_text(audio_bytes)
               if user_input:
                   st.session_state.chat_history.append(HumanMessage(content=user_input, audio_file=temp_audio_path))
                   response = get_llm_response(user_input, st.session_state.chat_history)
                   audio_response = text_to_speech(response)
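The snippet above stops right after the reply audio is generated. A minimal sketch of how the rest of that if user_input: branch could store the assistant’s turn (assuming text_to_speech returns a pydub AudioSegment, as described earlier):

# Hypothetical continuation of the `if user_input:` branch above
if audio_response is not None:
    # Save the synthesized reply to a temporary mp3 file
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
        audio_response.export(f.name, format="mp3")
    # Store the assistant's turn so the chat view below can play it back
    st.session_state.chat_history.append(
        AIMessage(content=response, audio_file=f.name)
    )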

We will also set up a button in the sidebar that lets us restart the session if need be, again starting with the introductory voice note from the respondent’s side.

if st.button("Start New Chat"):
       st.session_state.chat_histories.append(st.session_state.chat_history)
       welcome_audio_path = create_welcome_message()
       st.session_state.chat_history = [
           AIMessage(content="Hello, you’ve reached the Emergency Help Desk. Please let me know if it's a medical, fire, police, or mental health emergency—I'm here to guide you right away.", audio_file=welcome_audio_path)
       ]

And on the main page of our app, we will visualize our chat history as click-to-play audio messages:

for msg in st.session_state.chat_history:
   if isinstance(msg, AIMessage):
       with st.chat_message("AI"):
           st.audio(msg.audio_file, format="audio/mp3")
   else:  # HumanMessage
       with st.chat_message("user"):
           st.audio(msg.audio_file, format="audio/wav")

Now, we are done with all of the Python scripts needed to run our app. We will run the Streamlit App using the following Command:

streamlit run app.py

So, this is what our Project Workflow looks like:

[User speaks] → audio_recorder → audio_bytes_to_wav → speech_to_text → get_llm_response → text_to_speech → st.audio

For the full code, visit this GitHub repository.

Final Output


The Streamlit App looks pretty clean and is functioning appropriately!

Let’s see some of its responses:

  1. User: Hi, someone is having a heart attack right now, what should I do?

We then had a conversation about the location and the state of the person, and the chatbot then provided the appropriate guidance.

  2. User: Hello, there has been a huge fire breakout in Delhi. Please send help quick.

The respondent enquires about the situation and my current location, and then proceeds to provide preventive measures accordingly.

  3. User: Hey there, there is a person standing alone across the edge of the bridge, how should I proceed?

The respondent enquires about my location and the mental state of the person I’ve mentioned.

Overall, our chatbot is able to respond to our queries in accordance with the situation and asks the relevant questions before providing preventive measures.

Read More: How to build a chatbot in Python?

What Improvements can be made?

  • Multilingual Support: Integrating LLMs with strong multilingual capabilities can allow the chatbot to interact seamlessly with users from different regions and dialects.
  • Real-Time Transcription and Translation: Adding speech-to-text and real-time translation can help bridge communication gaps.
  • Location-Based Services: By integrating GPS or other real-time location-based APIs, the system can detect a user’s location and guide them to the nearest emergency facilities.
  • Speech-to-Speech Interaction: We can also use speech-to-speech models which can make conversations feel more natural since they are built for such functionalities.
  • Fine-tuning the LLM: Custom fine-tuning of the LLM based on emergency-specific data can improve its understanding and provide more accurate responses.

To learn more about AI-powered voice agents, follow these resources:

  • Building Customer Support Voice Agent
  • Paper to voice assistant
  • Multilingual Voice Agent
  • Top 10 Open Source Python Libraries for Building Voice Agents

Conclusion

In this article, we successfully built a voice-based emergency response chatbot using a combination of AI models and some relevant tools. The chatbot replicates the role of a trained emergency operator capable of handling high-stress situations, from medical crises and fire incidents to mental health support, in a calm, assertive tone. The prompt template tailors the LLM’s behavior to diverse real-world emergencies, making the experience more realistic for both urban and rural scenarios.
