Table of Contents
Browser support
Speech Recognition Library
Voice-controlled audio player
Getting Started: a Playlist
Audio player
Settings data
UI methods
load
changeCurrentSongEffect
playSong
changeStatusCode
changeLastCommand
toggleSpinner
Player methods
play
pauseSong
stop
prev
next
searchSpecificSong
Speech API methods
speak
processCommands
Tying everything together
Processing results
Conclusion
Frequently Asked Questions about Voice-Controlled Audio Players Using the Web Speech API (FAQ)
How does the Web Speech API work in a voice-controlled audio player?
What are the advantages of using voice-controlled audio players?
Can I use the Web Speech API in any web browser?
How can I improve the accuracy of speech recognition in a voice-controlled audio player?
Can I customize voice commands in a voice-controlled audio player?
Does the Web Speech API support languages other than English?
How secure is the Web Speech API?
Can I use the Web Speech API in my mobile application?
What are the limitations of the Web Speech API?
How do I get started with the Web Speech API?
Make a Voice-Controlled Audio Player with the Web Speech API

Feb 18, 2025 am 09:40 AM

Key takeaways

  • The Web Speech API is a JavaScript API that lets web developers integrate speech recognition and synthesis into their web pages, enhancing the user experience, especially for people with disabilities or users who need to handle several tasks at once.
  • The speech recognition API currently requires an internet connection and the user's permission to access the microphone. Libraries such as Annyang can help manage complexity and ensure forward compatibility.
  • You can use the speech synthesis API and the speech recognition API together to build a voice-controlled audio player. This lets the user navigate between songs and request specific songs using voice commands.
  • The audio player consists of settings data, UI methods, Speech API methods, and audio manipulation methods. The code that recognizes and processes user input only works in WebKit browsers.
  • The Web Speech API has potential in many areas, such as using voice commands to browse email, navigate websites, or search the web. As implementations stabilize and new features are added, use of this API is expected to grow.

This article was reviewed by Edwin Reynoso and Mark Brown. Thanks to all of SitePoint's peer reviewers for making SitePoint content the best it can be!

The Web Speech API is a JavaScript API that enables web developers to integrate speech recognition and synthesis capabilities into their web pages.

There are many reasons to do this. For example, to enhance the experience of people with disabilities (especially users with visual impairments or limited hand mobility), or to let users interact with a web application while performing other tasks, such as driving.

If you have never heard of the Web Speech API, or you want a quick primer, it might be a good idea to read Aurelio De Rosa's articles Introduction to the Web Speech API, Speech Synthesis API, and Talking Forms.

Browser support

Browser vendors have only recently begun to implement both the speech recognition API and the speech synthesis API. Support for these APIs is far from perfect, so if you are following along with this tutorial, make sure you use a browser that supports them.

In addition, the speech recognition API currently requires an internet connection, because the speech is sent over the network and the result is returned to the browser. If the connection uses HTTP, the user must allow the site to use their microphone every time a request is made. If the connection uses HTTPS, they only need to do this once.

Speech Recognition Library

A library helps us manage complexity and ensures we stay forward compatible. For example, when another browser starts supporting the speech recognition API, we won't have to worry about adding vendor prefixes.

Annyang is one such library, and it is very easy to use.

To initialize Annyang, we add its script to our website (the path below is a placeholder for wherever you host or load the library from):

<script src="path/to/annyang.min.js"></script>

We can check if the API is supported like this:

if (annyang) { /* logic */ }

We then add commands using an object that has the command names as keys and the callbacks as values:

var commands = {
  'show divs': function() {
    $('div').show();
  },
  'show forms': function() {
    $("form").show();
  }
};

Finally, we just add them and start voice recognition with the following command:

annyang.addCommands(commands);
annyang.start();

Voice-controlled audio player

In this article, we will build a voice-controlled audio player. We will use both the Speech Synthesis API (to tell the user which song is playing, or that a command was not recognized) and the Speech Recognition API (to convert voice commands into strings that trigger different application logic).

The advantage of an audio player that uses the Web Speech API is that users can browse other pages in the browser, or minimize the browser and do something else, while still being able to switch between songs. If we have many songs in our playlist, we can even request a specific song without searching for it manually (provided we know its name or singer, of course).

We will not rely on a third-party library for speech recognition, as we want to show how to use the API without adding extra dependencies to the project. The voice-controlled audio player only works in browsers that support the interimResults attribute. The latest version of Chrome should be a safe choice.

As always, you can find the full code on GitHub, as well as a demo on CodePen.

Getting Started: a Playlist

Let's start with a static playlist. It consists of an object that contains different songs in an array. Each song is a new object containing the path to the file, the singer's name, and the name of the song:

var data = {
  "songs": [
    {
      "fileName": "https://www.ruse-problem.org/songs/RunningWaters.mp3",
      "singer" : "Jason Shaw",
      "songName" : "Running Waters"
    },
    ...

We should be able to add new objects to the songs array and automatically include new songs into our audio player.
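As a small illustration of that point, a new track can be appended with push. This is only a sketch: the addSong helper, the file path, and the singer and song names are made-up placeholders, and the data object is redeclared with an empty list so the snippet runs on its own.

```javascript
// Stand-in for the playlist object above (its song entries are omitted here).
var data = { songs: [] };

// Hypothetical helper: append a song in the same shape as the existing entries
// and return the index it was stored at.
function addSong(playlist, fileName, singer, songName) {
  playlist.songs.push({ fileName: fileName, singer: singer, songName: songName });
  return playlist.songs.length - 1;
}

// Placeholder values, not part of the original playlist.
var index = addSong(data, 'songs/example.mp3', 'Example Singer', 'Example Song');
```

Because the player reads everything from data.songs, no other code should need to change when an entry is added this way.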

Audio player

Now let's look at the player itself. This will be an object containing the following:

  • Some settings data
  • Methods related to the UI (such as populating the song list)
  • Methods related to the Speech API (such as recognizing and processing commands)
  • Methods related to audio manipulation (e.g. play, pause, stop, previous, next)

Settings data

This is relatively simple.

var audioPlayer = {
  audioData: {
    currentSong: -1,
    songs: []
  },

The currentSong attribute refers to the index of the song the user is currently on. This is useful, for example, when we have to play the previous or next song, or stop or pause the current one.

The songs array contains all the songs the user has listened to. This means that the next time the user listens to the same song, we can load it from the array instead of downloading it again.
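The caching idea can be sketched roughly as follows. The getOrLoadSong helper and its createAudio parameter are assumptions made for illustration, not the article's actual code. The factory is passed in so the sketch does not depend on a browser; in a page it could simply be function (src) { return new Audio(src); }.

```javascript
// Hypothetical helper: return the cached audio object for a playlist index,
// creating and caching it the first time that index is requested.
function getOrLoadSong(audioData, playlist, index, createAudio) {
  if (!audioData.songs[index]) {
    // Not cached yet: build the audio object from the playlist's file path.
    audioData.songs[index] = createAudio(playlist.songs[index].fileName);
  }
  return audioData.songs[index];
}
```

Calling the helper twice with the same index returns the same object, so the file is only requested once.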

You can view the full code here.

UI methods

The UI will contain a list of available commands, a list of available tracks, and a context box that informs the user of the current action and the previous command. I won't go into detail about the UI methods, but will give a brief overview instead. You can find the code for these methods here.

load

This will iterate over the playlist we declared earlier and append the song's name, along with the artist's name, to the list of available tracks.

changeCurrentSongEffect

This indicates which song is currently playing (by marking it in green and adding a pair of headphones next to it), and which songs have been played.

playSong

This tells the user that a song is playing or has ended. It does so through the changeStatusCode method, which adds this information to the context box, and by announcing the change through the Speech API.

changeStatusCode

As mentioned above, this updates the status message in the context box (for example, indicating that a new song is being played) and uses the speak method to notify the user of this change.

changeLastCommand

A small helper function to update the last command box.

toggleSpinner

A small helper function to hide or display the spinner icon (indicating that the user's voice command is currently being processed).

Player methods

The player will be responsible for what you might expect, namely starting, stopping, and pausing playback, and moving back and forth between tracks. Again, I'm not going to go into these methods in detail, but would rather point you to our GitHub code base.

play

This checks whether the user has already listened to a song. If not, it starts the song; otherwise it just calls the playSong method we discussed earlier on the currently cached song, which is located in audioData.songs at the currentSong index.

pauseSong

This pauses a song or stops it completely (returning the playback time to the beginning of the song), depending on what is passed as the second parameter. It also updates the status code to notify the user that the song has been either stopped or paused.

stop

This either pauses or stops the song based on its first and only parameter.
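The difference between pausing and stopping can be sketched like this. The pauseOrStop name and its parameters are hypothetical; the only assumption about the song is that it behaves like an HTML Audio object, with a pause method and a currentTime property.

```javascript
// Hypothetical sketch: pause a song, or stop it by also rewinding to the start.
function pauseOrStop(song, stop) {
  song.pause();
  if (stop) {
    // Stopping is pausing plus returning the playback time to the beginning.
    song.currentTime = 0;
  }
  return stop ? 'stopped' : 'paused';
}
```

The returned string could then be used to update the status message shown to the user.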

prev

This checks whether the previous song is cached and, if so, pauses the current song, decrements currentSong, and plays the song at the new index. If the previous song is not in the array, it does the same thing, but first loads the song from the file name/path corresponding to the decremented currentSong index.

next

If the user has listened to a song before, this method will try to pause it. If the next song exists in our data object (i.e. our playlist), it loads and plays it. If there is no next song, it will just change the status code and inform the user that they have reached the last song.
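The index bookkeeping behind prev and next can be sketched with two small helpers. Both function names are hypothetical, and returning -1 to signal "no such song" is an assumption made for this sketch.

```javascript
// Hypothetical helper: index of the next song, or -1 when the current song
// is already the last one in the playlist.
function nextIndex(currentSong, songCount) {
  return currentSong + 1 < songCount ? currentSong + 1 : -1;
}

// Hypothetical helper: index of the previous song, or -1 when there is none.
function prevIndex(currentSong) {
  return currentSong > 0 ? currentSong - 1 : -1;
}
```

With currentSong starting at -1, as in the settings data, nextIndex points at the first song of the playlist, and a result of -1 is the cue to tell the user they have reached the end.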

searchSpecificSong

This takes a keyword as a parameter, performs a linear search across the song names and artists, and then plays the first match.
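A linear search of this kind might look like the sketch below. The findSongIndex name is hypothetical, and matching case-insensitively on substrings is an assumption; the actual method may compare differently.

```javascript
// Hypothetical sketch: return the index of the first song whose name or singer
// contains the keyword (case-insensitive), or -1 if nothing matches.
function findSongIndex(songs, keyword) {
  var needle = keyword.toLowerCase();
  for (var i = 0; i < songs.length; i++) {
    var inName = songs[i].songName.toLowerCase().indexOf(needle) !== -1;
    var inSinger = songs[i].singer.toLowerCase().indexOf(needle) !== -1;
    if (inName || inSinger) {
      return i;
    }
  }
  return -1;
}
```

The returned index can be used to load and play the matching entry, while -1 is a natural point to tell the user that no song was found.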

Speech API methods

The Speech API is surprisingly easy to use. In fact, just two lines of code are enough to make a web application talk to the user:

var utterance = new SpeechSynthesisUtterance('Hello');
window.speechSynthesis.speak(utterance);

What we do here is create an utterance object with the text we want spoken. The speechSynthesis interface (available on the window object) is responsible for processing this utterance object and controlling the playback of the resulting speech.

Go ahead and try it in your browser. It's that simple!

speak

We can see this in action in our speak method, which reads aloud the message passed as its first parameter.

If a second parameter (scope) is present, we call the play method on scope (which will be an Audio object) once the message has finished playing.
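A minimal sketch of such a speak method is shown below. It is an approximation written for illustration, not the article's exact code. The feature check and the boolean return value are assumptions added so the function can fail quietly where speech synthesis is unavailable.

```javascript
// Sketch of a speak helper: reads `text` aloud and, if `scope` is given,
// starts playing it once the spoken message has finished.
function speak(text, scope) {
  if (typeof speechSynthesis === 'undefined' ||
      typeof SpeechSynthesisUtterance === 'undefined') {
    return false; // speech synthesis is not available in this environment
  }
  var message = new SpeechSynthesisUtterance(text);
  if (scope) {
    // Attach the handler before speaking so the end event cannot be missed.
    message.onend = function () {
      scope.play();
    };
  }
  speechSynthesis.speak(message);
  return true;
}
```

Waiting for the end event keeps the announcement and the music from overlapping.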

processCommands

This method is not that exciting. It takes a command as an argument and calls the appropriate method to respond to it. It uses a regular expression to check if the user wants to play a specific song, otherwise, it goes into a switch statement to test different commands. If none corresponds to the received command, it informs the user that the command is not understood.
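The dispatch logic described above can be sketched as follows. The command words ('play', 'pause', 'stop', 'next', 'previous'), the exact regular expression, and the way the player methods are called without arguments are all assumptions made for this illustration. The real method may use different phrases and signatures.

```javascript
// Hypothetical sketch of command dispatch. `player` is expected to expose the
// methods discussed earlier; here they are called without arguments.
function processCommand(command, player) {
  var text = command.trim().toLowerCase();

  // "play <something>" is treated as a request for a specific song.
  var specific = /^play (.+)$/.exec(text);
  if (specific) {
    player.searchSpecificSong(specific[1]);
    return true;
  }

  switch (text) {
    case 'play':
      player.play();
      return true;
    case 'pause':
      player.pauseSong();
      return true;
    case 'stop':
      player.stop();
      return true;
    case 'next':
      player.next();
      return true;
    case 'previous':
      player.prev();
      return true;
    default:
      // No command matched: let the user know.
      player.speak('Your command was not understood.');
      return false;
  }
}
```

Checking the regular expression first means a bare "play" still falls through to the switch statement, since the pattern requires at least one character after the space.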

You can find its code here.

Tying everything together

So far, we have a data object representing the playlist and an audioPlayer object representing the player itself. Now we need to write some code to recognize and process user input. Please note that this only works in WebKit browsers.

The code that lets users talk to your app is just as simple as before: create a webkitSpeechRecognition object, assign a handler to its onresult event, and call its start method.

Calling start prompts the user to allow the page to access their microphone. If you allow access, you can start talking, and when you stop, the onresult event fires, making the result of the speech capture available as a JavaScript object.
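A minimal sketch of that flow is shown below. The startListening wrapper and its onTranscript callback are hypothetical names introduced for illustration. The sketch returns null when the prefixed constructor is missing.

```javascript
// Sketch: start speech recognition and hand each new transcript to a callback.
function startListening(onTranscript) {
  if (typeof webkitSpeechRecognition === 'undefined') {
    return null; // speech recognition is not available in this browser
  }
  var recognition = new webkitSpeechRecognition();
  recognition.onresult = function (event) {
    // event.results is a list of results, each holding one or more
    // alternatives; take the first alternative of the latest result.
    var latest = event.results[event.results.length - 1];
    onTranscript(latest[0].transcript);
  };
  recognition.start();
  return recognition;
}
```

In a page, startListening(function (text) { console.log(text); }) would log whatever the recognizer hears.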

Reference: HTML5 Speech Recognition API

We can implement this in our application as follows. We first test for the presence of webkitSpeechRecognition on the window object. If it exists, we can proceed; otherwise we tell the user that the browser does not support it. If all goes well, we then set a few options. Among them, lang is an interesting one, as it can improve recognition results based on where you are from.

Then we declare handlers for the onresult and onend events, before kicking things off with the start method.
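The setup described here might be sketched as follows. The createRecognizer name, the onResult parameter, and the 'en-US' locale are assumptions made for illustration; continuous, interimResults, and lang are standard properties of the recognition object, but the exact values used in the article's code are not shown here.

```javascript
// Sketch: configure a recognizer, wire up its handlers, and start it.
// Returns null when the browser does not expose webkitSpeechRecognition,
// so the caller can tell the user that the browser is unsupported.
function createRecognizer(onResult) {
  if (typeof webkitSpeechRecognition === 'undefined') {
    return null;
  }
  var recognizer = new webkitSpeechRecognition();
  recognizer.continuous = true;     // keep listening after the user pauses
  recognizer.interimResults = true; // deliver results while the user is speaking
  recognizer.lang = 'en-US';        // assumed locale; adjust to your users

  recognizer.onresult = onResult;
  recognizer.onend = function () {
    // Restart right away so the player keeps listening for commands.
    recognizer.start();
  };

  recognizer.start();
  return recognizer;
}
```

Restarting from onend keeps the player listening without requiring the user to click anything again.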

Processing results

When the speech recognizer returns results, there are a few things we want to do, at least given the current state of speech recognition implementations and our needs. Every time there is a result, we save it in an array and set a timeout of three seconds, so that the browser can collect any further results. After those three seconds, we loop through the collected results in reverse order (newer results are more likely to be accurate) and check whether a recognized transcript contains one of our available commands. If it does, we execute the command and restart speech recognition. We do this because waiting for the final result can take up to a minute, which would make our audio player seem unresponsive and pointless, since clicking a button would be faster.


Because we don't use the library, we have to write more code to set up our speech recognizer, loop through each result and check if its transcription matches the given keyword.
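The result handling described above can be sketched in two parts: a pure function that scans the collected transcripts from newest to oldest, and a small collector that waits before processing them. All names here are hypothetical, and starting the wait on the first result of a batch (rather than resetting it on every result) is an assumption of this sketch.

```javascript
// Scan transcripts from newest to oldest and return the first known command
// contained in one of them, or null if none is found.
function findCommand(transcripts, commands) {
  for (var i = transcripts.length - 1; i >= 0; i--) {
    var heard = transcripts[i].toLowerCase();
    for (var j = 0; j < commands.length; j++) {
      if (heard.indexOf(commands[j]) !== -1) {
        return commands[j];
      }
    }
  }
  return null;
}

// Returns a function that stores each transcript and, delayMs after the first
// one of a batch arrives, looks for a command in everything collected so far.
function createResultCollector(commands, onCommand, delayMs) {
  var collected = [];
  var timer = null;
  return function (transcript) {
    collected.push(transcript);
    if (timer === null) {
      timer = setTimeout(function () {
        var command = findCommand(collected, commands);
        collected = [];
        timer = null;
        if (command) {
          onCommand(command);
        }
      }, delayMs);
    }
  };
}
```

With a three-second delay, this could be created as createResultCollector(['play', 'next'], handleCommand, 3000), where handleCommand is whatever function executes the command. Note that the order of the commands list matters when one command is a substring of another.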

Finally, we restart speech recognition as soon as it ends, by calling the recognizer's start method again from its onend handler.

You can view the full code for this section here.

That's it. We now have a fully functional, voice-controlled audio player. I highly recommend you download the code from GitHub and play around with it, or check out the CodePen demo. I have also made available a version that is served over HTTPS.

Conclusion

I hope this practical tutorial has served as a good introduction to what is possible with the Web Speech API. I think that as implementations stabilize and new features are added, we will see usage of this API grow. For example, I can imagine a future YouTube that is completely voice-controlled, where we can watch videos from different users, play specific songs, and move between songs with voice commands alone.

The Web Speech API could also improve many other areas or open up new possibilities. For example, using voice to browse email, navigate websites, or search the web.

Do you use this API in your projects? I'd love to hear from you in the comments below.

Frequently Asked Questions about Voice-Controlled Audio Players Using the Web Speech API (FAQ)

How does the Web Speech API work in a voice-controlled audio player?

The Web Speech API lets developers integrate speech recognition and synthesis into their web applications. In a voice-controlled audio player, the API converts spoken commands into text that the application can then interpret and act on. For example, if the user says "play", the API returns that word as text, and the application treats it as the command to start playing audio. The recognition itself is carried out by the browser's speech recognition service rather than by your own code.

What are the advantages of using voice-controlled audio players?

Voice-controlled audio players have several advantages. First, they provide a hands-free experience, which is especially useful when users are busy with other tasks. Second, they can improve accessibility for users with reduced mobility, who may have difficulty using traditional controls. Finally, they offer a novel and engaging user experience that can make your app stand out from the competition.

Can I use the Web Speech API in any web browser?

Not in every browser. Speech synthesis is widely supported in modern browsers, but support for speech recognition is narrower: Chromium-based browsers such as Google Chrome expose it with the webkit prefix, while Firefox does not enable it by default. It is always best to check browser compatibility before integrating the API into your application, as support varies between versions and platforms.

How can I improve the accuracy of speech recognition in a voice-controlled audio player?

You can improve accuracy by using a good-quality microphone, reducing background noise, and setting the recognizer's lang attribute to match the user's language and accent. Keeping commands short and distinct from one another also helps. Additionally, your application should handle unrecognized commands gracefully and give the user feedback.

Can I customize voice commands in a voice-controlled audio player?

Yes. The Web Speech API returns the recognized speech as text, and your application code decides which phrases count as commands and what each one does. This lets you tailor the set of commands to your specific needs and preferences.

Does the Web Speech API support languages other than English?

Yes, the Web Speech API supports multiple languages. You can specify the language through the lang attribute, and speech will be recognized in that language. The set of available languages depends on the browser and its speech service.

How secure is the Web Speech API?

Browsers ask the user for permission before a page can access the microphone. Because speech recognition sends audio over the network to be processed, you should serve your application over HTTPS and be transparent with users about when the microphone is listening. As with any web technology, it is also important to follow security best practices, such as keeping software up to date and protecting your application from common web vulnerabilities.

Can I use the Web Speech API in my mobile application?

The Web Speech API is primarily designed for web applications running in a browser. Support inside web views embedded in mobile applications varies, so test on the platforms you target. For native mobile applications, you may want to consider platform-specific speech recognition APIs, which may provide better performance and integration.

What are the limitations of the Web Speech API?

While the Web Speech API is a powerful tool, it does have some limitations. For example, speech recognition currently requires an internet connection, and its accuracy can be affected by factors such as background noise and the user's accent. Additionally, support for the API varies between web browsers and platforms.

How do I get started with the Web Speech API?

To get started with the Web Speech API, you need to understand the basics of JavaScript and web development. You can then browse the API documentation, which provides detailed information about its features and how to use them. There are also many online tutorials and examples available to help you learn how to integrate the API into your own applications.
