Make a Voice-Controlled Audio Player with the Web Speech API
Key Takeaways
- The Web Speech API is a JavaScript API that lets web developers integrate speech recognition and synthesis into their web pages, enhancing the user experience, especially for people with disabilities or for users who need to juggle several tasks at once.
- The Speech Recognition API currently requires an internet connection and the user's permission to access the microphone. Libraries such as Annyang can help manage the complexity and ensure forward compatibility.
- You can combine the Speech Synthesis API and the Speech Recognition API to build a voice-controlled audio player, letting users navigate between songs and request specific songs with voice commands.
- The audio player object will contain settings data, UI methods, Speech API methods, and audio-operation methods. The code that captures and processes user input currently works only in WebKit browsers.
- The Web Speech API has potential in many other areas, such as browsing email, navigating websites, or searching the web by voice. As the implementation stabilizes and new features are added, usage of this API is expected to grow.
This article was peer reviewed by Edwin Reynoso and Mark Brown. Thanks to all of SitePoint's peer reviewers for making SitePoint content the best it can be!
The Web Speech API is a JavaScript API that enables web developers to integrate speech recognition and synthesis capabilities into their web pages.
There are many reasons for doing so. For example, to enhance the experience of people with disabilities (particularly users with visual impairments or limited hand mobility), or to let users interact with a web application while performing another task, such as driving.
If you have never heard of the Web Speech API, or you would like a quick primer, it might be a good idea to read Aurelio De Rosa's articles introducing the Web Speech API, the Speech Synthesis API, and building a talking form.
Browser support
Browser manufacturers have only recently started implementing both the Speech Recognition API and the Speech Synthesis API. Support for these APIs is far from perfect, so make sure you follow this tutorial in a supported browser.
In addition, the Speech Recognition API currently requires an internet connection, as the speech is passed over the wire and the results are returned to the browser. If the connection uses HTTP, the user has to grant the site permission to use the microphone on every request; if it uses HTTPS, permission only needs to be granted once.
Speech Recognition Libraries
Libraries can help us manage the complexity and ensure that we stay forward compatible. For example, when another browser starts supporting the Speech Recognition API, we won't have to worry about adding vendor prefixes.
Annyang is one such library, and it's extremely easy to work with.
To initialize Annyang, we add its script to our site.
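For example, the script can be loaded from a CDN. The URL and version below are an assumption for illustration; check the Annyang repository for the current release:

```html
<!-- CDN path and version are illustrative, not prescriptive -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/annyang/2.6.1/annyang.min.js"></script>
```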
We can check if the API is supported like this:
if (annyang) { /* logic */ }
and add commands using an object with the command names as keys and the callbacks as methods:
var commands = { 'show divs': function() { $('div').show(); }, 'show forms': function() { $("form").show(); } };
Finally, we just add the commands and start the speech recognition like this:
annyang.addCommands(commands); annyang.start();
Voice-controlled audio player
In this article, we will build a voice-controlled audio player. We will use both the Speech Synthesis API (to inform users which song is beginning, or that a command wasn't recognized) and the Speech Recognition API (to convert voice commands into strings that trigger different application logic).
The great thing about an audio player that uses the Web Speech API is that users can browse to other pages in their browser, or minimize the browser and do something else, while still being able to switch between songs. If we have a lot of songs in the playlist, we can even request a specific song without searching for it manually (if we know its name or singer, of course).
We will not rely on a third-party library for speech recognition, as we want to show how to work with the API without adding extra dependencies to our projects. The voice-controlled audio player only works in browsers that support the interimResults attribute; the latest version of Chrome should be a safe bet.
As always, you can find the full code on GitHub, as well as a demo on CodePen.
First up: a playlist
Let's start with a static playlist. It consists of an object with different songs in an array. Each song is an object containing the path to the file, the singer's name, and the name of the song:
var data = { "songs": [ { "fileName": "https://www.ruse-problem.org/songs/RunningWaters.mp3", "singer" : "Jason Shaw", "songName" : "Running Waters" }, ...
We should be able to add a new object to the songs array and have the new song automatically included in our audio player.
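For reference, here is what a complete data object might look like once the snippet above is closed off. The first entry comes from the article; the second entry's file name and metadata are invented purely to illustrate adding another song:

```javascript
// The playlist object, completed with closing brackets.
// The second song is a made-up example entry.
var data = {
  "songs": [
    {
      "fileName": "https://www.ruse-problem.org/songs/RunningWaters.mp3",
      "singer": "Jason Shaw",
      "songName": "Running Waters"
    },
    {
      "fileName": "https://example.com/songs/SecondSong.mp3",
      "singer": "Example Artist",
      "songName": "Second Song"
    }
  ]
};
```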
Audio player
Now let's look at the player itself. This will be an object containing the following:
- Some setting data
- Methods related to UI (such as filling song lists)
- Methods related to the Speech API (such as recognizing and processing commands)
- Methods related to audio operation (e.g. play, pause, stop, previous, next)
Settings data
This is relatively simple.
var audioPlayer = { audioData: { currentSong: -1, songs: [] },
The currentSong property holds the index of the song the user is currently on. This is useful, for example, when we have to play the next/previous song, or stop/pause the current one.
The songs array contains all the songs that the user has listened to. This means that the next time the user listens to the same song, we can load it from the array rather than downloading it again.
You can view the full code here.
UI methods
The UI will consist of a list of available commands, a list of available tracks, and a context box to inform the user both of the current action and the previous command. I won't go into the UI methods in detail, but rather offer a brief overview. You can find the code for these methods here.
load
This iterates over the playlist we declared earlier and appends the song's name, together with the artist's name, to the list of available tracks.
changeCurrentSongEffect
This indicates which song is currently playing (by marking it green and adding a pair of headphones next to it), as well as which songs have finished playing.
playSong
This indicates to the user that a song is playing or has ended, both through the changeStatusCode method (which adds this information to the context box) and by announcing the change via the Speech Synthesis API.
changeStatusCode
As mentioned above, this updates the status message in the context box (for example, to indicate that a new song is playing) and invokes the speak method to announce the change to the user.
changeLastCommand
A small helper function to update the last command box.
toggleSpinner
A small helper function to hide or display the spinner icon (indicating that the user's voice command is currently being processed).
Player methods
The player will be responsible for what you might expect, namely: starting, stopping, and pausing playback, and moving back and forth through the tracks. Again, I won't go into the methods in detail, but would rather point you towards the code on GitHub.
play
This checks whether the user has listened to a song yet. If not, it starts the song; otherwise it simply invokes the playSong method we discussed earlier on the currently cached song. The cached song is located in audioData.songs and corresponds to the currentSong index.
pauseSong
This either pauses a song or stops it completely (returning the playback time to the beginning of the song), depending on what is passed as the second argument. It also updates the status code to notify the user that the song has been either stopped or paused.
stop
This either pauses or stops the song, based on its first and only argument.
prev
This checks whether the previous song is cached. If so, it pauses the current song, decrements currentSong, and plays the song at the new index. If the new song is not in the array, it does the same, but first loads the song from the file name/path corresponding to the decremented currentSong index.
next
If the user has listened to a song before, this method tries to pause it. If there is a next song in our data object (i.e. in our playlist), it loads and plays it. If there is no next song, it just changes the status code and informs the user that they have reached the final song.
searchSpecificSong
This takes a keyword as an argument and performs a linear search across song names and artists, then plays the first match.
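As a sketch, the linear search might look like the following. The function name matches the article, but the sample playlist and the exact matching logic are assumptions:

```javascript
// Sample playlist; the second entry is invented for illustration.
var songs = [
  { singer: "Jason Shaw", songName: "Running Waters" },
  { singer: "Example Artist", songName: "Travel Light" }
];

// Hedged sketch of searchSpecificSong: scan song names and artists,
// returning the first entry that contains the keyword.
function searchSpecificSong(keyword) {
  keyword = keyword.toLowerCase();
  for (var i = 0; i < songs.length; i++) {
    var song = songs[i];
    if (song.songName.toLowerCase().indexOf(keyword) !== -1 ||
        song.singer.toLowerCase().indexOf(keyword) !== -1) {
      return song; // first match wins; the player would then play it
    }
  }
  return null; // no match: the player can report an unrecognized request
}

searchSpecificSong("running"); // finds the "Running Waters" entry
```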
Speech API methods
The Speech API is surprisingly easy to implement. In fact, it takes just two lines of code to get a web application talking to users.
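For example, using the standard SpeechSynthesisUtterance constructor together with the speechSynthesis interface:

```javascript
var utterance = new SpeechSynthesisUtterance('Hello');
window.speechSynthesis.speak(utterance);
```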
What we're doing here is creating an utterance object containing the text we wish to be spoken. The speechSynthesis interface (available on the window object) is responsible for processing this utterance object and controlling the playback of the resulting speech.
Go ahead and try it in your browser. It's that simple!
speak
We can see this in action in our speak method, which reads aloud the message passed as an argument.
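A sketch of what the speak method of the audioPlayer object might look like, based on the behavior described in the text; the exact implementation in the repository may differ:

```javascript
speak: function(text, scope) {
  var message = new SpeechSynthesisUtterance(text);
  message.onend = function() {
    // if an Audio object was passed in, resume it once the message ends
    if (scope) {
      scope.play();
    }
  };
  speechSynthesis.speak(message);
},
```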
If a second argument (scope) is present, we call the play method on scope (which will be an Audio object) after the message has finished playing.
processCommands
This method isn't really that exciting. It receives a command as an argument and calls the appropriate method to respond to it. It checks whether the user wants to play a specific song with a regular expression; otherwise, it enters a switch statement to test different commands. If none corresponds to the received command, it informs the user that the command wasn't understood.
You can find its code here.
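A sketch of that structure follows. The command names and the returned action descriptors are illustrative; the real method calls the audioPlayer's methods directly rather than returning objects:

```javascript
// Hedged sketch of processCommands: map a recognized transcript to an action.
function processCommands(command) {
  command = command.trim().toLowerCase();

  // a regular expression detects requests for a specific song,
  // e.g. "play running waters"
  var specific = command.match(/^play (.+)$/);
  if (specific) {
    return { action: 'searchSpecificSong', keyword: specific[1] };
  }

  switch (command) {
    case 'play':     return { action: 'play' };
    case 'pause':    return { action: 'pause' };
    case 'stop':     return { action: 'stop' };
    case 'next':     return { action: 'next' };
    case 'previous': return { action: 'prev' };
    default:         return { action: 'notRecognized' }; // command not understood
  }
}
```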
Tying everything together
By now, we have a data object representing our playlist, as well as an audioPlayer object representing the player itself. Now we need to write some code to recognize and deal with user input. Please note that this will only work in WebKit browsers.
The code that lets users talk to your app is just as simple as before.
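A minimal sketch (note the webkit prefix):

```javascript
var recognition = new webkitSpeechRecognition();
recognition.onresult = function(event) {
  console.log(event); // a SpeechRecognitionEvent containing the transcripts
};
recognition.start();
```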
This will prompt the user to allow the page access to their microphone. If access is granted, you can start talking, and when you stop, the onresult event will fire, making the results of the speech capture available as a JavaScript object.
Reference: HTML5 Speech Recognition API
We can implement this in our application as follows.
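A sketch of that setup; the handler bodies are elided, and the option values shown are reasonable defaults rather than the repository's exact code:

```javascript
if ('webkitSpeechRecognition' in window) {
  var recognition = new webkitSpeechRecognition();

  // a few options
  recognition.continuous = true;       // keep listening after each result
  recognition.interimResults = false;  // only deliver final transcripts
  recognition.lang = 'en-US';          // can improve accuracy for this locale

  recognition.onstart  = function() { /* e.g. show the spinner */ };
  recognition.onresult = function(event) { /* collect and process results */ };
  recognition.onend    = function() { /* restart recognition */ };

  recognition.start();
} else {
  alert('Your browser does not support the Speech Recognition API.');
}
```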
As you can see, we test for the presence of webkitSpeechRecognition on the window object. If it's there, we're good to go; otherwise, we inform the user that the browser doesn't support it. If all goes well, we then set a couple of options. Of these, lang is an interesting one that can improve the recognition results based on the user's locale.
Then, we declare handlers for the onstart, onresult, and onend events, before kicking things off with the start method.
Processing results
When the speech recognizer gets a result, at least in the context of the current speech recognition implementation and our needs, we want to do a number of things. Each time there is a result, we save it in an array and set a timeout of three seconds, giving the browser time to collect any further results. Once the three seconds are up, we use the collected results, looping over them in reverse order (newer results have a better chance of being accurate) and checking whether any recognized transcript contains one of our available commands. If one does, we execute the command and restart speech recognition. We do this because waiting for a final result can take up to a minute, which would make our audio player seem quite unresponsive and pointless, since it would be faster to just click a button.
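A sketch of this approach. The three-second window comes from the description above; the isCommand helper is a hypothetical stand-in for the keyword check:

```javascript
var results = [];
var timeoutSet = false;

recognition.onresult = function(event) {
  // save every result the recognizer produces
  results.push(event.results);

  if (!timeoutSet) {
    timeoutSet = true;
    // give the browser three seconds to collect any further results
    setTimeout(function() {
      // loop through the results in reverse order:
      // newer results are more likely to be accurate
      for (var i = results.length - 1; i >= 0; i--) {
        var transcript = results[i][0][0].transcript.trim();
        if (audioPlayer.isCommand(transcript)) { // hypothetical helper
          audioPlayer.processCommands(transcript);
          break;
        }
      }
      results = [];
      timeoutSet = false;
    }, 3000);
  }
};
```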
Since we're not using a library, we have to write more code to set up our speech recognizer, loop over each result, and check whether its transcript matches a given keyword.
Lastly, we restart speech recognition as soon as it ends, so the player keeps listening.
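Restarting is as simple as calling start again from the onend handler:

```javascript
recognition.onend = function() {
  recognition.start(); // keep listening for the next command
};
```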
You can view the full code for this section here.
And that's it. We now have a fully functional, voice-controlled audio player. I strongly advise grabbing the code from GitHub and playing around with it, or checking out the CodePen demo. I've also made available a version that's served over HTTPS.
Conclusion
I hope this practical tutorial has served as a good introduction to what's possible with the Web Speech API. I think we'll see its use grow as implementations stabilize and new features are added. For example, I can imagine a future YouTube that is completely voice-controlled, where we can watch videos from different users, play specific songs, and move between songs, all with voice commands.
The Web Speech API could also bring improvements, or open up new possibilities, in many other areas: browsing email, navigating websites, or searching the web, all by voice.
Are you using this API in your own projects? I'd love to hear from you in the comments below.
Frequently Asked Questions about Voice-Controlled Audio Players Using the Web Speech API (FAQ)
How does the Web Speech API work in a voice-controlled audio player?
The Web Speech API is a powerful tool that allows developers to integrate speech recognition and synthesis into their web applications. In a voice-controlled audio player, the API works by converting spoken commands into text that the application can then interpret and execute. For example, if the user says "play", the API converts it to text, and the application understands that this is the command to start playing audio. This process relies on sophisticated algorithms and machine-learning techniques to accurately recognize and interpret human speech.
What are the advantages of using voice-controlled audio players?
Voice-controlled audio players have several advantages. First, they provide a hands-free experience, which is especially useful when users are busy with other tasks. Second, they can enhance accessibility for users with limited mobility, who may have difficulty using traditional controls. Finally, they offer a novel and engaging user experience that can make your app stand out from the competition.
Can I use the Web Speech API in any web browser?
Most modern browsers offer at least partial support for the Web Speech API: speech synthesis is widely available in Google Chrome, Mozilla Firefox, and Microsoft Edge, while speech recognition support is more limited. It is always best to check specific browser compatibility before integrating the API into your application, as support can vary between versions and platforms.
How can I improve the accuracy of speech recognition in a voice-controlled audio player?
You can improve recognition accuracy by using a high-quality microphone, reducing background noise, and setting the recognition language to match the user's locale and accent. Additionally, you can implement error handling in your application to deal with unrecognized commands and provide feedback to users.
Can I customize the voice commands in a voice-controlled audio player?
Yes. You can define your own set of commands in your application code, which the Web Speech API will then recognize and interpret. This allows you to tailor the user experience to your specific needs and preferences.
Does the Web Speech API support languages other than English?
Yes, the Web Speech API supports multiple languages. You can specify a language in the API settings (for example, via the lang property), and it will recognize and interpret commands in that language. This makes it a versatile tool for developing applications for international audiences.
How secure is the Web Speech API?
The Web Speech API is designed with security in mind: voice data is transmitted over a secure HTTPS connection, and browsers require explicit user permission before granting microphone access. However, as with any web technology, it is important to follow security best practices, such as keeping software up to date and protecting your application against common web vulnerabilities.
Can I use the Web Speech API in my mobile application?
While the Web Speech API is primarily designed for web applications, it can also be used in mobile apps through web views. However, for native mobile applications, you may want to consider platform-specific speech recognition APIs, which can offer better performance and integration.
What are the limitations of the Web Speech API?
While the Web Speech API is a powerful tool, it does have limitations. For example, speech recognition requires an internet connection to work, and its accuracy can be affected by factors such as background noise and the user's accent. Additionally, support varies between different web browsers and platforms.
How do I get started with the Web Speech API?
To get started with the Web Speech API, you need a basic understanding of JavaScript and web development. You can then browse the API documentation, which provides detailed information about its features and how to use them. There are also many online tutorials and examples available to help you learn how to integrate the API into your own applications.