


Using Raspberry Pi to implement a conversational robot
Recently, I used a Raspberry Pi to build a robot that can talk to people. Here is a brief introduction. The Raspberry Pi is one of the world's most popular single-board computers and a leader in open source hardware. It was designed for teaching students computer programming, is only the size of a credit card, and is affordable. It supports operating systems such as Linux (Debian). Most importantly, the documentation is thorough and the community is active.
I am using a Raspberry Pi Model B. Its basic configuration is a Broadcom BCM2836 processor with four cores clocked at 900 MHz and 1 GB of RAM.
My goal is to make a robot that can talk to people, which requires the robot to have an input device and an output device. The input device is a microphone, and the output can be HDMI, headphones, or speakers. I used speakers here. Below is a photo of my Raspberry Pi. The four USB ports are connected to a wireless network adapter, a wireless keyboard, a microphone, and the power supply for the speakers.

We can divide the robot’s conversation into three parts: listening, thinking, and speaking.
"Listening" means recording what people say and converting it into words.
"Thinking" means giving different outputs based on different inputs. For example, if the other party asks "What time is it now?", the robot can reply "It is xx:xx Beijing time".
"Speaking" means converting text into speech and playing it back.
These three parts involve a lot of speech recognition, speech synthesis, artificial intelligence and other technologies, which require a lot of time and effort to research. Fortunately, some companies have opened interfaces for customers to use. Here, I chose Baidu’s API. The implementation of these three parts is explained below.
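The three-part split above can be sketched as three functions chained together. This is only an illustration: `listen`, `think`, and `speak` are hypothetical names, the listening and speaking steps are stubs, and the rule inside `think` is a local stand-in for the time example, not the Baidu or Turing services used below.

```python
from datetime import datetime


def listen(recognized_text):
    """Stub for "listening": a real version would record audio and run ASR."""
    return recognized_text.strip().lower()


def think(text, now=None):
    """Toy "thinking" rule: answer questions about the time, otherwise echo."""
    now = now or datetime.now()
    if "time" in text:
        return "It's {:02d}:{:02d}.".format(now.hour, now.minute)
    return "You said: " + text


def speak(reply):
    """Stub for "speaking": a real version would run TTS and play the audio."""
    print(reply)
    return reply


if __name__ == "__main__":
    speak(think(listen("What time is it now?")))
```

Keeping the three steps behind separate functions means each one can later be swapped for a real service without touching the other two.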
"Listening"
The first thing is to record what people say. I used the arecord tool. The command is as follows:
- arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
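Run this way, `arecord` keeps recording until it is interrupted. If a fixed-length recording is more convenient, the command can also be launched from Python. This is a minimal sketch, assuming `arecord` is installed and the microphone is still `plughw:1`. The helper names and the 5-second default are my own choices. `-d` (duration in seconds) and `-c` (channel count) are standard `arecord` options.

```python
import subprocess


def build_arecord_cmd(path, seconds=5, device="plughw:1", rate=16000):
    """Build an arecord command for 16-bit mono audio of a fixed length."""
    return [
        "arecord",
        "-D", device,        # capture device, as in the command above
        "-f", "S16_LE",      # 16-bit little-endian samples
        "-r", str(rate),     # sample rate in Hz
        "-c", "1",           # mono
        "-d", str(seconds),  # stop after this many seconds
        path,
    ]


def record(path, seconds=5):
    """Run arecord and raise CalledProcessError if it fails."""
    subprocess.run(build_arecord_cmd(path, seconds), check=True)
```

Calling `record("test.wav")` would then produce the same kind of file as the command above, but stop on its own after five seconds.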
Next, we need to convert the audio into text, that is, speech recognition (ASR). Baidu's voice open platform provides a free service and supports a REST API.
For documentation, see: http://yuyin.baidu.com/docs/asr/57
The process is basically this: obtain a token, then send the audio parameters, the audio data, and the token to Baidu's speech recognition server, which returns the corresponding text. Because the server exposes a REST API, the client can be written in any language. Here I use Python.
```python
# coding: utf-8
# asr.py

import base64
import json
import urllib.parse
import urllib.request


def get_access_token():
    """Exchange the client id and secret for an access token."""
    url = "https://openapi.baidu.com/oauth/2.0/token"
    params = {
        "grant_type": "client_credentials",
        "client_id": "xxxxxxxxxxxxxxxxxx",
        "client_secret": "xxxxxxxxxxxxxxxxxxxxxx",
    }
    url = url + "?" + urllib.parse.urlencode(params)

    resp = urllib.request.urlopen(url).read()
    data = json.loads(resp.decode("utf-8"))
    return data["access_token"]


def baidu_asr(data, cuid, token):
    """Send raw audio bytes to the recognizer; return the result list or None."""
    post_data = {
        "format": "wav",
        "rate": 16000,
        "channel": 1,
        "cuid": cuid,
        "token": token,
        "speech": base64.b64encode(data).decode("utf-8"),
        "len": len(data),  # length of the raw audio, before base64 encoding
    }

    url = "http://vop.baidu.com/server_api"
    json_data = json.dumps(post_data).encode("utf-8")

    # urllib sets Content-Length itself, so only Content-Type is added here.
    req = urllib.request.Request(url, data=json_data)
    req.add_header("Content-Type", "application/json")

    print("asr start request")
    resp = urllib.request.urlopen(req).read()
    print("asr finish request")
    resp_data = json.loads(resp.decode("utf-8"))
    if resp_data["err_no"] == 0:
        return resp_data["result"]
    print(resp_data)
    return None


def asr_main(filename):
    """Recognize the speech in a WAV file and return the first result, or None."""
    with open(filename, "rb") as f:
        audio_data = f.read()

    # token = get_access_token()
    token = "xxxxxxxxxxxxxxxxxx"
    uuid = "xxxx"
    resp = baidu_asr(audio_data, uuid, token)
    if not resp:
        return None
    print(resp[0])
    return resp[0]
```
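The request above declares `rate` 16000 and `channel` 1, so it can help to confirm that the recording really has that shape before uploading it. A minimal sketch using Python's standard `wave` module follows. `check_wav` is a hypothetical helper of my own, not part of Baidu's API.

```python
import wave


def check_wav(path, rate=16000, channels=1, sample_width=2):
    """Return True if the WAV file has the expected rate, channel count,
    and sample width (in bytes; 2 corresponds to S16_LE)."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width
        )
```

For example, `check_wav("test.wav")` could be called at the start of `asr_main` to skip the upload when the file was recorded with different settings.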
"Thinking"
Here I use the Turing robot from the Baidu API Store. Its documentation can be found at: http://apistore.baidu.com/apiworks/servicedetail/736.html
It is very simple to use, so I will not describe it in detail here. The code is as follows:
```python
# robot.py

import json
import urllib.parse
import urllib.request


def robot_main(words):
    """Send the recognized text to the Turing robot and return its reply text."""
    url = "http://apis.baidu.com/turing/turing/turing?"

    key = "879a6cb3afb84dbf4fc84a1df2ab7319"
    userid = "1000"

    # Percent-encode the text so it is safe to place in the query string.
    words = urllib.parse.quote(words)
    url = url + "key=" + key + "&info=" + words + "&userid=" + userid

    req = urllib.request.Request(url)
    req.add_header("apikey", "xxxxxxxxxxxxxxxxxxxxxxxxxx")

    print("robot start request")
    resp = urllib.request.urlopen(req)
    print("robot stop request")
    content = resp.read()
    if content:
        data = json.loads(content.decode("utf-8"))
        print(data["text"])
        return data["text"]
    return None
```
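Because the reply depends on a network call, the robot has nothing to say whenever that request fails. One possible safeguard is sketched below, assuming the remote call is passed in as a function (for example `robot_main` above). `reply_with_fallback` and its default message are hypothetical.

```python
import urllib.error


def reply_with_fallback(words, remote, default="Sorry, I did not catch that."):
    """Ask the remote chatbot; fall back to a fixed reply on failure or empty answer."""
    try:
        answer = remote(words)
    except (urllib.error.URLError, ValueError, KeyError):
        # URLError: network or HTTP failure; ValueError: invalid JSON;
        # KeyError: response without the expected "text" field.
        return default
    return answer or default
```

With this wrapper the robot always has something to say, which keeps the speaking step from receiving an empty reply.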
"Speaking"
Speaking first requires converting text into speech, that is, speech synthesis (TTS), and then playing the resulting audio.
Baidu's voice open platform provides a TTS interface that lets you configure a male or female voice, pitch, speaking speed, and volume. The server returns audio data in MP3 format, which we write to a file in binary mode.
For details, see http://yuyin.baidu.com/docs/tts/136
The code is as follows:
```python
# coding: utf-8
# tts.py

import urllib.parse
import urllib.request


def baidu_tts_by_post(data, cuid, token):
    """Request speech synthesis and return the raw response body (MP3 data)."""
    post_data = {
        "tex": data,
        "lan": "zh",
        "ctp": 1,
        "cuid": cuid,
        "tok": token,
    }

    url = "http://tsn.baidu.com/text2audio"
    post_data = urllib.parse.urlencode(post_data).encode("utf-8")
    req = urllib.request.Request(url, data=post_data)

    print("tts start request")
    resp = urllib.request.urlopen(req)
    print("tts finish request")
    return resp.read()


def tts_main(filename, words):
    """Synthesize words and write the MP3 data to filename."""
    token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    # The text is quoted here and encoded again by urlencode() in
    # baidu_tts_by_post, so it reaches the server URL-encoded twice.
    text = urllib.parse.quote(words)
    uuid = "xxxx"
    resp = baidu_tts_by_post(text, uuid, token)

    # The response is binary MP3 data, so open the file in binary mode.
    with open(filename, "wb") as f:
        f.write(resp)
```
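The code above writes whatever the server returns to the file, so a failed request would leave behind a file that cannot be played. Below is a hedged sketch of a guard, assuming the service signals errors by returning a JSON body instead of audio (check the TTS documentation linked above for the actual behavior). `save_audio` is a hypothetical helper.

```python
import json


def save_audio(body, content_type, path):
    """Write audio bytes to path, or raise if the response looks like a JSON error."""
    if content_type.startswith("application/json"):
        # Assumed error format: a JSON document describing the failure.
        error = json.loads(body.decode("utf-8"))
        raise RuntimeError("TTS request failed: {}".format(error))
    with open(path, "wb") as f:
        f.write(body)
    return path
```

With `urllib`, the content type of a response object `resp` can be read with `resp.headers.get("Content-Type", "")`.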
After getting the audio file, you can use the mpg123 player to play it.
- mpg123 test.mp3
Integration
Finally, combine these three parts.
First, tie the Python code together in main.py, as follows:
```python
# main.py

import asr
import robot
import tts

words = asr.asr_main("test.wav")             # listen
if words:
    new_words = robot.robot_main(words)      # think
    if new_words:
        tts.tts_main("test.mp3", new_words)  # speak
```
Then use a shell script to call the related tools:
- #!/bin/bash
- arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
- python3 main.py
- mpg123 test.mp3
Okay, now you can talk to the robot. Run the script, say something into the microphone, then press Ctrl-C to stop recording, and the robot will reply to you.
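To hold a longer conversation without restarting the script each time, the same three steps could be driven from a Python loop. This is a sketch only: the step functions are passed in as parameters so that nothing here depends on the exact module layout, and `converse` and its argument names are hypothetical.

```python
def converse(record, recognize, think, speak, turns=3):
    """Run a fixed number of listen/think/speak turns and return the transcript."""
    transcript = []
    for _ in range(turns):
        audio_path = record()          # e.g. run arecord and return "test.wav"
        words = recognize(audio_path)  # e.g. asr.asr_main
        if not words:
            continue                   # nothing recognized, move to the next turn
        reply = think(words)           # e.g. robot.robot_main
        if reply:
            speak(reply)               # e.g. tts.tts_main followed by mpg123
        transcript.append((words, reply))
    return transcript
```

Passing the steps in as functions also makes the loop easy to try out with simple stand-ins before any microphone or network access is involved.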
