


Siri is becoming more and more 'popular'. What breakthroughs will there be in smart voice in the future?
For human-computer interaction, giving machines good hearing has been a goal that the AI field has pursued relentlessly in recent years. Around 2009, applications of deep learning models began to move beyond academia, and intelligent speech technologies such as voice wake-up, speech recognition, speech enhancement, and speech synthesis gradually matured.
A typical early example is the launch of Siri in 2011, when intelligent voice marked a new leap in how humans and machines communicate and interact. After more than ten years of development, "Hey, Siri"-style human-machine question answering is no longer limited to mobile devices. It has entered thousands of households and is widely used in various scenarios: smart speakers as home companions, Tmall Genie for convenient online shopping, simultaneous translation at meetings, in-car voice navigation assistants when traveling, and so on.
As more and more Internet companies and upstream manufacturers move into the intelligent voice market, products such as intelligent voice customer service, conversational AI applications, and AI virtual assistants have achieved great success. As quality improves further, responses sound more natural, questions are understood more accurately, and the assistants even show "little emotions" of their own.
In the digital era, the trend toward the interconnection of everything is unstoppable. Intelligent voice, as a key interface for human-computer interaction today, is in a period of deep integration with the real economy. As application scenarios continue to develop and expand, many challenging problems have emerged, such as how to identify the speaker, how to recognize dialects, and how to eliminate ambiguity. These are the latest research hotspots.
A maturing technology often still holds untapped potential, both in its capacity for innovation in practical applications and in the directions in which it may evolve. Looking to the next stage, intelligent voice technology will see new trends as well. For example: can deeply integrated AI voice chips replace the practice of running models in the cloud? Can innovative research on multi-modal fusion, unsupervised learning, and cross-disciplinary integration with brain science achieve breakthrough results? We'll see.
So, what real production problems have major enterprises encountered in their practical exploration of intelligent voice technology? How were they solved? What progress has been made? What has changed in the industry? What are the next development trends? The intelligent voice technology special session of the "AISummit Global Artificial Intelligence Technology Conference" will bring you in-depth thinking!
On August 7, the intelligent voice special session of the “AISummit Global Artificial Intelligence Technology Conference”, presented by 51CTO, is coming!
Which topics are you interested in?
Topic 1: Zuoyebang Speech Technology Practice
1. Exploration of speech recognition technology: sharing speech recognition techniques for large-scale practical application scenarios, such as end-to-end recognition and efficient use of data, along with a proposed hot word solution based on prefix automata.
2. Speech evaluation technology practice: for pronunciation error correction, a multi-task knowledge transfer and multi-modal feature fusion solution is proposed for Zuoyebang's high-concurrency scenarios, which considerably improves the model's phoneme discrimination and its error detection in noisy environments. To address the difficulty of deploying voice evaluation, a high-performance cloud-based integrated evaluation technology is proposed.
3. Speech synthesis technology framework: sharing Zuoyebang's thinking and practice on further improving its existing speech technology framework for small data volumes.
Topic 2: Application of ByteDance speech recognition technology in Feishu
1. Applications of speech recognition technology in office scenarios: voice input for office email and instant messaging, office voice assistants, real-time subtitles, and post-meeting transcription.
2. Solution thinking: making meetings intelligent and improving efficiency.
3. Challenges and opportunities: challenges of the speech recognition task, challenges brought by downstream tasks, and the additional information that meetings provide.
4. Introduction to key algorithm work (end-to-end speech recognition system): Transducer & CIF, dynamic and static hot words, and context awareness.
Topic 3: Practice of building a high-level speech synthesis system
1. Background introduction and problem analysis of high-level speech synthesis system.
2. Design thinking and implementation of high-level speech synthesis system.
3. Experimental evaluation.
4. Future work prospects.
Topic 4: The path to practical implementation of intelligent voice technology in SOUL social scenarios
1. End-to-end speech recognition in SOUL social metaverse scenarios
2. Construction route of multi-modal speech synthesis technology
3. Application in business scenarios such as voice security and voice interaction
Topic 5: Exploration and practice of end-to-end speech recognition technology at 58.com
1. Application scenarios of speech recognition at 58.com: AI intelligent voice applications, an introduction to the speech recognition pipeline, challenges, and technical routes
2. Model optimization work based on WeNet: semi-supervised training, Efficient Conformer, model compression
3. End-to-end speech recognition deployment plan: self-developed engine architecture, WeNet decoding service deployment, and streaming/non-streaming decoding performance testing
Who are the featured guests?
1. Song Yang, chief algorithm expert and head of the intelligent middle office at Zuoyebang, and producer of this special session
Song Yang worked at Baidu for 7 years, where he was engaged in algorithm research and development. He joined Zuoyebang in 2015 as head of the intelligent middle office department, which provides middle office technical capabilities including data mining, NLP, and voice for the company's various businesses. He has been responsible for search and Q&A, personalized recommendation, intelligent quality inspection, voice evaluation, intelligent service scheduling, and other directions.
2. Wang Qiangqiang, head of the speech technology team of Zuoyebang
Before joining Zuoyebang, Wang Qiangqiang worked in the Speech Processing and Machine Intelligence Laboratory of the Department of Electronic Engineering at Tsinghua University, where he was responsible for implementing speech recognition algorithms and building industrial-grade solutions. He joined Zuoyebang in 2018 and is responsible for the research and implementation of speech-related algorithms. He has led the deployment of speech recognition, evaluation, synthesis, and other algorithms at Zuoyebang, providing the company with a complete set of voice technology solutions.
3. Zhang Jun, speech recognition algorithm researcher at ByteDance AI Lab
Zhang Jun has long been engaged in the research and application of speech algorithms such as speech recognition and voice wake-up, and has rich experience. In 2018, he joined the intelligent voice team at ByteDance AI Lab, where he is currently mainly responsible for building voice technology solutions for intelligent office, intelligent hardware, and intelligent customer service.
4. Tan Xu, Researcher in Charge of Microsoft Research Asia
Tan Xu’s research fields include deep learning, natural language/speech/music, and AI content generation. The machine translation and speech synthesis systems he developed have won multiple competition championships and reached human-level performance on academic evaluation sets. His research work, such as the pre-trained language model MASS, the speech synthesis models FastSpeech and NaturalSpeech, and the AI music project Muzic, has received widespread attention in the industry.
5. Liu Zhongliang, head of SOUL speech algorithm
Liu Zhongliang graduated from the Graduate School of the Chinese Academy of Sciences with a master's degree and currently serves as the head of speech algorithm at SOUL. He previously worked at the Sogou AI Interaction Department and the Momo Big Data Department. Over the past 10 years, he has been mainly engaged in the research and development of speech technology systems such as voice wake-up, speech recognition, speech synthesis, and audio and music understanding, applied mainly in voice interaction and speech understanding scenarios such as input methods, mobile assistants, smart hardware, and voice security. He is committed to creating the best deployable voice technology.
6. Zhou Wei, head of the speech algorithm department and algorithm architect of 58.com AI Lab
Zhou Wei is responsible for speech recognition and speech synthesis algorithm development. He graduated with a master's degree from the University of Chinese Academy of Sciences in 2016, after which he joined a startup working on conversational AI products. In May 2018, he joined 58.com and has participated in the research and development of NLP algorithms for AI projects such as intelligent customer service, intelligent outbound calls, and intelligent writing. In 2019, he began to focus on speech algorithms and led the team in independently developing the speech algorithms in the 58.com speech processing engine from 0 to 1.
What other exciting activities are there?
In addition to talks on practical innovation by outstanding AI technology experts, the AISummit Global Artificial Intelligence Technology Conference has also prepared a wealth of pre-event and on-site interactive benefits for attendees. Join this event to expand your technical capabilities and professional network, and take home surprise gifts at the same time!
The event includes four fun interactive games such as "Don't give in", "Work with luck", and "Wise and share the same goals". There will always be an exquisite gift to surprise you! And what will the legendary, mysterious grand prize be? It is waiting for technology lovers like you to reveal the secret on site! (PS: We hear that the earlier you register, the higher your chance of winning the grand prize!)
How to register quickly?
Click to enter the official website of the AISummit Global Artificial Intelligence Technology Conference, then follow the prompts to fill in and submit all the required information to complete the registration. Scan the QR code to join the official conference group, participate in the lottery, and win exquisite gifts such as SONY speakers, Bing Dwen Dwen, and AI technology books, as well as red envelopes.











