1 Outstanding Paper and 5 Orals! Is ByteDance Really This Strong at ACL This Year? Come Chat With Us in the Live Room!


By WBOY

Aug 15, 2024, 04:32 PM


The focus of the academic world this week is undoubtedly ACL 2024, held in Bangkok, Thailand. The conference attracted many outstanding researchers from around the world, who gathered to discuss and share their latest results.

According to official figures, this year's ACL received nearly 5,000 paper submissions; 940 were accepted to the main conference, and 168 were selected for oral presentation (Oral), a rate of under 3.4% of submissions. ByteDance had 5 papers selected as Orals.

At the Paper Awards session on the afternoon of August 14, the organizers announced that ByteDance's paper "G-DIG: Towards Gradient-based DIverse and high-quality Instruction Data Selection for Machine Translation" had won an Outstanding Paper Award (one of 35).

ACL 2024 on-site photo

Back at ACL 2021, ByteDance won the conference's only Best Paper Award. It was only the second time since ACL was founded that a team of Chinese scientists took the top prize!

To discuss this year's cutting-edge results in depth, we have invited core authors of ByteDance's papers to present and share their work. Next Tuesday, August 20, from 19:00 to 21:00, the "ByteDance ACL 2024 Cutting-edge Paper Sharing Session" will be broadcast online!

Wang Mingxuan, leader of the Doubao language model research team, will be joined by ByteDance researchers Huang Zhichao, Zheng Zaixiang, Li Chaowei, Zhang Xinbo, and a mystery guest from the Outstanding Paper team to share exciting results and research directions from ACL, covering natural language processing, speech processing, multimodal learning, large-model reasoning, and other fields. Book your spot now!

Event Agenda

Interpretation of Selected Papers


RepCodec: A Speech Representation Codec for Speech Tokenization

Paper address: https://arxiv.org/pdf/2309.00169

With the recent rapid development of large language models (LLMs), discrete speech tokenization plays an important role in injecting speech into LLMs. However, this discretization leads to a loss of information, thus harming the overall performance. To improve the performance of these discrete speech tokens, we propose RepCodec, a novel speech representation codec for semantic speech discretization.
Framework of RepCodec

Unlike audio codecs that reconstruct the raw audio, RepCodec learns a VQ codebook by reconstructing the speech representations produced by a speech encoder such as HuBERT or data2vec. Together, the speech encoder, codec encoder, and VQ codebook form a pipeline that converts speech waveforms into semantic tokens. Extensive experiments show that, thanks to its better information retention, RepCodec significantly outperforms the widely used k-means clustering approach in both speech understanding and generation. This advantage also holds across a variety of speech encoders and languages, confirming RepCodec's robustness. The approach can facilitate research on large language models for speech processing.
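The step that turns continuous representations into tokens is vector quantization: each encoder frame is mapped to its nearest codebook entry, and the quality of the codebook is judged by how well the looked-up entries reconstruct the representation. The sketch below shows only that quantize-and-reconstruct step. It assumes identity codec encoder and decoder and a hand-picked 2-D codebook, and the names `quantize` and `reconstruction_loss` are hypothetical. The real RepCodec uses learned encoder/decoder networks and trains the codebook.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each representation frame to the index of its nearest codeword.

    The indices are the discrete speech tokens; the looked-up codewords are
    what a decoder would use to reconstruct the representation.
    """
    tokens = [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]
    return tokens, [codebook[t] for t in tokens]


def reconstruction_loss(frames: List[Vector], reconstructed: List[Vector]) -> float:
    """Mean squared error per frame between original and reconstructed representations."""
    return sum(squared_distance(f, r) for f, r in zip(frames, reconstructed)) / len(frames)


# Toy "speech encoder" output: three 2-D frames (real HuBERT features are much larger).
frames = [[0.1, 0.2], [0.9, 1.1], [0.0, 0.1]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
tokens, quantized = quantize(frames, codebook)
print(tokens)  # [0, 1, 0]
print(reconstruction_loss(frames, quantized))
```

The k-means baseline the paper compares against is essentially the same nearest-centroid lookup applied directly to the encoder features, with no learned codec encoder and decoder around it.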
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
Paper address: https://arxiv.org/pdf/2302.10025
Diffusion models have achieved great success in generating continuous signals such as images and audio, but they still struggle to learn discrete sequence data such as natural language. A recent line of text diffusion models sidesteps this discreteness by embedding discrete states into a continuous latent space. Even so, their generation quality remains unsatisfactory.

To understand why, we first analyze in depth the training process of diffusion-based sequence generation models and identify three serious problems: (1) learning failure; (2) lack of scalability; (3) neglect of the conditioning signal. We argue that these problems stem from the imperfect discreteness of the embedding space, in which the scale of the noise plays a decisive role.

In this work, we propose DINOISER, which improves diffusion models for sequence generation by manipulating noise. During training, we adaptively determine the range of sampled noise scales in a manner inspired by optimal transport. During inference, we encourage the model to make better use of the conditional signal by amplifying the noise scale. Experiments show that, with these training and inference strategies, DINOISER outperforms previous diffusion-based sequence generation baselines on multiple conditional sequence modeling benchmarks. Further analysis confirms that DINOISER makes better use of conditional signals to control its generation process.
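DINOISER has two noise knobs: a lower bound on the noise scales sampled during training, and an amplified noise scale at inference. The sketch below shows where each knob sits in a setup that corrupts token embeddings with Gaussian noise. The cutoff rule (half the smallest embedding gap), the amplification factor of 2, and all function names are hypothetical placeholders. The paper determines its noise range adaptively and does not necessarily use either rule.

```python
import math
import random
from typing import List, Sequence, Tuple


def min_pairwise_distance(embeddings: List[Sequence[float]]) -> float:
    """Smallest Euclidean distance between any two token embeddings."""
    return min(
        math.dist(embeddings[i], embeddings[j])
        for i in range(len(embeddings))
        for j in range(i + 1, len(embeddings))
    )


def training_noise_range(embeddings: List[Sequence[float]], sigma_max: float) -> Tuple[float, float]:
    """Hypothetical rule: skip noise scales so small that denoising stays trivial.

    At tiny scales a noised embedding still sits closest to its own token.
    Half the smallest embedding gap is an illustrative cutoff, not the paper's.
    """
    sigma_min = min(0.5 * min_pairwise_distance(embeddings), sigma_max)
    return sigma_min, sigma_max


def diffuse(x: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Corrupt an embedding with isotropic Gaussian noise of scale sigma."""
    return [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]


def inference_sigma(sigma_max: float, amplification: float = 2.0) -> float:
    """Amplified noise scale given to the denoiser at inference (hypothetical factor)."""
    return sigma_max * amplification


vocab = [[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]]  # toy 2-D token embeddings
rng = random.Random(0)
lo, hi = training_noise_range(vocab, sigma_max=10.0)
sigma = rng.uniform(lo, hi)            # training noise scale, never below lo
noisy = diffuse(vocab[1], sigma, rng)  # corrupted embedding the model must denoise
print(lo, hi)                          # 2.5 10.0
print(inference_sigma(hi))             # 20.0
```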

Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Paper address: https://arxiv.org/pdf/2310.03291

We introduce EVLGen, a streamlined framework for pre-training computationally demanding vision-conditioned language generation models that leverages frozen pre-trained large language models (LLMs).
Overview of EVLGen

The conventional approach to vision-language pre-training (VLP) usually involves two-stage optimization. The first stage is resource-intensive and dedicated to general vision-language representation learning, focusing on extracting and consolidating relevant visual features. The second stage emphasizes end-to-end alignment between the visual and language modalities. Our novel single-stage, single-loss framework bypasses the computationally demanding first stage by gradually merging similar visual tokens during training. It also avoids the model collapse that single-stage training causes in BLIP-2-style models. This gradual merging compresses visual information while preserving semantic richness, enabling fast convergence without sacrificing performance.
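The idea of gradually merging similar visual tokens can be sketched as below. The sketch follows a generic token-merging recipe: repeatedly average the most similar pair of tokens by cosine similarity, with more merges later in training. This recipe is an assumption for illustration. EVLGen's actual merging rule may differ, and so may the linear schedule in the hypothetical `merges_at_step`. Unweighted averaging keeps the sketch short; weighting a merged token by how many patches it covers is a common refinement.

```python
import math
from typing import List, Sequence

Token = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def merge_most_similar(tokens: List[Token], num_merges: int) -> List[Token]:
    """Greedily average the most similar pair of visual tokens, num_merges times."""
    tokens = [list(t) for t in tokens]
    for _ in range(min(num_merges, len(tokens) - 1)):
        i, j = max(
            ((i, j) for i in range(len(tokens)) for j in range(i + 1, len(tokens))),
            key=lambda pair: cosine(tokens[pair[0]], tokens[pair[1]]),
        )
        tokens[i] = [(x + y) / 2 for x, y in zip(tokens[i], tokens[j])]
        del tokens[j]  # j > i, so index i is unaffected
    return tokens


def merges_at_step(step: int, total_steps: int, max_merges: int) -> int:
    """Hypothetical linear schedule: merge more tokens as training progresses."""
    return max_merges * step // total_steps


patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy visual tokens
print(merges_at_step(500, 1000, 2))             # 1
print(merge_most_similar(patches, merges_at_step(500, 1000, 2)))
```

The first two patches are nearly parallel, so they are averaged into one token and the sequence shrinks from three tokens to two. That shorter sequence is where the compute savings come from.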

Experimental results show that our method speeds up the training of vision-language models by 5x with no significant impact on overall performance. Moreover, our model substantially narrows the performance gap with current vision-language models while using only 1/10 of the data. Finally, we show how our image-text model can be seamlessly adapted to video-conditioned language generation tasks via a novel soft-attentive temporal token-contextualizing module.

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Paper address: https://arxiv.org/pdf/2401.11053

Streaming zero-shot voice conversion is the ability to convert input speech into any speaker's voice in real time. It requires only a single utterance from the target speaker as a reference and no additional model updates. Existing zero-shot voice conversion methods are usually designed for offline systems and struggle to meet the streaming requirements of real-time voice conversion applications. Recent language model (LM)-based methods have shown excellent performance in zero-shot speech generation, including conversion. However, they require whole-utterance processing and are limited to offline scenarios.
The overall architecture of StreamVoice
In this work, we propose StreamVoice, a new zero-shot voice conversion model based on a streaming LM that achieves real-time conversion for arbitrary speakers and input speech. To enable streaming, StreamVoice combines a context-aware, fully causal LM with a temporal-independent acoustic predictor. It alternately processes semantic and acoustic features at each autoregressive step, which eliminates the dependence on the complete source speech.
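The streaming loop described above, in which semantic and acoustic tokens alternate within a single causal sequence, can be sketched as a generator. The `predict_acoustic` callable is a hypothetical stand-in for the context-aware LM, here a toy echo function, and string tokens replace real codec codes. The sketch only demonstrates the streaming property: each acoustic token is emitted as soon as its semantic token arrives, without waiting for the rest of the source speech.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical interface: the LM maps the interleaved history to the next acoustic token.
Predictor = Callable[[List[str]], str]


def stream_convert(
    semantic_stream: Iterable[str],
    prompt: List[str],
    predict_acoustic: Predictor,
) -> Iterator[str]:
    """Yield one acoustic token per incoming semantic token, using only past context."""
    history: List[str] = list(prompt)  # interleaved tokens of the reference speaker's utterance
    for semantic in semantic_stream:
        history.append(semantic)              # s_t arrives from the source speech
        acoustic = predict_acoustic(history)  # causal: sees s_1..s_t and a_1..a_(t-1) only
        history.append(acoustic)              # a_t is fed back autoregressively
        yield acoustic


seen_lengths: List[int] = []


def toy_lm(history: List[str]) -> str:
    """Stand-in for the causal LM: records context length and echoes the newest semantic token."""
    seen_lengths.append(len(history))
    return "a(" + history[-1] + ")"


print(list(stream_convert(["s1", "s2", "s3"], prompt=[], predict_acoustic=toy_lm)))
# ['a(s1)', 'a(s2)', 'a(s3)']
print(seen_lengths)  # [1, 3, 5]
```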

Incomplete context in streaming scenarios degrades performance. To counter this, two strategies strengthen the LM's awareness of future and historical context. 1) Teacher-guided context foresight: a teacher model summarizes accurate current and future semantics, which guide the model in predicting the missing context. 2) Semantic masking: the model is encouraged to predict acoustics from previously corrupted semantic input, which strengthens its ability to learn from historical context. Experiments show that StreamVoice supports streaming conversion while achieving zero-shot performance close to that of non-streaming VC systems.
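The semantic masking strategy can be illustrated as a corruption step applied to the interleaved training history. Only semantic positions are ever masked, so the model must recover acoustic predictions from damaged semantic context. The mask token, the per-token masking probability, and the function name are assumptions for illustration; the paper's exact masking scheme may differ.

```python
import random
from typing import List

MASK = "<mask>"  # hypothetical mask token


def mask_semantic_history(
    tokens: List[str], is_semantic: List[bool], ratio: float, rng: random.Random
) -> List[str]:
    """Replace a random fraction of semantic tokens with MASK; acoustic tokens stay intact."""
    return [
        MASK if sem and rng.random() < ratio else tok
        for tok, sem in zip(tokens, is_semantic)
    ]


history = ["s1", "a1", "s2", "a2", "s3", "a3"]
flags = [True, False, True, False, True, False]
print(mask_semantic_history(history, flags, ratio=0.5, rng=random.Random(0)))
```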

G-DIG: Towards Gradient-based DIverse and high-quality Instruction Data Selection for Machine Translation
Paper address: https://arxiv.org/pdf/2405.12915

Large language models (LLMs) have demonstrated remarkable capabilities in general scenarios, and instruction fine-tuning allows them to perform on par with humans on a variety of tasks. However, the diversity and quality of instruction data remain two major challenges for instruction fine-tuning. To address them, we propose a novel gradient-based method that automatically selects high-quality and diverse instruction fine-tuning data for machine translation. Our key innovation is analyzing how individual training examples influence the model during training.

Overview of G-DIG

Specifically, we use influence functions together with a small, high-quality seed dataset to select the training examples that have a beneficial influence on the model; these are treated as high-quality examples. To increase the diversity of the training data, we then maximize the variety of their influences on the model by clustering their gradients and resampling. Extensive experiments on the WMT22 and FLORES translation tasks demonstrate the superiority of our method, and in-depth analysis further validates its effectiveness and generality.
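The two selection stages can be sketched on toy gradient vectors. For the quality stage, this sketch swaps in a first-order gradient dot product, in the spirit of TracIn, for the influence function; the true influence function also involves an inverse-Hessian term. For the diversity stage, it uses plain k-means on gradients followed by even per-cluster sampling. All function names and toy gradients are hypothetical; in practice, per-example gradients would come from the LLM being fine-tuned.

```python
import math
import random
from typing import Dict, List, Sequence

Grad = Sequence[float]


def dot(a: Grad, b: Grad) -> float:
    return sum(x * y for x, y in zip(a, b))


def influence_proxy(candidate: Grad, seed_grads: List[Grad]) -> float:
    """First-order stand-in for the influence function: mean gradient alignment with seed data.

    A positive value means a gradient step on the candidate also lowers the
    seed-set loss (to first order); the true influence function adds an
    inverse-Hessian term.
    """
    return sum(dot(candidate, g) for g in seed_grads) / len(seed_grads)


def select_high_quality(candidates: Dict[str, Grad], seed_grads: List[Grad]) -> List[str]:
    """Keep candidates whose estimated influence on the seed set is beneficial."""
    return [name for name, g in candidates.items() if influence_proxy(g, seed_grads) > 0]


def kmeans_labels(points: List[Grad], k: int, iters: int, rng: random.Random) -> List[int]:
    """Plain k-means on gradient vectors; returns a cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_diverse(names: List[str], grads: List[Grad], k: int, per_cluster: int,
                   rng: random.Random) -> List[str]:
    """Resample evenly across gradient clusters so no single behavior dominates."""
    labels = kmeans_labels(grads, k, iters=10, rng=rng)
    chosen: List[str] = []
    for c in range(k):
        members = [n for n, label in zip(names, labels) if label == c]
        chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return chosen


seed = [[1.0, 0.0]]
pool = {"good": [1.0, 0.0], "harmful": [-1.0, 0.0], "mixed": [0.5, 0.5]}
print(select_high_quality(pool, seed))  # ['good', 'mixed']
```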

GroundingGPT: Language-Enhanced Multi-modal Grounding Model
Paper address: https://arxiv.org/pdf/2401.06071

Multimodal large language models have demonstrated excellent performance on tasks across different modalities. However, previous models mainly emphasize capturing global information from multimodal inputs. As a result, they cannot effectively understand fine details in the input and perform poorly on tasks that require detailed understanding. In addition, most of these models suffer from severe hallucination, which limits their widespread use.

To address these problems and make large multimodal models useful for a wider range of tasks, we propose GroundingGPT, a multimodal model that understands images, videos, and audio at different granularities. Beyond capturing global information, the model excels at tasks that require finer-grained understanding, such as pinpointing specific regions in an image or specific moments in a video. To this end, we designed a diversified dataset construction pipeline to build a multimodal, multi-granularity training dataset. Experiments on multiple public benchmarks demonstrate the versatility and effectiveness of our model.
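Fine-grained grounding outputs of the kind described above are a region in an image or a moment in a video. A standard way to score such predictions is intersection-over-union (IoU) against the annotated region or segment. IoU is widely used in grounding benchmarks, but this sketch is not taken from the paper. The `(x1, y1, x2, y2)` box format and the `(start, end)` segment format are assumptions.

```python
from typing import Sequence, Tuple

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def segment_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Temporal IoU of two (start, end) segments, e.g. seconds in a video."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.142857... (= 1/7)
print(segment_iou((0, 10), (5, 15)))        # 0.333... (= 1/3)
```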

ReFT: Reasoning with Reinforced Fine-Tuning
Paper address: https://arxiv.org/pdf/2401.08967

A common way to enhance the reasoning ability of large language models (LLMs) is supervised fine-tuning (SFT) on chain-of-thought (CoT) annotated data. However, this approach does not generalize well enough, because training relies only on the given CoT data. In math problem datasets, for example, each question in the training data usually has only one annotated reasoning path. An algorithm that could learn from multiple reasoning paths for each question would generalize better.

Comparison between SFT and ReFT in the presence of CoT alternatives

To address this challenge, using math problems as an example, we propose a simple yet effective method called Reinforced Fine-Tuning (ReFT) to improve the generalization of LLMs when reasoning. ReFT first warms up the model with SFT and then optimizes it with online reinforcement learning (specifically, the PPO algorithm in this work). During this phase it automatically samples many reasoning paths for each question and derives rewards from the ground-truth answers to further fine-tune the model.
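The reward signal described above can be sketched as follows: extract the final answer from each sampled reasoning path and compare it with the ground truth. The "The answer is N" format is an assumption, as are the binary reward and the function names. The advantage here is computed against the mean reward of the sampled group, a simple baseline used as a stand-in for the learned value function that PPO uses.

```python
import re
from typing import List, Optional

# Assumed answer format: each sampled chain of thought ends with "The answer is <number>".
ANSWER_PATTERN = re.compile(r"answer is\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_answer(cot: str) -> Optional[str]:
    """Return the last number stated as 'answer is N', or None if absent."""
    matches = ANSWER_PATTERN.findall(cot)
    return matches[-1] if matches else None


def reward(cot: str, gold: str) -> float:
    """Binary reward: 1.0 if the extracted answer equals the ground truth, else 0.0."""
    pred = extract_answer(cot)
    return 1.0 if pred is not None and float(pred) == float(gold) else 0.0


def advantages(cots: List[str], gold: str) -> List[float]:
    """Reward minus the group's mean reward (a simple baseline, not PPO's critic)."""
    rewards = [reward(c, gold) for c in cots]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


samples = [
    "3 + 4 = 7. The answer is 7.",
    "3 * 4 = 12, minus 4 is 8. The answer is 8.",
    "I am not sure.",
]
print([reward(s, "7") for s in samples])  # [1.0, 0.0, 0.0]
print(advantages(samples, "7"))
```

Paths with positive advantage are reinforced and the others are discouraged. This is how many sampled paths per question, rather than a single annotated one, end up shaping the model.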

Extensive experiments on the GSM8K, MathQA, and SVAMP datasets show that ReFT significantly outperforms SFT. Performance can be further improved by combining ReFT with strategies such as majority voting and reranking. Notably, ReFT uses the same training questions as SFT and relies on no extra or augmented training questions, which shows its superior generalization ability.
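Majority voting, one of the strategies mentioned above, aggregates the final answers of several sampled reasoning paths for the same question. The sketch below assumes the answers have already been extracted as strings, with `None` for paths that produced no answer. When counts tie, the answer seen first wins, because `Counter.most_common` orders equal counts by first occurrence.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent extracted answer, ignoring paths with no answer."""
    counts = Counter(a for a in answers if a is not None)
    return counts.most_common(1)[0][0] if counts else None


print(majority_vote(["7", "8", "7", None]))  # 7
```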
We Look Forward to Your Questions

Live broadcast time: August 20, 2024 (Tuesday) 19:00-21:00
Live broadcast platforms: WeChat Channels [Doubao Large Model Team] and Xiaohongshu account [Doubao Researcher]

You are welcome to fill out the questionnaire, tell us which questions about the ACL 2024 papers interest you, and chat live with our researchers!
The Doubao large model team is actively hiring. Please click this link to learn more about team recruitment.
