


Transformer's 6th Anniversary: It Didn't Even Get a NeurIPS Oral Back Then, but Its 8 Authors Have Founded Several AI Unicorns
From ChatGPT to AI image generation, the recent wave of breakthroughs in artificial intelligence may owe much to the Transformer.
Today marks the sixth anniversary of the submission of the famous Transformer paper.
Paper link: https://arxiv.org/abs/1706.03762
Six years ago, a paper with a somewhat exaggerated title was uploaded to the preprint platform arXiv. The phrase "xx is All You Need" has since been echoed endlessly by developers in the AI field, even becoming a trend in paper titles, and "Transformer" no longer means the Transformers franchise: it now stands for the most advanced technology in AI.
Looking back at this paper six years later, we can find many interesting or little-known facts, as NVIDIA AI scientist Jim Fan has summarized.
The Transformer model abandons traditional CNN and RNN units; the entire network architecture is composed purely of attention mechanisms.
Although the Transformer paper is titled "Attention Is All You Need", and we keep praising the attention mechanism because of it, note an interesting fact: the Transformer's researchers did not invent attention; rather, they pushed the mechanism to its extreme.
The attention mechanism was proposed in 2014 by a team led by deep learning pioneer Yoshua Bengio, in a paper with a comparatively plain title: "Neural Machine Translation by Jointly Learning to Align and Translate".
In this ICLR 2015 paper, Bengio et al. proposed combining an RNN with "context vectors" (i.e., attention). Although it is one of the greatest milestones in NLP, it is far less well-known than the Transformer: the Bengio team's paper has been cited about 29,000 times to date, versus 77,000 for the Transformer.
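To make the "context vector" idea concrete, here is a minimal NumPy sketch in the style of that paper's additive attention. The function name, weight matrices, and toy shapes are illustrative assumptions, not taken from the paper's code:

```python
import numpy as np

def additive_attention(s_prev, H, W_s, W_h, v):
    """Bahdanau-style additive attention over encoder states H.

    s_prev: previous decoder state, shape (d_dec,)
    H:      encoder hidden states,  shape (T, d_enc)
    Returns the context vector (weighted sum of encoder states) and the weights.
    """
    # Alignment scores: e_j = v . tanh(W_s s_prev + W_h h_j)
    scores = np.tanh(s_prev @ W_s + H @ W_h) @ v   # shape (T,)
    # Softmax turns scores into alignment weights over source positions
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ H, alpha

# Toy shapes: 5 source tokens, 16-dim encoder/decoder states, 32 attention units
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 16, 16, 32
H = rng.normal(size=(T, d_enc))
s_prev = rng.normal(size=(d_dec,))
W_s = rng.normal(size=(d_dec, d_att))
W_h = rng.normal(size=(d_enc, d_att))
v = rng.normal(size=(d_att,))
context, alpha = additive_attention(s_prev, H, W_s, W_h, v)
print(alpha.round(2))  # how much the decoder attends to each source token
```

Each decoder step recomputes these weights, which is how the model "aligns" to different source tokens as it translates.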
AI's attention mechanism is naturally modeled on human visual attention. The human brain has an innate ability: when we look at a picture, we first scan it quickly and then focus on the target region that deserves attention.
If you never let go of any local information, you inevitably do a lot of useless work, which is not conducive to survival. Likewise, introducing a similar mechanism into deep learning networks can simplify models and speed up computation. In essence, attention filters a small amount of important information out of a large amount of input, focuses on that important information, and ignores the mostly unimportant rest.
In recent years, attention mechanisms have been widely used across deep learning, for example to capture receptive fields on images in computer vision, or to locate key tokens or features in NLP. A large number of experiments have shown that models with attention mechanisms achieve significant performance improvements in tasks such as image classification, segmentation, tracking, and enhancement, as well as natural language recognition, understanding, question answering, and translation.
The Transformer model, built on the attention mechanism, can be regarded as a general-purpose sequence computer: when processing an input sequence, attention lets the model assign different attention weights based on the correlation between positions in the sequence, which allows the Transformer to capture long-range dependencies and contextual information and thereby improve sequence processing.
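As a concrete illustration, here is a minimal sketch of the scaled dot-product attention at the heart of the Transformer, softmax(QK^T / sqrt(d_k)) V; the variable names and toy data are illustrative, not the paper's actual implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the Transformer's core operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights                     # outputs: weighted sums of values

# Self-attention over a toy sequence of 4 tokens with 8-dim representations
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(x, x, x)      # self-attention: Q = K = V = x
print(w.round(2))  # each row sums to 1: how much each token attends to the others
```

In the full model, Q, K, and V are learned linear projections of the token representations, and multiple such attention "heads" run in parallel.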
But at the time, neither the Transformer paper nor the original attention paper talked about general-purpose sequence computers. Instead, the authors framed the mechanism as a solution to a narrow, specific problem: machine translation. So when we trace the origins of AGI in the future, we may be able to trace them back to the "humble" Google Translate.
## Although it was accepted by NeurIPS 2017, it didn't even get an Oral
Although the Transformer paper is hugely influential now, at NeurIPS 2017, the world's top AI conference that year, it did not even get an Oral, let alone an award. The conference received 3,240 submissions that year, of which 678 were accepted, the Transformer paper among them. Of the accepted papers, 40 were Orals and 112 were Spotlights, and there were 3 best paper awards and one Test of Time award; the Transformer received none of them.
Although it missed out on the NeurIPS 2017 awards, the Transformer's influence is plain for all to see.
Jim Fan commented: it is not the judges' fault that people find it hard to recognize the importance of an influential study before it becomes influential. Some papers, however, are lucky enough to be recognized immediately. For example, ResNet, proposed by Kaiming He et al., won the best paper award at CVPR 2016, a well-deserved recognition from a top AI conference. But back in 2017, even very smart researchers could not have predicted the changes LLMs are bringing today, just as in the 1980s few people could foresee the tsunami that deep learning would bring after 2012.
## Eight authors, each with a remarkable path
The paper had eight authors, from Google and the University of Toronto. Today, most of them have left their original institutions.
On April 26, 2022, a company called "Adept" was officially founded with nine co-founders, including two authors of the Transformer paper, Ashish Vaswani and Niki Parmar.
Ashish Vaswani obtained his PhD from the University of Southern California under the supervision of David Chiang and Liang Huang, focusing on early applications of modern deep learning to language modeling. In 2016 he joined Google Brain and led the Transformer research, before leaving Google in 2021.
Niki Parmar graduated from the University of Southern California with a master's degree and joined Google in 2016. There she developed several successful question-answering and text-similarity models for Google Search and Ads, and led early work extending the Transformer model into image generation, computer vision, and other areas. In 2021, she also left Google.
After leaving, the two co-founded Adept, serving as chief scientist (Ashish Vaswani) and chief technology officer (Niki Parmar). Adept's vision is to build an "AI teammate" trained to use a variety of different software tools and APIs.
In March 2023, Adept announced a US$350 million Series B round, pushing its valuation past US$1 billion and making it a unicorn. By the time Adept announced the round, however, Niki Parmar and Ashish Vaswani had already left to found a new AI company of their own. That new company is still in stealth, and detailed information about it is not available.
Another author, Noam Shazeer, was one of Google's most important early employees. He joined Google at the end of 2000 and finally left in 2021, after which he became CEO of a startup called Character.AI.
Character.AI's other founder is Daniel De Freitas; both come from Google's LaMDA team, where they built LaMDA, the language model underlying Google's conversational programs.
In March this year, Character.AI announced US$150 million in financing at a US$1 billion valuation. It is one of the few startups with the potential to compete with OpenAI, the organization behind ChatGPT, and one of the rare companies to become a unicorn in only 16 months. Its application, Character.AI, is a neural-language-model chatbot that generates human-like text responses and engages in contextual conversations.
Character.AI was released on the Apple App Store and Google Play Store on May 23, 2023, and was downloaded more than 1.7 million times in its first week. In May 2023, the service added a $9.99-per-month paid subscription called c.ai+, which gives users priority chat access, faster response times, and early access to new features, among other perks.
Aidan N. Gomez left Google as early as 2019, then worked as a researcher at FOR.ai, and is now co-founder and CEO of Cohere.
Cohere is a generative AI startup founded in 2019. Its core business is providing NLP models and helping enterprises improve human-computer interaction. Its three founders are Ivan Zhang, Nick Frosst, and Aidan Gomez, of whom Gomez and Frosst are former members of the Google Brain team. In November 2021, Google Cloud announced a partnership with Cohere: Google Cloud would use its robust infrastructure to power the Cohere platform, and Cohere would use Cloud's TPUs to develop and deploy its products.
It is worth noting that Cohere has just raised US$270 million in Series C financing, becoming a unicorn valued at US$2.2 billion.
Łukasz Kaiser left Google in 2021 after working there for 7 years and 9 months, and is now a researcher at OpenAI. During his time as a research scientist at Google, he participated in designing SOTA neural models for machine translation, parsing, and other algorithmic and generative tasks, and was a co-author of the TensorFlow system and the Tensor2Tensor library.
Jakob Uszkoreit left Google in 2021 after working there for 13 years and joined Inceptive as a co-founder. Inceptive is an AI pharmaceutical company dedicated to using deep learning to design RNA drugs.
While at Google, Jakob Uszkoreit helped form the language understanding team for Google Assistant and, early on, worked on Google Translate.
Illia Polosukhin left Google in 2017 and is now co-founder and CTO of NEAR.AI (a blockchain infrastructure company).
The only one still at Google is Llion Jones; this year marks his ninth year at the company.
Now, six years after the publication of "Attention Is All You Need", some of the original authors have chosen to leave and some have chosen to stay at Google. Either way, the Transformer's influence continues.