
IBM develops cloud-native AI supercomputer Vela to flexibly deploy and train models with tens of billions of parameters

Apr 14, 2023 pm 01:46 PM

With ChatGPT's surge in popularity, the AI model training behind it has also drawn widespread attention. IBM Research recently announced that Vela, a cloud-native supercomputer it developed, can be deployed quickly and used to train foundation models. Since May 2022, dozens of the company's researchers have been using the system to train AI models with tens of billions of parameters.


Foundation models are AI models trained on large amounts of unlabeled data; their versatility means they can be adapted to a range of different tasks with only fine-tuning. They are enormous in scale, and training them requires massive, costly computing power and a great deal of time. Experts therefore expect computing power to become the biggest bottleneck in developing the next generation of large-scale foundation models.

Training a model with tens or hundreds of billions of parameters requires high-performance computing hardware, including networks, parallel file systems, and bare-metal nodes. This hardware is difficult to deploy and expensive to run. Microsoft built an AI supercomputer for OpenAI in May 2020 and hosted it on the Azure cloud platform. IBM, however, says such systems are hardware-driven, which increases cost and limits flexibility.
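To get a rough sense of why models of this size need such hardware, the sketch below estimates the memory required just to hold model state during training. It assumes mixed-precision training with the Adam optimizer at about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights and the two Adam moments). That figure is a commonly cited rule of thumb, not something stated by IBM; activations and framework overhead are ignored, and the function name is hypothetical.

```python
# Rough estimate of training-state memory. All constants are assumptions:
# 2 bytes (fp16 weights) + 2 bytes (fp16 gradients) + 12 bytes (fp32 master
# weights plus two Adam moments) per parameter.
BYTES_PER_PARAM = 2 + 2 + 12


def training_state_gb(num_params: float) -> float:
    """Return approximate model-state memory in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM / 1e9


for billions in (10, 50, 100):
    gb = training_state_gb(billions * 1e9)
    print(f"{billions}B parameters: ~{gb:,.0f} GB of model state")
```

Even under these simplified assumptions, the state for a model with tens of billions of parameters exceeds the memory of any single GPU, which is why training is spread across many GPUs and nodes.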

Cloud AI Supercomputer

So IBM created a system called Vela that is “specifically focused on large-scale AI.”

Vela can be deployed to any of IBM's cloud data centers as needed, and it is itself a "virtual cloud". Although this approach gives up some performance compared with building a physical supercomputer, it yields a more flexible solution. A cloud-based design gives engineers access to resources through APIs, easier access to the broader IBM Cloud ecosystem for deeper integration, and the ability to scale performance as needed.

IBM engineers explained that Vela can access data sets in IBM Cloud Object Storage rather than relying on a custom-built storage backend. Previously, this kind of infrastructure had to be built separately for each supercomputer.

The key components of any AI supercomputer are a large number of GPUs and the nodes that connect them. Vela configures each node as a virtual machine rather than as bare metal, even though bare metal is the more common choice and is widely regarded as the best option for AI training performance.

How is Vela built?

One drawback of cloud virtual machines is that performance is not guaranteed. To avoid performance degradation and deliver bare-metal performance inside virtual machines, IBM engineers found a way to unlock full node performance (including GPU, CPU, network, and storage) and reduce the virtualization overhead to less than 5%.

This involved configuring the bare-metal host for virtualization with support for virtual machine extensions (VMX), huge pages, and single-root I/O virtualization (SR-IOV), and faithfully representing all devices and connections inside the virtual machine, including which network cards are matched with which CPUs and GPUs and how they are bridged to one another. After completing this work, they found that the performance of the virtual machine nodes was "close to bare metal."
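As a small illustration of one of these settings, the sketch below reads huge page counters from text in the format of the Linux `/proc/meminfo` file. It is a generic Linux check rather than IBM's tooling; the function name is hypothetical and the sample values are made up for the example. On a real host the text would come from `open("/proc/meminfo").read()`.

```python
import re


def parse_hugepages(meminfo_text: str) -> dict:
    """Extract huge page counters from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Matches lines such as "HugePages_Total:    1024"
        # and "Hugepagesize:       2048 kB".
        match = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


# Illustrative sample only; these numbers do not describe a Vela node.
SAMPLE = """MemTotal:       1583000000 kB
HugePages_Total:    1024
HugePages_Free:     1000
Hugepagesize:       2048 kB
"""

info = parse_hugepages(SAMPLE)
# Hugepagesize is reported in kB, so divide by 1024 * 1024 to get GiB.
reserved_gib = info["HugePages_Total"] * info["Hugepagesize"] / (1024 * 1024)
print(info)
print(f"Reserved for huge pages: {reserved_gib:.1f} GiB")
```

A `HugePages_Total` of zero would indicate that no huge pages have been reserved on that host or guest.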

In addition, they designed AI nodes with large GPU memory and large amounts of local storage for caching AI training data, models, and resulting artifacts. In tests using PyTorch, they found that by optimizing workload communication patterns they could also compensate for the bottleneck of relatively slow Ethernet networking compared with faster interconnects such as InfiniBand used in supercomputing.
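The article does not say which communication optimizations IBM applied. As one illustration of why communication patterns matter on a higher-latency network, the sketch below uses a simple latency-plus-bandwidth ("alpha-beta") cost model of a ring all-reduce and compares sending gradients as many small messages with sending them as a few large buckets. The function name and every number here are hypothetical and chosen only for illustration.

```python
def ring_allreduce_seconds(total_bytes, num_workers, num_messages,
                           latency_s, bandwidth_bytes_per_s):
    """Alpha-beta cost of a ring all-reduce with the payload split into
    num_messages equally sized buckets that are reduced one after another."""
    steps = 2 * (num_workers - 1)  # reduce-scatter steps plus all-gather steps
    # Every bucket pays the per-step latency once per step.
    latency_cost = num_messages * steps * latency_s
    # The bytes moved per worker do not depend on how the payload is bucketed.
    bandwidth_cost = steps * (total_bytes / num_workers) / bandwidth_bytes_per_s
    return latency_cost + bandwidth_cost


# Hypothetical values, not measurements from Vela.
GRADIENT_BYTES = 2e9   # e.g. 1 billion fp16 gradients
WORKERS = 8
LATENCY = 50e-6        # assumed per-message latency in seconds
BANDWIDTH = 12.5e9     # assumed 100 Gbit/s link, in bytes per second

many_small = ring_allreduce_seconds(GRADIENT_BYTES, WORKERS, 10_000, LATENCY, BANDWIDTH)
few_large = ring_allreduce_seconds(GRADIENT_BYTES, WORKERS, 20, LATENCY, BANDWIDTH)
print(f"10,000 small messages: {many_small:.3f} s")
print(f"20 large buckets:      {few_large:.3f} s")
```

In this model the bandwidth term is identical in both cases, so the difference comes entirely from per-message latency, which is the cost that grouping gradients into larger buckets reduces.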

In terms of configuration, each Vela node has eight 80GB A100 GPUs, two second-generation Intel Xeon Scalable processors, 1.5TB of memory, and four 3.2TB NVMe drives, and the system can be deployed at any scale to any IBM cloud data center around the world.
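The per-node totals implied by this configuration can be worked out directly. The sketch below does that arithmetic and adds a hypothetical helper, `nodes_needed`, that estimates how many nodes are required to supply a given amount of aggregate GPU memory; the 1,000 GB requirement in the example is an arbitrary illustrative figure, not a number from IBM.

```python
import math

# Per-node figures as reported in the article.
GPUS_PER_NODE = 8
GPU_MEMORY_GB = 80
NVME_DRIVES_PER_NODE = 4
NVME_DRIVE_TB = 3.2

node_gpu_memory_gb = GPUS_PER_NODE * GPU_MEMORY_GB   # aggregate GPU memory per node
node_nvme_tb = NVME_DRIVES_PER_NODE * NVME_DRIVE_TB  # local NVMe capacity per node


def nodes_needed(required_gpu_memory_gb: float) -> int:
    """Smallest number of nodes whose combined GPU memory covers the requirement."""
    return math.ceil(required_gpu_memory_gb / node_gpu_memory_gb)


print(f"GPU memory per node: {node_gpu_memory_gb} GB")
print(f"Local NVMe per node: {node_nvme_tb:.1f} TB")
# Hypothetical requirement used only to exercise the helper.
print(f"Nodes for 1,000 GB of GPU memory: {nodes_needed(1000)}")
```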

IBM engineers said: "Having the right tools and infrastructure is a key factor in improving R&D efficiency. Many teams choose to follow the tried-and-true path of building traditional supercomputers for AI... We have been working on a better solution that provides the dual benefits of high-performance computing and high-end user productivity."

