Table of Contents
Accelerating the development of new hardware
Smaller, Faster, Cheaper
Acceleration equation
Current Development of Hardware
Summary

How artificial intelligence can make hardware develop better

Apr 13, 2023, 08:13 AM

Computer hardware was a stagnant market for many years. The dominant x86 microprocessor architecture has reached the limits of the performance gains that miniaturization can deliver, so manufacturers have focused mainly on packing more cores into each chip.

For the rapidly developing fields of machine learning and deep learning, the GPU has been the savior. Originally designed for graphics rendering, GPUs can contain thousands of small cores, making them well suited to the parallel processing that AI training requires.

The essence of artificial intelligence is that it benefits from parallel processing, and about ten years ago it was discovered that GPUs, which are designed to display pixels on a screen, are well suited to this because they are parallel processing engines that can pack in a large number of cores.
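As a minimal sketch of that parallelism (assuming PyTorch; any array framework with GPU support would illustrate the same point), the snippet below runs one large matrix multiplication on the CPU and, when a GPU is available, hands the identical operation to its thousands of cores. The sizes and timings are illustrative, not a benchmark.

```python
# Minimal sketch (assumes PyTorch is installed): the same matrix multiplication
# runs on a handful of general-purpose CPU cores or, if available, is spread
# across the thousands of small cores of a GPU.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# CPU: a few general-purpose cores handle the whole multiplication.
t0 = time.perf_counter()
c_cpu = a @ b
print(f"CPU matmul: {time.perf_counter() - t0:.3f}s")

# GPU (if present): the same operation is split across thousands of cores.
if device == "cuda":
    a_gpu, b_gpu = a.to(device), b.to(device)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the asynchronous kernel to finish
    print(f"GPU matmul: {time.perf_counter() - t0:.3f}s")
```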

That’s good news for Nvidia, which saw its market capitalization surge from less than $18 billion in 2015 to $735 billion before the market contracted last year. Until recently, the company had virtually the entire market to itself. But many competitors are trying to change that.

So far, artificial intelligence workloads have run mainly on Nvidia GPUs, but users are looking for technologies that can take them to the next level. As high-performance computing and AI workloads continue to converge, we will see a wider variety of accelerators emerge.

Accelerating the development of new hardware

The big chip manufacturers are not standing still. Three years ago, Intel acquired Israeli chipmaker Habana Labs and made the company the focus of its artificial intelligence development efforts.

The Gaudi2 training-optimized processor and Greco inference processor, launched by Habana last spring, are said to be at least twice as fast as Nvidia's flagship A100.

In March 2022, Nvidia launched its H100 accelerator GPU, with 80 billion transistors and support for the company's high-speed NVLink interconnect. It features a dedicated engine that accelerates the execution of the Transformer-based models used in natural language processing by as much as six times compared with the previous generation. Recent tests using the MLPerf benchmark show the H100 outperforming Gaudi2 in most deep learning tests, and Nvidia is also seen as having an advantage in its software stack.

Many users choose GPUs because they get access to a consolidated software ecosystem. A large part of Nvidia's success comes from the ecosystem strategy it has built around its hardware.

Hyperscale cloud computing companies entered the field even earlier than the chipmakers. Google LLC's Tensor Processing Unit, an application-specific integrated circuit launched in 2016, is now in its fourth generation. Amazon Web Services launched its machine learning inference accelerator in 2018, claiming it offers more than twice the performance of GPU-accelerated instances.

Last month, the company announced the general availability of cloud instances based on its Trainium chips, saying that for deep learning model training they deliver comparable performance at 50% lower cost than GPU-based EC2 instances. Both companies' efforts focus mainly on delivery through cloud services.

While the established market leaders focus on incremental improvements, many of the more interesting innovations are happening at startups building AI-specific hardware. Such startups attracted the majority of the $1.8 billion that venture capitalists poured into chip startups last year, more than double the 2017 figure, according to industry data.

They are chasing a market that could bring huge gains. The global artificial intelligence chip market is expected to grow from US$8 billion in 2020 to nearly US$195 billion by 2030.

Smaller, Faster, Cheaper

Few startups are trying to replace the x86 CPU, because the leverage in doing so is relatively small. The chips themselves are no longer the bottleneck; communication between different chips is.

The CPU handles low-level work such as managing files and assigning tasks, but a CPU-only approach no longer scales. The CPU is designed to do everything from opening files to managing memory caches, so its activities must be general-purpose. That generality means it is poorly suited to the massively parallel matrix arithmetic required for AI model training.
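A small sketch of that contrast (NumPy assumed; the matrix size is illustrative): the same matrix product computed the general-purpose, one-element-at-a-time way versus handed off as a single vectorized operation, which is the kind of massively parallel arithmetic that accelerators are built around.

```python
# Minimal sketch (assumes NumPy): the same matrix product computed one element
# at a time, the way general-purpose serial code works, versus handed to a
# vectorized BLAS routine that exploits parallel hardware.
import time
import numpy as np

n = 256
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# General-purpose, element-by-element execution.
t0 = time.perf_counter()
c_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        s = 0.0
        for k in range(n):
            s += a[i, k] * b[k, j]
        c_loop[i, j] = s
t_loop = time.perf_counter() - t0

# The same arithmetic expressed as one parallel matrix operation.
t0 = time.perf_counter()
c_blas = a @ b
t_blas = time.perf_counter() - t0

print(f"loop: {t_loop:.2f}s  matmul: {t_blas:.4f}s")
print("results match:", np.allclose(c_loop, c_blas))
```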

Most activity in the market revolves around coprocessor accelerators, application-specific integrated circuits, and, to a lesser extent, field-programmable gate arrays that can be fine-tuned for specific uses.

Everyone is following Google's lead in developing coprocessors that work alongside the CPU and target specific parts of the AI workload by hard-coding algorithms into the processor rather than running them as software.

Acceleration equation

One part of this acceleration equation is the development of so-called graph streaming processors for edge computing scenarios such as self-driving cars and video surveillance. The fully programmable chipset takes on many of the functions of a CPU but is optimized for task-level parallelism and streaming execution, drawing only 7 watts of power.

The architecture is based on a graph data structure in which relationships between objects are represented as connected nodes and edges. Every machine learning framework uses graph concepts, and the chip's design maintains the same semantics throughout. The entire graph, including the CMM as well as custom nodes, can be executed, and anything parallel within these graphs can be accelerated.
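A minimal sketch in plain Python (the node names and helper function are hypothetical, purely for illustration) of what such a graph looks like: nodes connected by edges, grouped into levels so that every node within a level is independent of the others and could be executed in parallel.

```python
# Minimal sketch: a computation expressed as a graph of nodes and edges, the
# same abstraction ML frameworks use internally. Nodes at the same dependency
# depth have no edges between them, so a graph processor can run them in parallel.
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    name: str
    op: callable                                  # the computation this node performs
    inputs: list = field(default_factory=list)    # edges from upstream nodes

def topological_levels(outputs):
    """Group nodes into levels; every node in a level is independent of the others."""
    levels, depths = [], {}
    def depth(node):
        if node in depths:
            return depths[node]
        d = 1 + max((depth(i) for i in node.inputs), default=-1)
        depths[node] = d
        return d
    for out in outputs:
        depth(out)
    for node, d in depths.items():
        while len(levels) <= d:
            levels.append([])
        levels[d].append(node)
    return levels

# A toy graph: two independent branches feed a final node.
x = Node("x", lambda: 3.0)
y = Node("y", lambda: 4.0)
sq_x = Node("x*x", lambda a: a * a, [x])
sq_y = Node("y*y", lambda a: a * a, [y])
out = Node("x*x + y*y", lambda a, b: a + b, [sq_x, sq_y])

values = {}
for level in topological_levels([out]):
    # Everything inside `level` could run in parallel on a graph processor.
    for node in level:
        values[node.name] = node.op(*(values[i.name] for i in node.inputs))
print(values["x*x + y*y"])  # 25.0
```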

Its graph-based architecture addresses some of the capacity limitations of GPUs and CPUs and adapts more flexibly to different types of AI tasks. It also lets developers move more processing to the edge for better inference. If companies can handle 80% of the processing as pre-processing at the edge, they can save a great deal of time and cost.

These applications bring intelligence closer to the data and enable rapid decision-making. Most of them target inference, the in-field deployment of AI models, rather than the more computationally intensive task of training.

One company is developing a chip that uses in-memory computing to reduce latency and the need for external memory. Its artificial intelligence platform is intended to provide the flexibility to run multiple neural networks while maintaining high accuracy.

Its data processing unit line is a massively parallel processor array built around a scalable 80-core processor that can execute dozens of tasks in parallel. The key innovation is the tight integration of a tensor coprocessor into each processing element, along with support for direct tensor data exchange between elements, which avoids memory-bandwidth bottlenecks. The result is efficient AI application acceleration, because pre- and post-processing run on the same processing elements.

One company focuses on deep learning inference using a thumbnail-sized chipset that it claims can perform 26 trillion operations per second while consuming as little as 3 watts of power. That is achieved in part by breaking each layer of a trained deep learning network down into its required computing elements and integrating them on a chip built specifically for deep learning.

The use of on-chip memory further reduces overhead. The entire network resides inside the chip with no external memory, which means the chip can be smaller and consume less energy. It can run deep learning models on high-definition images in near real time, allowing a single device to perform automatic license plate recognition on four lanes simultaneously.

Current Development of Hardware

Some startups are taking more of a moonshot approach, aiming to redefine AI model training and the entire platform it runs on.

For example, one AI processor optimized for machine learning can manage up to 350 trillion processing operations per second, with nearly 9,000 concurrent threads and 900 megabytes of in-processor memory. The integrated computing system built around it, called the Bow-2000 IPU machine, is said to be capable of 1.4 petaflops.

What makes it different is its three-dimensional stacked chip design, which packs nearly 1,500 parallel processing cores into a single chip, each capable of running a completely different program. That differs from widely used GPU architectures, which prefer to run the same operation across large blocks of data.

As another example, some companies are tackling the interconnect, the wiring that links components inside integrated circuits. As processors approach their theoretical maximum speeds, the paths that move the bits become an increasing bottleneck, especially when multiple processors access memory simultaneously. Today the chips themselves are no longer the bottleneck; the interconnect is.

One such chip uses nanophotonic waveguides in an artificial intelligence platform that is said to combine high speed and high bandwidth in a low-power package. It is essentially an optical communications layer that can connect multiple other processors and accelerators.

The quality of AI results comes from the ability to support very large, complex models while simultaneously delivering very high-throughput responses, and this approach promises both. It applies to anything that can be expressed as linear algebra, which covers most applications of artificial intelligence.

Expectations for integrated hardware-and-software platforms are also extremely high. Enterprises have seized on this point, adopting platforms that can run artificial intelligence and other data-intensive applications anywhere from the data center to the edge.

The hardware platform uses custom 7nm chips designed for machine and deep learning. Its reconfigurable dataflow architecture runs an AI-optimized software stack, and its hardware architecture is designed to minimize memory accesses, thereby reducing interconnect bottlenecks.

The processor can be reconfigured to suit either AI or high-performance computing (HPC) workloads. It is designed to handle large-scale matrix operations at a higher performance level, a plus for clients whose workloads change.

Although CPUs, GPUs and even FPGAs are well suited for deterministic software such as transactional systems and ERP, machine learning algorithms are probabilistic, meaning the results are not known in advance. This requires a completely different hardware infrastructure.

The platform minimizes interconnect issues by attaching 1TB of high-speed double data rate (DDR) synchronous memory to the processor and essentially masking the latency of the DDR controller with on-chip memory that is 20 times faster. Because this is transparent to the user, it allows language models with higher parameter counts, and the highest-resolution images, to be trained without tiling or downsampling.

Tiling is an image analysis technique that reduces the need for computing power by splitting an image into smaller chunks, analyzing each chunk, and then recombining the results. Downsampling trains a model on a random subset of the training data to save time and computing resources. The result is a system that is not only faster than GPU-based systems but also capable of solving larger problems.
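A short NumPy sketch of both techniques (the tile size, image shape, and function names are illustrative, not any vendor's API): tiling processes an image chunk by chunk and stitches the results back together, while downsampling keeps only a random fraction of the samples.

```python
# Minimal sketch (assumes NumPy): tiling splits a large image into chunks that
# fit in limited memory, analyzes each chunk, and recombines the results;
# downsampling keeps a random subset of the data instead of all of it.
import numpy as np

def analyze(chunk):
    # Stand-in for a real model; here we just threshold the pixels.
    return (chunk > 0.5).astype(np.uint8)

def tiled_analysis(image, tile=512):
    """Process `image` in tile x tile chunks and stitch the results back together."""
    out = np.empty_like(image, dtype=np.uint8)
    h, w = image.shape
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            out[y:y + tile, x:x + tile] = analyze(image[y:y + tile, x:x + tile])
    return out

def downsample(samples, fraction=0.1, seed=0):
    """Keep roughly `fraction` of the samples, chosen at random."""
    rng = np.random.default_rng(seed)
    keep = rng.random(len(samples)) < fraction
    return samples[keep]

image = np.random.rand(2048, 2048)       # a "high-resolution" image
mask = tiled_analysis(image)             # processed piece by piece
subset = downsample(np.arange(100_000))  # ~10% of the training examples
print(mask.shape, subset.shape)
```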

Summary

With so many companies chasing solutions to the same problems, a shakeout is inevitable, but no one expects it to come soon. GPUs will be around for a long time and will probably remain the most cost-effective option for AI training and inference projects that don't require extreme performance.

Nevertheless, as models at the high end of the market become larger and more complex, there is an increasing need for functionally specific architectures. Three to five years from now, we will likely see a proliferation of GPUs and AI accelerators, which is the only way we can scale to meet demand at the end of this decade and beyond.

Leading chipmakers are expected to continue doing what they do well and gradually build on existing technologies. Many companies will also follow Intel's lead and acquire startups focused on artificial intelligence. The high-performance computing community is also focusing on the potential of artificial intelligence to help solve classic problems such as large-scale simulations and climate modeling.

The high-performance computing ecosystem is always looking for new technologies they can absorb to stay ahead of the curve, and they are exploring what artificial intelligence can bring to the table. Lurking behind the scenes is quantum computing, a technology that remains more theoretical than practical but has the potential to revolutionize computing.

Regardless of which new architecture gains traction, the surge in artificial intelligence has undoubtedly reignited interest in the potential for hardware innovation to open up new frontiers in software.
