Large-model inference cost leaderboard released: Jia Yangqing's company leads in efficiency

Jan 26, 2024 pm 02:15 PM
"Is the API of large models a loss-making business?"

As large language model technology moves into practical use, many technology companies have launched large-model APIs for developers. However, one can't help wondering whether a business built on large models can be sustained, especially considering that OpenAI is burning through $700,000 a day.

This Thursday, the AI startup Martian ran the numbers for us.

Leaderboard link: https://leaderboard.withmartian.com/

The LLM Inference Provider Leaderboard is an open-source ranking of API inference products for large models. For each vendor's public Mixtral-8x7B and Llama-2-70B-Chat endpoints, it benchmarks cost, rate limits, throughput, and the P50 and P90 time to first token (TTFT).
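To make the percentile metrics concrete, here is a minimal sketch of how P50 and P90 TTFT and a P50 throughput figure could be derived from per-request measurements. The nearest-rank percentile method, the `summarize` helper, and the sample numbers are all assumptions for illustration; the leaderboard's actual methodology may differ.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(requests):
    """requests: list of (ttft_seconds, output_tokens, total_seconds) tuples."""
    ttfts = [r[0] for r in requests]
    # Per-request throughput in output tokens per second.
    throughputs = [r[1] / r[2] for r in requests]
    return {
        "ttft_p50": percentile(ttfts, 50),
        "ttft_p90": percentile(ttfts, 90),
        "throughput_p50": percentile(throughputs, 50),
    }

# Hypothetical measurements, not leaderboard data.
SAMPLE = [
    (0.20, 100, 2.0),
    (0.30, 120, 2.0),
    (0.25, 90, 3.0),
    (0.90, 100, 4.0),
    (0.40, 150, 3.0),
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In this made-up sample a single slow request (0.90 s) sets the P90 TTFT while leaving the P50 untouched, which is why the two percentiles are reported separately.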

Although the vendors compete with one another, Martian found significant differences in the cost, throughput, and rate limits of their large-model services: more than 5x in cost, 6x in throughput, and even more in rate limits. Choosing the right API is critical to getting the best performance, even though it is only one part of doing business.
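A quick back-of-the-envelope calculation shows what a 5x price gap means at volume. This is only a sketch: the provider names, per-million-token prices, and monthly token volume below are hypothetical, not real vendor quotes.

```python
def monthly_cost(tokens_per_month, price_per_million_tokens):
    """Cost in dollars for a monthly token volume at a per-million-token price."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# Hypothetical prices in USD per million tokens, not real vendor quotes.
PRICES = {"provider_a": 0.5, "provider_b": 2.5}
TOKENS_PER_MONTH = 2_000_000_000  # assumed workload: 2 billion tokens per month

costs = {name: monthly_cost(TOKENS_PER_MONTH, price) for name, price in PRICES.items()}
cheapest = min(costs, key=costs.get)
ratio = max(costs.values()) / min(costs.values())

if __name__ == "__main__":
    for name, cost in costs.items():
        print(f"{name}: ${cost:,.0f} per month")
    print(f"cheapest: {cheapest}, price gap: {ratio:.1f}x")
```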

According to the current ranking, Anyscale's service has the best throughput on Llama-2-70B under medium service load. Under large service loads, Together AI performed best in P50 and P90 throughput on both Llama-2-70B and Mixtral-8x7B.

Additionally, Jia Yangqing’s Lepton AI showed the best throughput on small workloads with short input prompts and long outputs. Its P50 throughput of 130 tokens/s is the fastest among the models currently offered by any vendor on the market.

Well-known AI scholar and Lepton AI founder Jia Yangqing commented immediately after the rankings were released. Let’s see what he said.

Jia Yangqing first described the current state of the AI industry, then affirmed the value of benchmarking, and finally noted that Lepton AI will help users find the best foundational AI strategy.

1. Large-model APIs are "burning money"

If a model holds the leading position in the high-workload benchmark, then congratulations: it is "burning money."

Provisioning capacity for a public LLM inference API is like running a restaurant: you have chefs and you need to estimate customer traffic. Hiring a chef costs money. Latency and throughput can be understood as "how fast you can cook for customers." For a reasonable business, you need a "reasonable" number of chefs. In other words, you want enough capacity to handle normal traffic, not sudden bursts that arrive within a matter of seconds. When traffic surges, customers have to wait; the alternative is chefs who have nothing to do the rest of the time.

In the world of artificial intelligence, the GPU plays the role of the "chef." Benchmark loads are bursty. Under low workloads, the benchmark load blends into normal traffic, and the measurements accurately represent how the service performs under its current workload.

The high service load scenario is interesting because it disturbs the system being measured. The benchmark only runs a few times per day or week, so it is not the regular traffic a service should expect. Imagine 100 people flocking to your local restaurant to check how quickly the chef cooks. The results would be great. To borrow a term from quantum physics, this is the "observer effect": the stronger the disturbance (that is, the larger the burst load), the lower the accuracy. In other words, if you put a sudden high load on a service and it still responds very quickly, you know the service has quite a bit of idle capacity. As an investor, when you see this, you should ask: is this way of burning money responsible?
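The idle-capacity argument can be checked with simple arithmetic. The sketch below compares a fleet sized for normal traffic with one sized to absorb a benchmark-style burst; the traffic levels and per-GPU serving rate are hypothetical figures chosen only for illustration, and real serving capacity does not scale this linearly.

```python
import math

def gpus_needed(requests_per_second, per_gpu_rps):
    """Smallest number of GPUs whose combined capacity covers the given load."""
    return math.ceil(requests_per_second / per_gpu_rps)

def utilization(requests_per_second, gpus, per_gpu_rps):
    """Fraction of total fleet capacity consumed by the given load."""
    return requests_per_second / (gpus * per_gpu_rps)

# Hypothetical figures, assumed for this example only.
NORMAL_RPS = 40    # everyday traffic, requests per second
BURST_RPS = 160    # a sudden benchmark-style burst
PER_GPU_RPS = 5    # requests per second one GPU can serve

lean = gpus_needed(NORMAL_RPS, PER_GPU_RPS)    # fleet sized for normal traffic
padded = gpus_needed(BURST_RPS, PER_GPU_RPS)   # fleet sized to absorb the burst

lean_util = utilization(NORMAL_RPS, lean, PER_GPU_RPS)
padded_util = utilization(NORMAL_RPS, padded, PER_GPU_RPS)

if __name__ == "__main__":
    print(f"lean fleet: {lean} GPUs, {lean_util:.0%} busy on a normal day")
    print(f"padded fleet: {padded} GPUs, {padded_util:.0%} busy on a normal day")
```

Under these assumptions, the fleet that handles the burst without queuing is four times larger and sits 75% idle on a normal day, which is the "burning money" pattern described above.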

2. Models will eventually reach similar performance

The field of artificial intelligence is very fond of competitions, which is indeed interesting. Everyone quickly converges on the same solution, and Nvidia always wins in the end because of the GPU. This is thanks to great open source projects, of which vLLM is a good example. It means that, as a provider, if your model performs much worse than others, you can easily catch up by studying open source solutions and applying good engineering.

3. "As a customer, I don't care about the provider's cost"

For developers building AI applications, we are lucky: there are always API providers willing to "burn money." The AI industry is burning money to gain traffic, and worrying about profits is the next step.

Benchmarking is a tedious and error-prone task. For better or worse, winners usually praise you and losers blame you. Such was the case with the last round of convolutional neural network benchmarks. It’s not an easy task, but benchmarking will help us achieve the next 10x in AI infrastructure.

Building on its work in AI frameworks and cloud infrastructure, Lepton AI will help users find the best foundational AI strategy.