LeCun's evaluation: Meta evaluation of ConvNet and Transformer, which one is stronger?

Jan 18, 2024, 02:15 PM

How to choose a visual model based on specific needs?

How do ConvNet/ViT and supervised/CLIP models compare with each other on metrics other than ImageNet accuracy?

New research by researchers from MBZUAI and Meta comprehensively compares common vision models on "non-standard" metrics.

Paper: https://arxiv.org/pdf/2311.09215.pdf

LeCun praised the study, calling it excellent. The study compares ConvNeXt and ViT architectures of similar size and provides a comprehensive comparison of their properties when trained with supervised learning and with CLIP.

Beyond ImageNet accuracy

The computer vision model landscape is becoming increasingly diverse and complex.

From early ConvNets to Vision Transformers, the range of available models keeps expanding.

Similarly, the training paradigm has evolved from supervised training on ImageNet to self-supervised learning and image-text pair training like CLIP.

While this marks progress, the explosion of options poses a major challenge for practitioners: how do they choose the model that is right for their goals?

ImageNet accuracy has long been the main metric for evaluating model performance. Since sparking the deep learning revolution, it has driven significant advances in artificial intelligence.

However, it does not capture the nuanced differences between models that arise from different architectures, training paradigms, and data.

Judged solely by ImageNet accuracy, models with different properties may look similar (Figure 1). This limitation becomes more apparent as models begin to overfit to the particularities of ImageNet and accuracy saturates.

To bridge this gap, the researchers conducted an in-depth exploration of model behavior beyond ImageNet accuracy.

To study the impact of architecture and training objective on model performance, the researchers specifically compared Vision Transformer (ViT) and ConvNeXt, two modern architectures with comparable ImageNet-1K validation accuracy and computational requirements.

In addition, the study compared supervised models, represented by DeiT3-Base/16 and ConvNeXt-Base, with the visual encoders of OpenCLIP models trained with the CLIP objective.

ANALYSIS OF RESULTS

The researchers' analysis is designed to evaluate model behavior without requiring any further training or fine-tuning.

This approach is particularly important for practitioners with limited computing resources, as they often rely on pre-trained models.

Although the authors recognize the value of downstream tasks such as object detection, the analysis focuses on properties that provide insight with minimal computational cost and that reflect behaviors important in real-world applications.

Model Error

ImageNet-X is a dataset that extends ImageNet-1K with detailed human annotations of 16 factors of variation, enabling in-depth analysis of model errors in image classification.

It uses error ratios (lower is better) to quantify a model's performance on specific factors relative to its overall accuracy, allowing a nuanced analysis of model errors. The results on ImageNet-X show:

1. Relative to their ImageNet accuracy, CLIP models make fewer errors than supervised models.

2. All models are most affected by complex factors such as occlusion.

3. Texture is the most challenging factor for all models.
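
To make the error-ratio idea concrete, here is a minimal Python sketch. It assumes a hypothetical per-image record of (whether the prediction was correct, set of annotated factors); this layout and the factor names are illustrative only and are not the ImageNet-X file format. The ratio divides the error rate on images tagged with a factor by the model's overall error rate, so a value above 1 means that factor is harder than average for the model.

```python
from typing import Iterable, Set, Tuple

# Hypothetical per-image record: (prediction_was_correct, annotated_factors).
# This is an illustrative layout, not the ImageNet-X release format.
Sample = Tuple[bool, Set[str]]


def error_ratio(samples: Iterable[Sample], factor: str) -> float:
    """Error rate on images annotated with `factor` divided by the overall error rate."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples")
    overall_error = sum(1 for correct, _ in samples if not correct) / len(samples)
    if overall_error == 0:
        raise ValueError("no errors at all, so the ratio is undefined")
    subset = [correct for correct, factors in samples if factor in factors]
    if not subset:
        raise ValueError(f"no images annotated with {factor!r}")
    factor_error = sum(1 for correct in subset if not correct) / len(subset)
    return factor_error / overall_error


samples = [
    (True, {"pose"}),
    (False, {"occlusion"}),
    (False, {"occlusion", "texture"}),
    (True, {"texture"}),
]
print(error_ratio(samples, "occlusion"))  # 2.0
print(error_ratio(samples, "texture"))    # 1.0
```

Because the ratio is normalized by each model's own overall error rate, it allows models with different ImageNet accuracy to be compared factor by factor.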

Shape/Texture Bias

Shape/texture bias measures whether a model relies on texture shortcuts rather than high-level shape cues.

This bias can be studied with cue-conflict images, which combine the shape of one category with the texture of another.

This approach helps reveal to what extent a model's decisions are based on shape rather than texture.

The researchers evaluated shape/texture bias on the cue-conflict dataset and found that CLIP models have a smaller texture bias than supervised models, while ViT models have a higher shape bias than ConvNets.
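
As an illustration of how cue-conflict predictions are usually reduced to a single number, the sketch below uses the commonly used shape-bias definition: among predictions that match either the shape class or the texture class, the fraction that match the shape. The trial tuple layout and the class names are hypothetical, and the paper's exact evaluation code may differ.

```python
from typing import Iterable, Tuple

# Hypothetical cue-conflict trial: (predicted_class, shape_class, texture_class).
Trial = Tuple[str, str, str]


def shape_bias(trials: Iterable[Trial]) -> float:
    """Share of cue-consistent decisions that follow shape; texture bias is 1 minus this.

    Trials whose prediction matches neither cue are ignored, and images whose shape and
    texture come from the same class are skipped because they carry no conflict.
    """
    shape_hits = texture_hits = 0
    for predicted, shape, texture in trials:
        if shape == texture:
            continue
        if predicted == shape:
            shape_hits += 1
        elif predicted == texture:
            texture_hits += 1
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no prediction matched either cue")
    return shape_hits / decided


trials = [
    ("cat", "cat", "elephant"),       # follows shape
    ("elephant", "cat", "elephant"),  # follows texture
    ("clock", "clock", "bottle"),     # follows shape
    ("car", "cat", "clock"),          # matches neither cue: ignored
]
print(round(shape_bias(trials), 3))  # 0.667
```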

Model Calibration

Calibration quantifies whether a model's prediction confidence is consistent with its actual accuracy.

This can be assessed through metrics such as expected calibration error (ECE), as well as visualization tools such as reliability plots and confidence histograms.

The researchers evaluated calibration on ImageNet-1K and ImageNet-R, grouping predictions into 15 confidence bins. They observed the following:

- CLIP models are overconfident, while supervised models are slightly underconfident.

- Supervised ConvNeXt is better calibrated than supervised ViT.
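
For readers who want to reproduce this kind of measurement on their own predictions, here is a minimal ECE sketch with 15 equal-width bins, matching the bin count mentioned above. The half-open bin boundaries and the input format (one top-1 confidence and one correctness flag per image) are assumptions; this is not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 15
) -> float:
    """ECE over equal-width confidence bins on [0, 1].

    ECE = sum over bins b of (|B_b| / n) * |accuracy(B_b) - mean_confidence(B_b)|
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need non-empty inputs of equal length")
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bin k covers [k / n_bins, (k + 1) / n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Two 90%-confidence predictions (one correct) and two 30%-confidence misses.
print(round(expected_calibration_error([0.9, 0.9, 0.3, 0.3], [True, False, False, False]), 3))  # 0.35
```

An overconfident model shows bins where mean confidence exceeds accuracy, as in the 0.9 bin above; an underconfident one shows the reverse. ECE alone does not reveal the direction, which is why reliability plots are used alongside it.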

Robustness and transferability

A model's robustness and transferability are key to adapting to shifts in data distribution and to new tasks.

The researchers evaluated robustness on several ImageNet variants and found that ViT and ConvNeXt models have similar average performance, and that supervised models generally outperform CLIP models in robustness, except on ImageNet-R and ImageNet-Sketch.

For transferability, evaluated on 19 datasets with the VTAB benchmark, supervised ConvNeXt outperforms ViT and nearly matches the performance of CLIP models.
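
Average scores across ImageNet variants can hide per-variant reversals like the ImageNet-R and ImageNet-Sketch exceptions above. The hypothetical helper below is a minimal sketch for comparing two models' per-variant accuracies; the dictionary format is an assumption, and no numbers from the paper are included.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Top-1 accuracy per benchmark variant, e.g. {"ImageNet-R": ..., "ImageNet-Sketch": ...}.
# The values must come from your own evaluation runs; none are supplied here.
Scores = Dict[str, float]


def compare_models(a: Scores, b: Scores) -> Tuple[float, float, List[str]]:
    """Return (mean accuracy of a, mean accuracy of b, variants where b beats a).

    Only variants evaluated for both models are included, so the two means are comparable.
    """
    shared = sorted(a.keys() & b.keys())
    if not shared:
        raise ValueError("the two models share no benchmark variants")
    b_wins = [name for name in shared if b[name] > a[name]]
    mean_a = mean([a[name] for name in shared])
    mean_b = mean([b[name] for name in shared])
    return mean_a, mean_b, b_wins
```

Passing a supervised model's per-variant accuracies as `a` and a CLIP model's as `b` makes the pattern described above easy to spot: a similar average can coexist with a short list of variants where the ranking flips.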

Synthetic data

Synthetic datasets such as PUG-ImageNet, which allow precise control over factors like camera angle and texture, have become a promising research avenue, so the researchers analyzed model performance on synthetic data.

PUG-ImageNet contains photorealistic images of ImageNet classes with systematic variations in lighting and other factors, and performance is measured as absolute top-1 accuracy.

The researchers reported results for each factor in PUG-ImageNet and found that ConvNeXt outperformed ViT on almost all factors.

This shows that ConvNeXt outperforms ViT on synthetic data. The gap is smaller for CLIP models, whose accuracy here is lower than that of the supervised models, which may be related to their lower accuracy on the original ImageNet.
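
A per-factor breakdown like the one described above can be sketched as follows. The record format and the factor names ("lighting", "camera_angle") are hypothetical labels chosen for illustration, not identifiers from the PUG-ImageNet release.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-image record: (factor_of_variation, prediction_was_correct).
# Factor names are illustrative labels, not PUG-ImageNet identifiers.
Record = Tuple[str, bool]


def accuracy_by_factor(records: Iterable[Record]) -> Dict[str, float]:
    """Absolute top-1 accuracy for each factor of variation."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for factor, correct in records:
        totals[factor] += 1
        hits[factor] += int(correct)
    return {factor: hits[factor] / totals[factor] for factor in totals}


def factors_won(a: Dict[str, float], b: Dict[str, float]) -> int:
    """Number of shared factors on which model `a` is strictly more accurate than model `b`."""
    return sum(1 for factor in a.keys() & b.keys() if a[factor] > b[factor])


model_a = accuracy_by_factor(
    [("lighting", True), ("lighting", True), ("camera_angle", True), ("camera_angle", False)]
)
model_b = accuracy_by_factor(
    [("lighting", True), ("lighting", False), ("camera_angle", True), ("camera_angle", False)]
)
print(model_a)                        # {'lighting': 1.0, 'camera_angle': 0.5}
print(factors_won(model_a, model_b))  # 1
```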

Feature invariance

Feature invariance refers to a model's ability to produce consistent representations that are unaffected by semantics-preserving input transformations, such as scaling or shifting.

This feature enables the model to generalize well across different but semantically similar inputs.

The researchers' approach includes resizing images to test scale invariance, shifting crops to test shift invariance, and changing the input resolution, using interpolated positional embeddings for the ViT models.
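
One simple way to probe shift invariance is to classify translated crops of the same image and check how often the prediction stays the same. The sketch below is an assumption-laden illustration, not necessarily the protocol used in the paper: the diagonal shift, the square image, and the consistency score are all hypothetical choices. The boxes use the (left, upper, right, lower) convention of Pillow's `Image.crop`, so they can be passed to it directly before running a classifier.

```python
from typing import List, Sequence, Tuple

# (left, upper, right, lower) in pixels, the box convention used by Pillow's Image.crop.
Box = Tuple[int, int, int, int]


def shifted_crop_boxes(size: int, crop: int, step: int) -> List[Box]:
    """Crop windows of side `crop`, moved diagonally by `step` px across a `size` x `size` image.

    The first box is the top-left crop, so later boxes are pure translations of it.
    """
    if not 0 < crop <= size or step <= 0:
        raise ValueError("need 0 < crop <= size and step > 0")
    return [(o, o, o + crop, o + crop) for o in range(0, size - crop + 1, step)]


def prediction_consistency(labels: Sequence[int]) -> float:
    """Fraction of shifted views whose predicted label matches the first view's label."""
    if len(labels) < 2:
        raise ValueError("need at least two views")
    reference = labels[0]
    return sum(1 for label in labels[1:] if label == reference) / (len(labels) - 1)


print(shifted_crop_boxes(256, 224, 16))
# [(0, 0, 224, 224), (16, 16, 240, 240), (32, 32, 256, 256)]
print(prediction_consistency([7, 7, 3]))  # 0.5
```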

Among supervised models, ConvNeXt outperforms ViT.

Overall, the models are more robust to scale/resolution transformations than to shifts. For applications that require high robustness to scaling, shifting, and resolution changes, the results suggest that supervised ConvNeXt may be the best choice.

The researchers found that each model had its own unique advantages.

This suggests that the choice of model should depend on the target use case, as standard performance metrics may overlook nuances that are critical for a given task.

Furthermore, many existing benchmarks are derived from ImageNet, which biases the evaluation. Developing new benchmarks with different data distributions is crucial for evaluating models in more realistic, representative settings.

ConvNet vs Transformer

- On many benchmarks, supervised ConvNeXt performs better than supervised ViT: it is better calibrated, more invariant to data transformations, and exhibits better transferability and robustness.

- ConvNeXt outperforms ViT on synthetic data.

- ViT has a higher shape bias.

Supervised vs CLIP

- Although CLIP models are better in terms of transferability, supervised ConvNeXt is competitive on this task, which demonstrates the potential of supervised models.

- Supervised models do better on robustness benchmarks, probably because these benchmarks are variants of ImageNet.

- CLIP models have a higher shape bias and make fewer classification errors relative to their ImageNet accuracy.
