ImageNet label error removed, model ranking changed significantly

WBOY

Apr 12, 2023 pm 05:46 PM

ImageNet has previously drawn attention because of label errors, and the number may surprise you: at least 100,000 labels are problematic. Studies based on incorrect labels may have to be revisited and repeated.

From this point of view, managing the quality of data sets is still very important.

Many people use the ImageNet data set as a benchmark, but the results reported for ImageNet pre-trained models may vary depending on the quality of the data.

In this article, Kenichi Higuchi, an engineer at Adansons, revisits the ImageNet data set based on the paper "Are we done with ImageNet?", removes the incorrectly labeled data, and re-evaluates the models published in torchvision.

Remove erroneous data from ImageNet and re-evaluate the model

This paper divides labeling errors in ImageNet into three categories, as follows.

(1) Data with incorrect labeling

(2) Data corresponding to multiple labels

(3) Data that does not belong to any label

In total, there are more than 14,000 erroneous samples. Given that the evaluation set contains 50,000 samples, the proportion of erroneous data is extremely high. The figure below shows some representative examples.

Method

Without retraining the models, this study rechecks their accuracy under two settings: excluding only the incorrectly labeled data, that is, the type (1) erroneous data above, and excluding all erroneous data, that is, types (1)-(3), from the evaluation data.
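
The two exclusion settings can be sketched in a few lines of Python. This is a minimal sketch, not the study's code: the `Sample` tuple, its field names, and the convention that `error_type` is 0 for clean samples and 1-3 for the error categories above are assumptions made for illustration.

```python
from typing import Iterable, NamedTuple, Set


class Sample(NamedTuple):
    predicted: int   # class index predicted by the model
    label: int       # original ImageNet label
    error_type: int  # assumed encoding: 0 = clean, 1-3 = error categories


def accuracy(samples: Iterable[Sample], excluded: Set[int]) -> float:
    """Top-1 accuracy over samples whose error_type is not in `excluded`."""
    kept = [s for s in samples if s.error_type not in excluded]
    if not kept:
        raise ValueError("no samples left after exclusion")
    return sum(s.predicted == s.label for s in kept) / len(kept)


# Toy data, not real ImageNet results.
toy = [
    Sample(3, 3, 0), Sample(5, 5, 0), Sample(7, 2, 0),
    Sample(1, 4, 1),  # mislabeled sample, counted as wrong
    Sample(6, 6, 2),  # sample that corresponds to multiple labels
    Sample(9, 8, 3),  # sample that belongs to no label
]

baseline = accuracy(toy, excluded=set())         # all evaluation data
without_type1 = accuracy(toy, excluded={1})      # drop type (1) only
without_all = accuracy(toy, excluded={1, 2, 3})  # drop types (1)-(3)
```

The model's predictions are computed once and reused; only the set of samples that counts toward the score changes between settings.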

To remove the erroneous data, a metadata file describing the label errors is required. In this metadata file, a sample that has an error of type (1)-(3) carries that information in its "correction" attribute.
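
A filter driven by such metadata might look like the sketch below. The record layout is hypothetical: only the "correction" attribute comes from the text, while the `file` and `label` fields, the file names, and the use of an integer 1-3 to encode the error type are assumptions. It does not reproduce the Adansons Base API.

```python
import json

# Hypothetical metadata: only the "correction" attribute is named in the
# article; the other fields and the 1-3 integer encoding are assumptions.
METADATA_JSON = """
[
  {"file": "img_0001.JPEG", "label": "n01440764"},
  {"file": "img_0002.JPEG", "label": "n01443537", "correction": 1},
  {"file": "img_0003.JPEG", "label": "n01484850", "correction": 2},
  {"file": "img_0004.JPEG", "label": "n01491361", "correction": 3},
  {"file": "img_0005.JPEG", "label": "n01494475"}
]
"""


def files_to_evaluate(records, excluded_types):
    """Return file names whose "correction" value is not an excluded type.

    Records without a "correction" attribute are treated as clean.
    """
    return [r["file"] for r in records
            if r.get("correction") not in excluded_types]


records = json.loads(METADATA_JSON)
type1_removed = files_to_evaluate(records, {1})
all_removed = files_to_evaluate(records, {1, 2, 3})
```

Keeping the error information in metadata, rather than deleting files, means the same evaluation set can be filtered differently for each setting.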

The study used a tool called Adansons Base, which filters data by linking data sets to metadata. Ten models were tested, as shown below.

10 image classification models used for testing

Results

The results are shown in the table below (values are accuracy in %; the numbers in brackets are rankings).

The results of 10 classification models

The evaluation on all evaluation data ("All Eval data") is the baseline. Excluding erroneous data of type (1) increases accuracy by an average of 3.122 points. Excluding all erroneous data, types (1) to (3), increases accuracy by an average of 11.743 points.
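
An average gain of this kind is the mean, over models, of the difference in percentage points from the baseline. The sketch below shows the calculation; the model names and accuracy values are placeholders chosen for illustration, not the figures from the article's table.

```python
from statistics import mean

# Placeholder accuracies in %, NOT the values from the article's table.
# Each tuple: (all eval data, type (1) removed, types (1)-(3) removed).
accuracies = {
    "model_a": (70.0, 73.0, 82.0),
    "model_b": (75.0, 78.5, 86.0),
    "model_c": (80.0, 83.0, 92.5),
}


def average_gain(table, column):
    """Mean difference in percentage points between `column` and the baseline."""
    return mean(row[column] - row[0] for row in table.values())


gain_type1 = average_gain(accuracies, 1)  # type (1) removed vs. baseline
gain_all = average_gain(accuracies, 2)    # types (1)-(3) removed vs. baseline
```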

As expected, excluding erroneous data improves accuracy across the board. Unsurprisingly, predictions are more likely to be counted as wrong on erroneous data than on clean data.

The accuracy ranking of the models differed between the evaluation with no erroneous data excluded and the evaluation with all erroneous data (1)-(3) excluded.
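
A ranking comparison of this kind can be sketched as follows. The model names and accuracies are placeholders, not the article's results, and ties are broken by insertion order, which is an arbitrary choice made for this sketch.

```python
def rank(scores):
    """Map each model to its 1-based rank, highest accuracy first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


# Placeholder accuracies in %, not the article's results.
all_eval = {"model_a": 80.1, "model_b": 79.8, "model_c": 77.0}
cleaned = {"model_a": 91.0, "model_b": 91.6, "model_c": 88.2}

before, after = rank(all_eval), rank(cleaned)

# Models whose rank differs between the two evaluations: (old rank, new rank).
changed = {m: (before[m], after[m]) for m in before if before[m] != after[m]}
```

When two models are close on the full evaluation set, a small difference in how each handles the erroneous samples is enough to swap their positions.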

In this article, there are 3,670 erroneous samples of type (1), accounting for 7.34% of the total 50,000 samples. After removing them, accuracy increased by about 3.22 points on average. When erroneous data is removed, the size of the evaluation set changes, so a simple comparison of accuracy rates may be biased.
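
The arithmetic behind these figures, and one way to see why the changed denominator matters, can be sketched as below. The sample counts come from the text; the 75% baseline accuracy and the assumption that every removed sample had been counted as wrong are hypothetical and serve only to show an upper bound.

```python
TOTAL = 50_000  # evaluation samples, from the article
TYPE1 = 3_670   # type (1) erroneous samples, from the article

share = TYPE1 / TOTAL      # fraction of the evaluation data removed
remaining = TOTAL - TYPE1  # size of the reduced evaluation set


def accuracy_after_removal(baseline_acc, total, removed):
    """Accuracy on the reduced set if every removed sample was counted wrong.

    The number of correct predictions is unchanged; only the denominator
    shrinks, so this is an upper bound on the accuracy after removal.
    """
    correct = baseline_acc * total
    return correct / (total - removed)


# Hypothetical baseline of 75% accuracy, chosen only for illustration.
upper_bound = accuracy_after_removal(0.75, TOTAL, TYPE1)
```

Under these assumptions the score rises even though no prediction changes, which is why accuracies measured on evaluation sets of different sizes should be compared with care.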

Conclusion

Although it is not often emphasized, it is important to use accurately labeled data for both training and evaluation.

Previous studies may have drawn incorrect conclusions when comparing accuracy between models. The data should therefore be assessed first: if the evaluation data contains errors, can it really be used to evaluate the performance of a model?

Much deep learning work neglects to examine the data and instead tries to improve accuracy and other evaluation metrics through the model alone, even when the evaluation data contains erroneous samples that have not been handled properly.

When creating your own data sets, such as when applying AI in business, creating high-quality data sets is directly related to improving the accuracy and reliability of AI. The experimental results of this paper show that simply improving data quality can improve accuracy by about 10 percentage points, which demonstrates the importance of improving not only the model but also the data set when developing AI systems.

However, ensuring the quality of the data set is not easy. While increasing the amount of metadata is important to properly assess the quality of AI models and data, it can be cumbersome to manage, especially with unstructured data.
