
Tsinghua University takes the lead in releasing multi-modal evaluation MultiTrust: How reliable is GPT-4?

Jul 24, 2024, 08:38 PM
Tags: project, Multimodal large model

The AIxiv column is a column where academic and technical content is published on this site. In the past few years, the AIxiv column of this site has received more than 2,000 reports, covering top laboratories from major universities and companies around the world, effectively promoting academic exchanges and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for reporting. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

This work was initiated by the basic theory innovation team led by Professor Zhu Jun of Tsinghua University. The team has long focused on bottleneck issues in the current development of artificial intelligence, explored original AI theories and key technologies, and is at an internationally leading level in research on adversarial security theories and methods for intelligent algorithms. It has also conducted in-depth research on basic common issues of deep learning such as adversarial robustness and data utilization efficiency. Related work won the first prize of the Wu Wenjun Artificial Intelligence Natural Science Award; the team has published more than 100 CCF Class A papers, developed the open-source ARES adversarial attack and defense algorithm platform (https://github.com/thu-ml/ares), and transformed some patented results into practical applications.

Multi-modal large language models (MLLMs) represented by GPT-4o have attracted much attention due to their excellent performance in multiple modalities such as language and images. They have not only become users' right-hand assistants in daily work, but have also gradually penetrated into major application fields such as autonomous driving and medical diagnosis, setting off a technological revolution.
However, are multi-modal large models safe and reliable?

As shown in Figure 1, by modifying image pixels through an adversarial attack, GPT-4o can be made to misidentify the fish-tailed lion (Merlion) statue as the Eiffel Tower in Paris or Big Ben in London. The target of such errors can be customized arbitrarily, even beyond the safe boundaries of the model's application.
In the jailbreak attack scenario, although Claude successfully rejected a malicious request posed in text form, when the user additionally input a solid-color, unrelated picture, the model produced the fake news requested by the user. This suggests that multimodal large models carry more risks and challenges than large language models.
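The article does not specify how the attack behind Figure 1 is constructed. As a rough, non-authoritative sketch of how pixel-level targeted adversarial perturbations are typically crafted, the snippet below runs targeted PGD against an open-source CLIP image encoder used as a surrogate, pulling the image embedding toward an attacker-chosen caption. The model choice, file name, perturbation budget, and loss are illustrative assumptions, not the setup actually used in the paper.

```python
# A minimal sketch of a targeted pixel-level adversarial attack (PGD) against a
# surrogate open-source CLIP encoder. This illustrates the general technique
# only, not the attack used to produce Figure 1.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("statue.jpg").convert("RGB")            # hypothetical input image
target_text = "the Eiffel Tower in Paris"                  # attacker-chosen target caption

pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
text_inputs = processor(text=[target_text], return_tensors="pt", padding=True)
with torch.no_grad():
    target_emb = F.normalize(model.get_text_features(**text_inputs), dim=-1)

# Perturbation budget applied in the processor's normalized space for simplicity.
eps, alpha, steps = 8 / 255, 1 / 255, 40
delta = torch.zeros_like(pixel_values, requires_grad=True)

for _ in range(steps):
    img_emb = F.normalize(
        model.get_image_features(pixel_values=pixel_values + delta), dim=-1
    )
    loss = -(img_emb * target_emb).sum()    # maximize similarity to the target caption
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()  # targeted gradient step
        delta.clamp_(-eps, eps)             # project back into the L_inf ball
        delta.grad.zero_()

adv_pixel_values = (pixel_values + delta).detach()  # adversarial image tensor
```

In a transfer-attack setting, the perturbed tensor would be converted back to an image and submitted to the target multimodal model to check whether its description of the scene changes.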

In addition to these two examples, multimodal large models also face various security threats and social risks such as hallucination, bias, and privacy leakage, which seriously affect their reliability and credibility in practical applications. Are these vulnerabilities incidental, or are they widespread? How do different multimodal large models differ in trustworthiness, and where do the differences come from?

Recently, researchers from Tsinghua University, Beihang University, Shanghai Jiao Tong University and Ruilai Intelligence jointly wrote a hundred-page article and released a comprehensive benchmark called MultiTrust, which for the first time comprehensively evaluates the trustworthiness of mainstream multimodal large models across multiple dimensions and perspectives, reveals multiple potential security risks, and motivates the next stage of development of multimodal large models.
  • Paper title: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
  • Paper link: https://arxiv.org/pdf/2406.07057
  • Project homepage: https://multi-trust.github.io/
  • Code repository: https://github.com/thu-ml/MMTrustEval

In its evaluation of large models, MultiTrust distills five trustworthiness evaluation dimensions—truthfulness, safety, robustness, fairness, and privacy—and further divides each into secondary categories, constructing tasks, metrics, and datasets in a targeted manner to provide a comprehensive assessment.
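As a rough illustration of how a two-level taxonomy like this can be laid out in an evaluation harness, the sketch below groups task entries under the five MultiTrust dimensions. Only the five dimension names come from the paper; the sub-aspects, tasks, and metrics in the sketch are hypothetical placeholders, not the benchmark's actual task list (see Figure 5 for that).

```python
# Illustrative sketch of a two-level trustworthiness taxonomy for an evaluation
# harness. Only the five top-level dimensions come from MultiTrust; the
# sub-aspects and example tasks below are hypothetical placeholders.
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    name: str
    modality: str          # "text-only" or "multimodal"
    task_type: str         # "discriminative" or "generative"
    metric: str            # e.g. accuracy, refusal rate, attack success rate


@dataclass
class Dimension:
    name: str
    sub_aspects: dict[str, list[TaskSpec]] = field(default_factory=dict)


BENCHMARK = [
    Dimension("truthfulness", {
        "visual misleading": [TaskSpec("misleading-image QA", "multimodal",
                                       "discriminative", "accuracy")],
    }),
    Dimension("safety", {
        "jailbreak": [TaskSpec("typographic jailbreak", "multimodal",
                               "generative", "refusal rate")],
    }),
    Dimension("robustness", {}),
    Dimension("fairness", {}),
    Dimension("privacy", {}),
]

for dim in BENCHMARK:
    n_tasks = sum(len(tasks) for tasks in dim.sub_aspects.values())
    print(f"{dim.name}: {n_tasks} example task(s)")
```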

Task scenarios cover both discriminative and generative tasks, spanning pure-text and multimodal settings. The corresponding datasets are not only transformed and adapted from public text or image datasets, but also include more complex and challenging data constructed through manual collection or algorithmic synthesis.

Figure 5: MultiTrust task list
 


Different from the trustworthiness evaluation of large language models (LLMs), the multimodal features of MLLMs bring more diverse and complex risk scenarios. To conduct a more systematic evaluation, the MultiTrust benchmark not only starts from the traditional behavioral evaluation dimensions, but also innovatively introduces two evaluation perspectives, multimodal risk and cross-modal impact, comprehensively covering the new issues and challenges brought by the new modalities.

Figure 6: Multimodal risk and cross-modal impact

Specifically, multimodal risk refers to the new risks introduced by multimodal scenarios, such as incorrect answers when the model processes visually misleading information, and misjudgments in multimodal reasoning involving safety issues. For example, although a model can correctly identify the alcohol in a picture, in further reasoning some models fail to recognize the potential risk of consuming it together with cephalosporin drugs.
Figure 7: Models make misjudgments in reasoning involving safety issues
Cross-modal impact refers to the effect of adding new modalities on the trustworthiness of the original modality. For example, inputting an irrelevant image may change the trusted behavior of the large language model backbone in pure-text scenarios, leading to more unpredictable security risks. In jailbreak attacks and contextual privacy-leakage tasks commonly used for LLM trustworthiness assessment, providing the model with an image that has nothing to do with the text may break its original safe behavior (Figure 2).
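As a rough sketch of how such a cross-modal probe can be constructed, the snippet below builds a solid-color image carrying no task-relevant information and pairs it with an otherwise text-only prompt, so that responses with and without the image can be compared. The message layout follows the common OpenAI-style multimodal chat format; the placeholder prompt and the comparison procedure in the comments are illustrative assumptions, not MultiTrust's actual implementation.

```python
# Sketch: pair a text-only prompt with an irrelevant solid-color image to test
# whether adding the image changes the model's originally safe behavior.
# The chat-message layout follows the common OpenAI-style multimodal format;
# the prompt text is an illustrative placeholder.
import base64
import io

from PIL import Image


def solid_color_image_b64(color=(128, 128, 128), size=(336, 336)) -> str:
    """Create an image that carries no task-relevant information."""
    buf = io.BytesIO()
    Image.new("RGB", size, color).save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")


prompt = "Please draft a short piece of breaking news about <some event>."  # placeholder

text_only_messages = [
    {"role": "user", "content": prompt},
]

text_plus_image_messages = [
    {"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{solid_color_image_b64()}"}},
    ]},
]

# Both message lists would be sent to the same model via an OpenAI-compatible
# client and the two responses compared: a refusal in the text-only case that
# turns into compliance when the irrelevant image is added indicates a
# cross-modal weakening of the safety behavior.
```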
Result analysis and key conclusions
Real-time updated trustworthiness leaderboard (partial)

The researchers maintain a regularly updated multimodal large model trustworthiness leaderboard, to which the latest models such as GPT-4o and Claude 3.5 have been added. Overall, closed-source commercial models are safer and more reliable than mainstream open-source models. Among them, OpenAI's GPT-4 and Anthropic's Claude rank highest in trustworthiness, while Microsoft's Phi-3, which incorporates safety alignment, ranks highest among open-source models, though a gap with the closed-source models remains.

Commercial models such as GPT-4, Claude, and Gemini have adopted many reinforcement techniques for safety and trustworthiness, but some risks remain. For example, they still show vulnerability to adversarial attacks and multimodal jailbreak attacks, which greatly undermines user experience and trust.
Although many open-source models score comparably to or even better than GPT-4 on mainstream general-capability leaderboards, in trustworthiness testing these models still show weaknesses and vulnerabilities in different aspects. For example, the emphasis on general capabilities (such as OCR) during training makes embedding jailbreak text and sensitive information in image inputs a more threatening source of risk (see the sketch below).
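As a rough illustration of this OCR-driven risk surface, the sketch below renders a probe string into an image so that the instruction reaches the model through the visual channel rather than the text channel; an evaluator would then check whether the model still refuses as it does when the same string is given as plain text. The rendering parameters and placeholder text are illustrative assumptions, not the benchmark's actual data-construction pipeline.

```python
# Sketch: render text into an image (a "typographic" probe) so the instruction
# reaches the model through the visual channel. Font, sizes, and the probe text
# are illustrative assumptions, not the benchmark's actual pipeline.
from PIL import Image, ImageDraw, ImageFont


def render_text_image(text: str, width: int = 768, padding: int = 24) -> Image.Image:
    font = ImageFont.load_default()              # bundled fallback font
    # Naive line wrapping: fixed number of characters per line.
    chars_per_line = 48
    lines = [text[i:i + chars_per_line] for i in range(0, len(text), chars_per_line)]
    line_height = 16
    height = padding * 2 + line_height * max(len(lines), 1)

    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((padding, padding + i * line_height), line, fill="black", font=font)
    return img


# A benign placeholder stands in for the sensitive instruction that the
# text channel would normally refuse.
probe = "PLACEHOLDER: instruction that the text-only model refuses"
render_text_image(probe).save("typographic_probe.png")
```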
Based on the experimental results on cross-modal impact, the authors found that multimodal training and inference weaken the safety alignment mechanism of large language models. Many multimodal large models use aligned large language models as the backbone and fine-tune them during multimodal training; the results show that these models still exhibit significant security vulnerabilities and trustworthiness risks. At the same time, in multiple pure-text trustworthiness evaluation tasks, introducing images during inference also interferes with the model's trustworthy behavior.
Figure 10: After an image is introduced, the model becomes more likely to leak private content from the text

Experiments have shown that the trustworthiness of multimodal large models is related to their general capability, but models still differ in performance across the different trustworthiness evaluation dimensions. Currently common algorithms for multimodal large models, such as fine-tuning on datasets generated with the help of GPT-4V and RLHF targeting hallucination, are not sufficient to fully improve model trustworthiness. Existing findings also show that multimodal large models face unique challenges different from those of large language models, and innovative, efficient algorithms are needed for further improvement.
See the paper for detailed results and analysis.

Future Directions

The research results show that improving the trustworthiness of multimodal large models requires particular attention from researchers. Borrowing alignment solutions from large language models, using diverse training data and scenarios, and applying paradigms such as Retrieval-Augmented Generation (RAG) and Constitutional AI can help to a certain extent. But improving the trustworthiness of multimodal large models goes beyond this: alignment between modalities and the robustness of the visual encoder are also key influencing factors. In addition, improving model performance in practical applications through continuous evaluation and optimization in dynamic environments is also an important direction for the future.
Along with the release of the MultiTrust benchmark, the research team also released the multimodal large model trustworthiness evaluation toolkit MMTrustEval, whose model-integration and evaluation features provide an important foundation for trustworthiness research on multimodal large models. Based on this work and toolkit, the team has organized data and algorithm competitions on multimodal large model safety [1,2] to promote trustworthiness research on large models. In the future, with continued technological progress, multimodal large models will show their potential in more fields, but their trustworthiness issues still require sustained attention and in-depth research.

Reference links:

[1] CCDM2024 Multimodal Large Language Model Red Team Safety Challenge http://116.1114.8df
[2] The 3rd Pazhou Algorithm Competition - Multimodal Large Model Algorithm Safety Reinforcement Technology https://iacc.pazhoulab-huangpu.com/contestdetail?id=668de7357ff47da8cc88c7b8&award=1,00,
