


The super popular mini GPT-4's visual capabilities have skyrocketed, with 20,000 stars on GitHub, produced by a Chinese team
GPT-4V for target detection? Actual test by netizens: Not ready yet.
#While the detected categories are fine, most of the bounding boxes are misplaced.
It doesn’t matter, someone will take action!
The Mini GPT-4 that beat GPT-4 in image viewing ability by several months has been upgraded——MiniGPT-v2.
△ (GPT-4V is generated on the left and MiniGPT-v2 is generated on the right)
And it’s just a simple command: [grounding] describe This image in detail is the result achieved.
Not only that, it can also handle various visual tasks easily.
Circle an object and add [identify] in front of the prompt word to allow the model to directly identify the name of the object.
Of course, you can also add nothing and just ask~
MiniGPT-v2 is created by MiniGPT-4 Developed by the original team (KAUST King Abdullah University of Science and Technology) and five researchers from Meta.
Last time MiniGPT-4 attracted huge attention when it came out, and the server was overwhelmed for a while. Now the GitHub project has exceeded 22,000 stars.
With this upgrade, some netizens have already begun to use it~
Common interface for multiple visual tasks
As a common interface for various text applications, large models are already commonplace. Inspired by this, the research team wants to build a unified interface that can be used for a variety of visual tasks, such as image description, visual question answering, etc.
"How to use simple multi-modal instructions to efficiently complete various tasks under the conditions of a single model?" has become a difficult problem that the team needs to solve.
Simply put, MiniGPT-v2 consists of three parts: visual backbone, linear layer and large language model.
The model is based on the ViT visual backbone and remains unchanged during all training stages. Four adjacent visual output tokens are induced from ViT and projected into the LLaMA-2 language model space through linear layers.
The team recommends using unique identifiers for different tasks in the training model, so that large models can easily distinguish each task instruction and improve the learning efficiency of each task.
Training is mainly divided into three stages: pre-training - multi-task training - multi-mode instruction adjustment.
In the end, MiniGPT-v2 outperformed other visual language general models on many visual question answering and visual grounding benchmarks.
Ultimately, this model can complete a variety of visual tasks, such as target object description, visual localization, image description, visual question answering, and direct image parsing from given input text. object.
Interested friends can click on the Demo link below to experience it:
https://minigpt-v2.github.io/
https://huggingface.co/spaces/Vision-CAIR/MiniGPT-v2
Paper link: https://arxiv.org/abs/2310.09478
GitHub link: https://github.com/Vision-CAIR/MiniGPT-4
The above is the detailed content of The super popular mini GPT-4's visual capabilities have skyrocketed, with 20,000 stars on GitHub, produced by a Chinese team. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

The top ten cryptocurrency trading platforms in the world include Binance, OKX, Gate.io, Coinbase, Kraken, Huobi Global, Bitfinex, Bittrex, KuCoin and Poloniex, all of which provide a variety of trading methods and powerful security measures.

Using the chrono library in C can allow you to control time and time intervals more accurately. Let's explore the charm of this library. C's chrono library is part of the standard library, which provides a modern way to deal with time and time intervals. For programmers who have suffered from time.h and ctime, chrono is undoubtedly a boon. It not only improves the readability and maintainability of the code, but also provides higher accuracy and flexibility. Let's start with the basics. The chrono library mainly includes the following key components: std::chrono::system_clock: represents the system clock, used to obtain the current time. std::chron

MeMebox 2.0 redefines crypto asset management through innovative architecture and performance breakthroughs. 1) It solves three major pain points: asset silos, income decay and paradox of security and convenience. 2) Through intelligent asset hubs, dynamic risk management and return enhancement engines, cross-chain transfer speed, average yield rate and security incident response speed are improved. 3) Provide users with asset visualization, policy automation and governance integration, realizing user value reconstruction. 4) Through ecological collaboration and compliance innovation, the overall effectiveness of the platform has been enhanced. 5) In the future, smart contract insurance pools, forecast market integration and AI-driven asset allocation will be launched to continue to lead the development of the industry.

The top ten digital currency exchanges such as Binance, OKX, gate.io have improved their systems, efficient diversified transactions and strict security measures.

Bitcoin’s price fluctuations today are affected by many factors such as macroeconomics, policies, and market sentiment. Investors need to pay attention to technical and fundamental analysis to make informed decisions.

Recommended reliable digital currency trading platforms: 1. OKX, 2. Binance, 3. Coinbase, 4. Kraken, 5. Huobi, 6. KuCoin, 7. Bitfinex, 8. Gemini, 9. Bitstamp, 10. Poloniex, these platforms are known for their security, user experience and diverse functions, suitable for users at different levels of digital currency transactions

Bitcoin’s price ranges from $20,000 to $30,000. 1. Bitcoin’s price has fluctuated dramatically since 2009, reaching nearly $20,000 in 2017 and nearly $60,000 in 2021. 2. Prices are affected by factors such as market demand, supply, and macroeconomic environment. 3. Get real-time prices through exchanges, mobile apps and websites. 4. Bitcoin price is highly volatile, driven by market sentiment and external factors. 5. It has a certain relationship with traditional financial markets and is affected by global stock markets, the strength of the US dollar, etc. 6. The long-term trend is bullish, but risks need to be assessed with caution.

Currently ranked among the top ten virtual currency exchanges: 1. Binance, 2. OKX, 3. Gate.io, 4. Coin library, 5. Siren, 6. Huobi Global Station, 7. Bybit, 8. Kucoin, 9. Bitcoin, 10. bit stamp.
