


Unified visual AI capabilities! Automated image detection and segmentation, and controllable Vincentian images, produced by a Chinese team
This article is reprinted with the authorization of AI New Media Qubit (public account ID: QbitAI). Please contact the source for reprinting.
Now it’s time for the AI circle to compete with hand speed.
No, Meta’s SAM has just been launched a few days ago, and domestic programmers have come to superimpose a wave of buffs, integrating target detection, segmentation, and generation of major visual AI functions all in one!
For example, based on Stable Diffusion and SAM, you can seamlessly replace the chair in the photo with a sofa:
It is also so easy to change clothes and hair color :
As soon as the project was released, many people exclaimed: The hand speed is too fast!
Someone else said: There are new wedding photos of Yui Aragaki and I.
The above is the effect brought by Gounded-SAM. The project has received 1.8k stars on GitHub.
To put it simply, this is a zero-shot vision application that only needs to input images to automatically detect and segment images.
This research comes from IDEA Research Institute (Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute), whose founder and chairman is Shen Xiangyang.
No additional training required
Grounded SAM is mainly composed of two models: Grounding DINO and SAM.
SAM (Segment Anything) is a zero-sample segmentation model just launched by Meta 4 days ago.
It can generate masks for any objects in images/videos, including objects and images that have not appeared during the training process.
By allowing SAM to return a valid mask for any prompt, the model's output should be a reasonable mask among all possibilities, even if the prompt is ambiguous or points to multiple objects. This task is used to pretrain the model and solve general downstream segmentation tasks via hints.
The model framework mainly consists of an image encoder, a hint encoder and a fast mask decoder. After computing the image embedding, SAM is able to generate a segmentation based on any prompt in the web within 50 milliseconds.
Grounding DINO is an existing achievement of this research team.
This is a zero-shot detection model, which can generate object boxes and labels with text descriptions.
After combining the two, you can find any object in the picture through text description, and then use SAM's powerful segmentation capability to segment the mask in a fine-grained manner.
On top of these abilities, they also added the ability of Stable Diffusion, which is the controllable image generation shown at the beginning.
It is worth mentioning that Stable Diffusion has been able to achieve similar functions before. Just erase the image elements you want to replace and enter the text prompt.
This time, Grounded SAM can save the step of manual selection and control it directly through text description.
In addition, combined with BLIP (Bootstrapping Language-Image Pre-training), it generates image titles, extracts labels, and then generates object boxes and masks.
Currently, there are more interesting features under development.
For example, some expansion of characters: changing clothes, hair color, skin color, etc.
Public information shows that the institute is an international innovative research institution for artificial intelligence, digital economy industry and cutting-edge technology. Former chief scientist of Microsoft Asia Research Institute and former vice president of Microsoft Global Intelligence Shen Xiangyang Dr. serves as the founder and chairman.
One More Thing
For the future work of Grounded SAM, the team has several prospects:
- Automatically generate images to form a new data set
- The powerful basic model with segmentation pre-training
- cooperates with (Chat-)GPT
- to form a pipeline that automatically generates image labels, boxes and masks, and can generate new images.
It is worth mentioning that many of the team members of this project are active respondents in the AI field on Zhihu. This time they also answered questions about Grounded SAM on Zhihu. Content, interested children can leave a message to ask~
The above is the detailed content of Unified visual AI capabilities! Automated image detection and segmentation, and controllable Vincentian images, produced by a Chinese team. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Using the chrono library in C can allow you to control time and time intervals more accurately. Let's explore the charm of this library. C's chrono library is part of the standard library, which provides a modern way to deal with time and time intervals. For programmers who have suffered from time.h and ctime, chrono is undoubtedly a boon. It not only improves the readability and maintainability of the code, but also provides higher accuracy and flexibility. Let's start with the basics. The chrono library mainly includes the following key components: std::chrono::system_clock: represents the system clock, used to obtain the current time. std::chron

MeMebox 2.0 redefines crypto asset management through innovative architecture and performance breakthroughs. 1) It solves three major pain points: asset silos, income decay and paradox of security and convenience. 2) Through intelligent asset hubs, dynamic risk management and return enhancement engines, cross-chain transfer speed, average yield rate and security incident response speed are improved. 3) Provide users with asset visualization, policy automation and governance integration, realizing user value reconstruction. 4) Through ecological collaboration and compliance innovation, the overall effectiveness of the platform has been enhanced. 5) In the future, smart contract insurance pools, forecast market integration and AI-driven asset allocation will be launched to continue to lead the development of the industry.

Recommended reliable digital currency trading platforms: 1. OKX, 2. Binance, 3. Coinbase, 4. Kraken, 5. Huobi, 6. KuCoin, 7. Bitfinex, 8. Gemini, 9. Bitstamp, 10. Poloniex, these platforms are known for their security, user experience and diverse functions, suitable for users at different levels of digital currency transactions

The top ten cryptocurrency trading platforms in the world include Binance, OKX, Gate.io, Coinbase, Kraken, Huobi Global, Bitfinex, Bittrex, KuCoin and Poloniex, all of which provide a variety of trading methods and powerful security measures.

Measuring thread performance in C can use the timing tools, performance analysis tools, and custom timers in the standard library. 1. Use the library to measure execution time. 2. Use gprof for performance analysis. The steps include adding the -pg option during compilation, running the program to generate a gmon.out file, and generating a performance report. 3. Use Valgrind's Callgrind module to perform more detailed analysis. The steps include running the program to generate the callgrind.out file and viewing the results using kcachegrind. 4. Custom timers can flexibly measure the execution time of a specific code segment. These methods help to fully understand thread performance and optimize code.

The top ten digital currency exchanges such as Binance, OKX, gate.io have improved their systems, efficient diversified transactions and strict security measures.

Bitcoin’s price fluctuations today are affected by many factors such as macroeconomics, policies, and market sentiment. Investors need to pay attention to technical and fundamental analysis to make informed decisions.

Bitcoin’s price ranges from $20,000 to $30,000. 1. Bitcoin’s price has fluctuated dramatically since 2009, reaching nearly $20,000 in 2017 and nearly $60,000 in 2021. 2. Prices are affected by factors such as market demand, supply, and macroeconomic environment. 3. Get real-time prices through exchanges, mobile apps and websites. 4. Bitcoin price is highly volatile, driven by market sentiment and external factors. 5. It has a certain relationship with traditional financial markets and is affected by global stock markets, the strength of the US dollar, etc. 6. The long-term trend is bullish, but risks need to be assessed with caution.
