Google team launches new Transformer to optimize panoptic segmentation

Apr 08, 2023 pm 01:41 PM

Recently, the Google AI team proposed an end-to-end solution for panoptic segmentation using a Mask Transformer, inspired by Transformer and DETR.

The approach, whose full name is "end-to-end solution for panoptic segmentation with mask transformers", extends the Transformer architecture so that it generates segmentation masks.

The solution uses a pixel path (composed of a convolutional neural network or a vision Transformer) to extract pixel features, a memory path (composed of Transformer decoder modules) to extract memory features, and a dual-path Transformer to handle the interaction between pixel features and memory features.

However, the dual-path Transformer, which relies on cross-attention, was originally designed for language tasks, where the input sequence consists of hundreds of words.

For vision tasks, and segmentation in particular, the input sequence consists of tens of thousands of pixels. The input is therefore not only orders of magnitude larger, but each pixel is also a lower-level embedding than a word in a language task.

Panoptic segmentation is a computer vision problem that is now a core task in many applications.

It combines two parts: semantic segmentation and instance segmentation.

Semantic segmentation assigns a semantic label, such as "person" or "sky", to every pixel in the image.

Instance segmentation identifies and segments only the countable objects in the image, such as "pedestrians" and "cars". The overall problem is then further divided into several subtasks.

Each subtask is processed individually, and additional modules are applied to merge the results of each subtask stage.

This process is not only complex, but also introduces many hand-designed priors, both when processing the subtasks and when merging their results.
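To make the two label types concrete, here is a minimal sketch of how a semantic map and an instance map can be packed into a single panoptic map. The class ids, the toy 2x3 image, and the `LABEL_DIVISOR` encoding are assumptions chosen for illustration; the article does not specify a label format.

```python
import numpy as np

# Assumed constant for illustration; it must exceed the largest instance id.
LABEL_DIVISOR = 1000

def to_panoptic(semantic, instance):
    """Pack a per-pixel class id and a per-pixel instance id into one integer."""
    return semantic * LABEL_DIVISOR + instance

# Hypothetical class ids: 0 = sky, 1 = person, 2 = car.
semantic = np.array([[0, 0, 1],
                     [2, 2, 1]])
# Instance ids: 0 for uncountable regions, 1, 2, ... for countable objects.
instance = np.array([[0, 0, 1],
                     [1, 2, 1]])

panoptic = to_panoptic(semantic, instance)

# The two components can be recovered from the packed value.
recovered_semantic = panoptic // LABEL_DIVISOR
recovered_instance = panoptic % LABEL_DIVISOR
```

In this toy example the two "car" pixels share a class but receive different panoptic ids because they belong to different instances, while the "sky" pixels carry only a class.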

In "CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation", published at CVPR 2022, the authors propose to reinterpret and redesign cross-attention from a clustering perspective (that is, grouping pixels with the same semantic label into the same group) so that it is better suited to vision tasks.

CMT-DeepLab builds on the previous state-of-the-art method MaX-DeepLab and adopts a pixel clustering method to perform cross-attention, resulting in denser and more reasonable attention maps.

kMaX-DeepLab further redesigns cross-attention to be more like a k-means clustering algorithm with simple changes to the activation function.

Structural Overview

Rather than applying cross-attention to vision tasks unmodified, the researchers reinterpret it from a clustering perspective.

Specifically, they note that Mask Transformer object queries can be thought of as cluster centers (aimed at grouping pixels with the same semantic label).

The cross-attention process resembles the k-means clustering algorithm, which iterates over two steps: (1) pixels are assigned to cluster centers, where multiple pixels can be assigned to a single cluster center and some cluster centers may receive no pixels at all, and (2) each cluster center is updated by averaging the pixels assigned to it (a cluster center with no assigned pixels is not updated).
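The two steps can be sketched as one iteration of plain k-means in NumPy. This is a generic illustration of the algorithm being referenced, not code from the papers; the function name `kmeans_step` and the toy data are assumptions.

```python
import numpy as np

def kmeans_step(pixels, centers):
    """One k-means iteration: hard assignment, then mean update.

    pixels:  (N, D) array of pixel features.
    centers: (K, D) array of cluster centers.
    """
    # (1) Assign every pixel to its nearest center (squared Euclidean distance).
    dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    assignment = dists.argmin(axis=1)  # (N,)

    # (2) Move each center to the mean of the pixels assigned to it.
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = pixels[assignment == k]
        if len(members) > 0:  # a center with no pixels is left unchanged
            new_centers[k] = members.mean(axis=0)
    return assignment, new_centers

pixels = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[1.0, 1.0], [9.0, 9.0], [100.0, 100.0]])
assignment, centers = kmeans_step(pixels, centers)
```

Here the third center is far from every pixel, so it receives no assignments and keeps its original value, which mirrors the "not updated" case described above.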

In CMT-DeepLab and kMaX-DeepLab, cross-attention is reformulated from a clustering perspective, consisting of iterative cluster-assignment and cluster-update steps.

Given the popularity of the k-means clustering algorithm, CMT-DeepLab redesigns cross-attention so that the spatial-wise softmax operation (i.e., the softmax applied along the spatial resolution of the image), which in effect assigns cluster centers to pixels, is instead applied along the cluster centers.

kMaX-DeepLab further simplifies the spatial-wise softmax to a cluster-wise argmax (i.e., the argmax operation is applied along the cluster centers).

They note that the argmax operation is the same as the hard assignment (i.e. one pixel is assigned to only one cluster) used in the k-means clustering algorithm.
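The difference between the two operations comes down to which axis is normalized. The sketch below assumes an affinity matrix `logits` laid out as clusters by pixels; the layout, the function names, and the toy values are illustrative assumptions, not taken from the papers.

```python
import numpy as np

def spatial_softmax(logits):
    """Softmax over the pixel axis: each cluster's weights over pixels sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cluster_argmax(logits):
    """One-hot argmax over the cluster axis: each pixel picks exactly one cluster."""
    num_clusters = logits.shape[0]
    winners = logits.argmax(axis=0)         # (N,) winning cluster per pixel
    return np.eye(num_clusters)[winners].T  # (K, N) hard assignment

# Toy affinities for K = 2 clusters (rows) and N = 4 pixels (columns).
logits = np.array([[2.0, 0.5, -1.0, 0.0],
                   [1.0, 1.5,  3.0, 0.2]])
soft = spatial_softmax(logits)
hard = cluster_argmax(logits)
```

With the softmax, every cluster spreads some weight over every pixel. With the argmax, each column of `hard` contains a single 1, which is the hard assignment used by k-means.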

Reformulating the Mask Transformer's cross-attention from a clustering perspective significantly improves segmentation performance and simplifies the complex Mask Transformer pipeline, making it more interpretable.

First, an encoder-decoder structure is used to extract pixel features from the input image. The pixels are then grouped using a set of cluster centers, which are further updated based on cluster assignments. Finally, the cluster assignment and update steps are performed iteratively, and the last assignment can be directly used as segmentation prediction.

To convert a typical Mask Transformer decoder (composed of cross-attention, multi-head self-attention and a feed-forward network) into the k-means cross-attention proposed above, it is enough to replace the spatial-wise softmax with the cluster-wise argmax.
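A minimal sketch of this idea follows, iterating the assignment and update steps and reading the last assignment as the segmentation. It is a simplification under stated assumptions: learned query/key/value projections, normalization, self-attention and the feed-forward network are all omitted, the residual sum update is an illustrative choice, and the function names are hypothetical.

```python
import numpy as np

def kmeans_cross_attention(centers, pixel_feats):
    """One simplified k-means cross-attention update (no learned projections).

    centers:     (K, D) cluster centers (object queries).
    pixel_feats: (N, D) flattened pixel features.
    """
    logits = centers @ pixel_feats.T               # (K, N) affinities
    winners = logits.argmax(axis=0)                # cluster-wise argmax
    one_hot = np.eye(centers.shape[0])[winners].T  # (K, N) hard assignment
    # Residual update: add the features of the assigned pixels to each center.
    return centers + one_hot @ pixel_feats, winners

def segment(centers, pixel_feats, height, width, num_layers=3):
    """Iterate assignment and update, then use the last assignment as the output."""
    for _ in range(num_layers):
        centers, winners = kmeans_cross_attention(centers, pixel_feats)
    return winners.reshape(height, width)

# Toy 2x2 image with 2-dimensional pixel features, flattened row by row.
pixel_feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
centers = np.array([[0.6, 0.4], [0.4, 0.6]])
seg = segment(centers, pixel_feats, height=2, width=2)
```

In this toy case the top row of the image ends up in cluster 0 and the bottom row in cluster 1.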

The meta-architecture of kMaX-DeepLab proposed this time consists of three components: pixel encoder, enhanced pixel decoder and kMaX decoder.

The pixel encoder can be any network backbone and is used to extract image features.

The enhanced pixel decoder includes a Transformer encoder to enhance pixel features, and an upsampling layer to generate higher resolution features.

A series of kMaX decoders then convert the cluster centers into (1) mask embedding vectors, which are multiplied with the pixel features to generate the predicted masks, and (2) class predictions for each mask.
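The multiplication of mask embeddings with pixel features can be sketched as a single tensor contraction. The shapes, the linear class head `class_weights`, and the reuse of the same vectors for both heads are assumptions made for this illustration; the article does not describe the prediction heads in detail.

```python
import numpy as np

def predict_masks(mask_embeddings, pixel_features):
    """Multiply (K, D) mask embeddings with (D, H, W) pixel features -> (K, H, W)."""
    return np.einsum("kd,dhw->khw", mask_embeddings, pixel_features)

def predict_classes(cluster_centers, class_weights):
    """Hypothetical linear class head: (K, D) @ (D, C) -> one class index per mask."""
    return (cluster_centers @ class_weights).argmax(axis=1)

# Toy example: K = 2 masks, D = 2 channels, a 1x2 image, C = 3 classes.
mask_embeddings = np.array([[1.0, 0.0],
                            [0.0, 1.0]])
pixel_features = np.array([[[3.0, 1.0]],
                           [[1.0, 4.0]]])       # shape (D, H, W) = (2, 1, 2)
class_weights = np.array([[0.0, 2.0, 0.0],
                          [0.0, 0.0, 5.0]])     # shape (D, C) = (2, 3)

mask_logits = predict_masks(mask_embeddings, pixel_features)   # (2, 1, 2)
mask_ids = mask_logits.argmax(axis=0)                          # winning mask per pixel
classes = predict_classes(mask_embeddings, class_weights)      # class per mask
```

Each pixel is claimed by the mask with the highest logit, and the class of that mask gives the pixel its semantic label.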

kMaX-DeepLab’s meta-architecture

Research results

Finally, the research team evaluated CMT-DeepLab and kMaX-DeepLab with the panoptic quality (PQ) metric on two of the most challenging panoptic segmentation datasets, COCO and Cityscapes, and compared them against MaX-DeepLab and other state-of-the-art methods.
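For reference, PQ is commonly defined per class as the sum of the IoUs of matched segment pairs divided by the number of true positives plus half the false positives plus half the false negatives, where a predicted and a ground-truth segment match when their IoU exceeds 0.5. The sketch below computes this for a single class; the function name and the example numbers are illustrative and are not results from the article.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ for one class.

    matched_ious: IoU of each true-positive match (each assumed to be > 0.5).
    """
    true_positives = len(matched_ious)
    denominator = (true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        return 0.0  # no segments of this class in prediction or ground truth
    return sum(matched_ious) / denominator

# Made-up example: two matched segments, one spurious prediction, one missed segment.
pq = panoptic_quality([0.9, 0.7], num_false_positives=1, num_false_negatives=1)
```

The dataset-level score is then the average of the per-class values.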

CMT-DeepLab achieved a significant performance improvement, while kMaX-DeepLab not only simplified the modification but also improved results further: it reaches 58.0% PQ on the COCO val set, and 68.4% PQ, 44.0% mask Average Precision (mask AP) and 83.5% mean Intersection-over-Union (mIoU) on the Cityscapes validation set, without test-time augmentation or external datasets.

Because it is designed from a clustering perspective, kMaX-DeepLab not only performs better but also produces attention maps that are easier to interpret, which helps explain how the model works.

In the example below, kMaX-DeepLab iteratively performs cluster assignment and updates, gradually improving Mask quality.

kMaX-DeepLab’s attention map can be visualized directly as a panoptic segmentation, which makes the model's working mechanism easier to understand

Conclusion

This research demonstrates a way to better design Mask Transformers for vision tasks.

With simple modifications, CMT-DeepLab and kMaX-DeepLab reformulate cross-attention to make it more like a clustering algorithm.

As a result, the proposed models achieve state-of-the-art performance on the COCO and Cityscapes datasets.

The research team stated that they hope the open-source release of kMaX-DeepLab in the DeepLab2 library will contribute to future research on Transformer architectures designed specifically for vision.
