


The latest progress in sparse models! Ma Yi + LeCun join forces: 'White box' unsupervised learning
Professor Ma Yi and Turing Award winner Yann LeCun recently published a joint paper at ICLR 2023 describing a minimalist, interpretable unsupervised learning method. Without relying on data augmentation, hyperparameter tuning, or other engineering designs, it achieves performance close to that of SOTA self-supervised learning (SSL) methods.
Paper link: https://arxiv.org/abs/2209.15261
The method is built on the sparse manifold transform, which combines sparse coding, manifold learning, and slow feature analysis.
With a single-layer deterministic sparse manifold transform, it achieves 99.3% KNN top-1 accuracy on MNIST, 81.1% on CIFAR-10, and 53.2% on CIFAR-100.
With a simple grayscale augmentation, the model's accuracy on CIFAR-10 and CIFAR-100 rises to 83.2% and 57%, respectively. These results significantly narrow the gap between simple "white-box" methods and SOTA methods.
In addition, the paper provides a visual explanation of how the unsupervised representation transform is formed. The method is closely related to latent-embedding self-supervised methods and can be regarded as the simplest form of VICReg.
Although a small performance gap remains between this simple constructive model and SOTA approaches, the evidence suggests that this is a promising direction for achieving a principled, white-box approach to unsupervised learning.
The first author of the paper, Yubei Chen, is a postdoctoral associate at New York University's Center for Data Science (CDS) and at Meta Fundamental AI Research (FAIR), where his supervisor is Professor Yann LeCun. He received his Ph.D. from the University of California, Berkeley, where he was affiliated with the Redwood Center for Theoretical Neuroscience and Berkeley Artificial Intelligence Research (BAIR), and his bachelor's degree from Tsinghua University.
His main research interest lies at the intersection of computational neuroscience and deep unsupervised (self-supervised) learning. His work has advanced the understanding of the computational principles of unsupervised representation learning in brains and machines, and has reshaped the understanding of natural signal statistics.
Professor Ma Yi received a double bachelor's degree in automation and applied mathematics from Tsinghua University in 1995, a master's degree in EECS from the University of California, Berkeley in 1997, and a master's degree in mathematics and a Ph.D. in EECS in 2000. He is currently a professor in the Department of Electrical Engineering and Computer Science at the University of California, Berkeley, and is an IEEE Fellow, ACM Fellow, and SIAM Fellow.
Yann LeCun is best known for his work applying convolutional neural networks (CNNs) to optical character recognition and computer vision, and is often called the father of convolutional networks. In 2019 he, Bengio, and Hinton jointly received the Turing Award, the highest honor in computer science.
Starting from the simplest unsupervised learning
Over the past few years, unsupervised representation learning has made great progress and is expected to provide powerful scalability for data-driven machine learning.
However, what a learned representation is, and how exactly it is formed in an unsupervised manner, remain unclear. It is also unclear whether a set of common principles underlies all of these unsupervised representations.
Many researchers have recognized the importance of better understanding these models and have made pioneering attempts to simplify SOTA methods, establish connections with classic methods, unify different methods, visualize representations, and analyze these methods from a theoretical perspective. The hope is to develop a different theory of computation: one that allows simple, fully interpretable "white-box" models to be built from data based on first principles. Such a theory could also provide guidance for understanding the principles of unsupervised learning in the human brain.
In this work, the researchers take another small step toward this goal by trying to build the simplest "white-box" unsupervised learning model, one that requires no deep networks, projection heads, data augmentation, or other engineering designs.
Using two classic unsupervised learning principles, sparsity and spectral embedding, the paper builds a two-layer model that achieves non-trivial benchmark results on several standard datasets.
Experimental results show that the two-layer model based on the sparse manifold transform shares the same objective as latent-embedding self-supervised methods. Without any data augmentation, it achieves 99.3% KNN top-1 accuracy on MNIST, 81.1% on CIFAR-10, and 53.2% on CIFAR-100.
With a simple grayscale augmentation, it further reaches 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100.
These results are an important step toward closing the gap between "white-box" models and SOTA self-supervised learning (SSL) models. Although the gap is still noticeable, the researchers believe that closing it further could lead to a deeper understanding of unsupervised representation learning, and that this is a promising line of research toward practical applications of the theory.
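The KNN top-1 accuracy quoted above measures how often a test sample's nearest neighbors in the learned embedding space carry the correct label. The following is a minimal sketch of that evaluation, not the authors' code: the function name `knn_top1_accuracy`, the use of cosine similarity, and the toy vectors are all assumptions made for illustration.

```python
import numpy as np

def knn_top1_accuracy(train_x, train_y, test_x, test_y, k=1):
    """Fraction of test points whose majority label among the k most
    similar training embeddings (cosine similarity) is the true label.
    Labels are assumed to be non-negative integers."""
    # L2-normalise rows so that a dot product equals cosine similarity.
    train_n = train_x / np.linalg.norm(train_x, axis=1, keepdims=True)
    test_n = test_x / np.linalg.norm(test_x, axis=1, keepdims=True)
    sims = test_n @ train_n.T                    # shape (n_test, n_train)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]    # k most similar training points
    nn_labels = train_y[nn_idx]                  # shape (n_test, k)
    # Majority vote; ties go to the smallest label.
    preds = np.array([np.bincount(row).argmax() for row in nn_labels])
    return float(np.mean(preds == test_y))

# Toy embeddings (hypothetical): two well-separated classes in 2-D.
train_x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_y = np.array([0, 0, 1, 1])
test_x = np.array([[0.8, 0.2], [0.2, 0.8]])
test_y = np.array([0, 1])

acc = knn_top1_accuracy(train_x, train_y, test_x, test_y, k=1)
print(acc)
```

Because the classifier has no trainable parameters, this kind of score mostly reflects the quality of the embedding itself, which is why it is a common yardstick for unsupervised representations.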
Three basic questions
What is an unsupervised (self-supervised) re-presentation?
In essence, any non-identity transformation of the original signal can be called a representation (re-presentation), but the academic community is more interested in the useful ones.
A broad goal of unsupervised re-presentation learning is to find a function that transforms the original data into a new space where "similar" things are placed closer together. At the same time, the new space should not be collapsed and trivial; that is, the geometric or stochastic structure of the data must be preserved.
If this goal is achieved, then "dissimilar" content will naturally be placed far away in the representation space.
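One common way to write this goal down, given here only as a hedged illustration, is as a constrained minimization. The symbols are notation introduced for this sketch and are not taken from the article: $f$ is the representation function, $\mathcal{S}$ is the set of index pairs judged "similar", and $N$ is the number of samples.

```latex
\min_{f}\; \sum_{(i,j)\in\mathcal{S}} \bigl\lVert f(x_i) - f(x_j) \bigr\rVert_2^2
\quad \text{subject to} \quad
\frac{1}{N}\sum_{i=1}^{N} f(x_i)\, f(x_i)^{\top} = I
```

The objective pulls similar pairs together, while the constraint forces every output dimension to carry unit, uncorrelated variance, which rules out the collapsed solution where all inputs map to the same point.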
Where does similarity come from?
Similarity mainly comes from three classic ideas: 1) temporal co-occurrence, 2) spatial co-occurrence, and 3) local neighborhoods in the original signal space.
When the underlying structure is geometric, these ideas overlap to a considerable extent; when the structure is stochastic, they are conceptually different. A figure in the paper illustrates the difference between a manifold structure and a stochastic co-occurrence structure.
Exploiting locality, related work has proposed two families of unsupervised learning methods: manifold learning and co-occurrence statistics modeling. Many of these ideas lead to a spectral decomposition formulation or the closely related matrix factorization formulation.
The idea behind manifold learning is that only local neighborhoods in the original signal space can be trusted. By considering all local neighborhoods together, the global geometry emerges, that is, "think globally, fit locally".
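To make "think globally, fit locally" concrete, here is a small spectral embedding sketch in the style of Laplacian eigenmaps. It is an illustration under stated assumptions rather than anything taken from the paper: the function name `spectral_embedding`, the unweighted nearest-neighbor graph, and the toy 3-D loop are all choices made for this example.

```python
import numpy as np

def spectral_embedding(x, n_neighbors=2, dim=2):
    """Embed points using eigenvectors of a nearest-neighbor graph Laplacian."""
    n = x.shape[0]
    # Pairwise squared Euclidean distances in the original signal space.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    # Trust only local neighborhoods: link each point to its nearest neighbors.
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]   # column 0 is the point itself
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = 1.0
    w = np.maximum(w, w.T)                               # make the graph undirected
    lap = np.diag(w.sum(axis=1)) - w                     # graph Laplacian L = D - W
    _, evecs = np.linalg.eigh(lap)                       # eigenvalues in ascending order
    # Skip the constant eigenvector; keep the next `dim` smoothest ones.
    return evecs[:, 1:dim + 1]

# Toy data (hypothetical): a closed loop that bends up and down in 3-D.
n = 30
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
curve = np.stack([np.cos(theta), np.sin(theta), 0.3 * np.cos(3 * theta)], axis=1)

emb = spectral_embedding(curve, n_neighbors=2, dim=2)
print(emb.shape)
```

Each point only "knows" its two closest neighbors, yet the low-frequency eigenvectors of the resulting graph recover the global loop structure: on this toy curve the 2-D embedding lies on a circle.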
In contrast, co-occurrence statistics modeling follows a probabilistic philosophy. Because some structures cannot be modeled with continuous manifolds, it complements the manifold philosophy.
One of the most obvious examples comes from natural language, where the raw data essentially does not come from a smooth geometry. In word embeddings, for example, the embeddings of "Seattle" and "Dallas" may be similar even though the two words do not co-occur frequently; the underlying reason is that they have similar context patterns.
The perspectives of probability and manifold are complementary to each other in understanding "similarity". Once there is a definition of similarity, a transformation can be constructed to make similar concepts closer.
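The "Seattle" and "Dallas" example can be reproduced on a tiny scale with raw co-occurrence counts. The sketch below is illustrative only: the four-sentence corpus, the window size of 2, and the helper `cosine` are invented for this example and do not come from the paper.

```python
import numpy as np

# Hypothetical toy corpus: the two city names never share a sentence.
corpus = [
    "i flew to seattle yesterday",
    "i flew to dallas yesterday",
    "the weather in seattle is rainy",
    "the weather in dallas is sunny",
]
window = 2  # context = up to 2 words on each side

vocab = sorted({w for s in corpus for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}

# counts[a, b] = how often word b appears within the window around word a.
counts = np.zeros((len(vocab), len(vocab)))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[w], index[words[j]]] += 1

def cosine(u, v):
    """Cosine similarity between two context-count vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

direct = counts[index["seattle"], index["dallas"]]
context_sim = cosine(counts[index["seattle"]], counts[index["dallas"]])
print(direct, round(context_sim, 3))
```

The direct co-occurrence count between the two words is zero, yet their context vectors are nearly parallel because both appear after "flew to" and inside "the weather in ... is". This is the sense in which similarity can come from shared context statistics rather than from a smooth geometry.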
How does this paper build the representation transform? Basic principles: sparsity and low rank
In general, sparsity is used to handle locality and decomposition in the data space and thereby establish a support. A low-frequency function is then used to construct a representation transform that assigns similar values to similar points on the support.
The whole process is called the sparse manifold transform.
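The two steps described above can be sketched on toy data. This is a simplified illustration under loud assumptions, not the authors' implementation: sparse coding is replaced by 1-sparse vector quantization (a one-hot code for the nearest dictionary atom), similarity is taken to be temporal adjacency, and the projection $P$ is chosen to minimize $\lVert P A D \rVert_F^2$ subject to $P V P^{\top} = I$ with $V = A A^{\top} / N$. The names `sparse_codes` and `fit_projection` are hypothetical.

```python
import numpy as np

def sparse_codes(x, dictionary):
    """Step 1 (sparsity): one-hot code of the nearest dictionary atom.
    Returns A with shape (K, N): K atoms, N samples."""
    d2 = ((x[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)   # (N, K)
    codes = np.zeros((dictionary.shape[0], x.shape[0]))
    codes[d2.argmin(axis=1), np.arange(x.shape[0])] = 1.0
    return codes

def fit_projection(codes, dim):
    """Step 2 (low-frequency projection): minimise ||P A D||_F^2
    subject to P V P^T = I, where A D holds differences of consecutive codes."""
    n = codes.shape[1]
    diffs = codes[:, 1:] - codes[:, :-1]       # consecutive samples count as "similar"
    m = diffs @ diffs.T
    v = codes @ codes.T / n                    # code covariance (every atom must be used)
    evals_v, evecs_v = np.linalg.eigh(v)
    whiten = evecs_v @ np.diag(evals_v ** -0.5) @ evecs_v.T        # V^{-1/2}
    _, u = np.linalg.eigh(whiten @ m @ whiten)                     # ascending eigenvalues
    # Skip the first direction, which is constant on all codes, and keep
    # the next `dim` slowest-varying ones.
    return u[:, 1:dim + 1].T @ whiten

# Toy signal (hypothetical): one pass around a circle, sampled over time.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
x = np.stack([np.cos(theta), np.sin(theta)], axis=1)
dictionary = x[::10]                           # 20 atoms taken from the data

codes = sparse_codes(x, dictionary)            # A, shape (20, 200)
proj = fit_projection(codes, dim=2)            # P, shape (2, 20)
beta = proj @ codes                            # embeddings, shape (2, 200)
print(beta.shape)
```

The one-hot codes establish the support by tiling the data space, and the projection then assigns nearby values to atoms that follow each other in time. The whitening constraint keeps the embedding from collapsing, in the same spirit as the variance and covariance terms of VICReg mentioned earlier.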
