Table of Contents

Preface
Cache consistency
Sequential Consistency
Memory barrier
Inter-processor synchronization
Implementation of memory barriers

A detailed explanation of memory barriers in the Linux kernel

Feb 10, 2024, 3:00 PM

Preface

I recently read a discussion of sequential consistency and cache coherence, which gave me a clearer picture of the difference and connection between the two concepts. The Linux kernel contains many synchronization and barrier mechanisms, which I would like to summarize here.


Cache consistency

I used to think that many mechanisms in Linux existed to ensure cache coherence, but in fact cache coherence is mostly achieved by hardware. Only lock-prefixed instructions have anything direct to do with the cache (this is not strictly accurate, but it holds in most cases). Most of the time, what we actually want to ensure is sequential consistency.

Cache coherence means that in a multiprocessor system, where each CPU has its own L1 cache, the same piece of memory may be cached in the L1 caches of different CPUs. When one CPU changes its cached copy, the others must still read the latest content when they access that data. Fortunately, this complex work is handled entirely by the hardware: by implementing the MESI protocol, the hardware keeps the caches coherent, even when multiple CPUs write at the same time. Whether the data is in its own cache, in another CPU's cache, or in memory, a CPU always reads the latest value. That is cache coherence.

Sequential Consistency

Sequential consistency is a completely different concept from cache coherence, although both are products of processor evolution. Compilers keep evolving and may reorder certain operations to optimize your code; processors have long featured multi-issue and out-of-order execution. The result is that the order in which instructions actually execute differs somewhat from the order written in the program. On a single processor this is harmless: the compiler and processor reorder things only where the code itself cannot tell the difference, and no one else is watching. On a multiprocessor it is another matter: the order in which instructions complete on one processor can greatly affect code running on the others. Hence the concept of sequential consistency, which ensures that a thread's execution order, as observed from threads on other processors, matches the program order. This problem cannot be solved by the processor or compiler alone; it requires software intervention.

Memory barrier

The software intervention is simple: insert a memory barrier. The term "memory barrier" was coined by processor designers, which makes it hard for us to grasp. It can easily mislead us into thinking of cache coherence, and even into wondering whether a barrier is what lets other CPUs see our modified cache. That is wrong. A memory barrier, from the processor's perspective, serializes read and write operations; from the software perspective, it solves the sequential consistency problem. The compiler wants to reorder your code, and the processor wants to execute it out of order. Inserting a memory barrier tells the compiler that instructions before and after the barrier may not be exchanged, and tells the processor that instructions after the barrier may begin only after the instructions before it have completed.

Of course, a barrier can stop the compiler from meddling, but the processor still has room to maneuver. With multi-issue, out-of-order execution, and in-order completion, a barrier only needs to guarantee that the read and write operations of preceding instructions complete before those of following instructions. Accordingly there are three kinds of memory barriers: read barriers, write barriers, and read-write (full) barriers. Early x86 processors, for example, guaranteed that writes complete in order, so write barriers were unnecessary; some later IA-32 processors can complete writes out of order, so write barriers are needed after all.

In addition to the dedicated read-write barrier instructions, many instructions carry barrier semantics as a side effect, such as those with a lock prefix. Before dedicated barrier instructions appeared, Linux got by on lock.

Where to insert read and write barriers depends on the software's needs. Barriers cannot achieve full sequential consistency, but threads on other processors are not staring at your execution order all the time; as long as whatever they see, whenever they look, appears sequentially consistent, nothing unexpected happens in their code. An example of something unexpected: your thread assigns to variable a and then to variable b, yet a thread on another processor looks over and finds b assigned while a is not. (Note that this inconsistency is not caused by cache incoherence, but by the order in which the processor's write operations complete.) In that case, a write barrier must be placed between the assignment to a and the assignment to b.

Inter-processor synchronization

With SMP, threads run on multiple processors at the same time, and threads always need communication and synchronization. Fortunately, SMP systems use shared memory: all processors see the same memory contents, and although each has an independent L1 cache, coherence is handled by the hardware. When threads on different processors access the same data, they need critical sections and synchronization. What do we synchronize with? On UP systems we relied on semaphores at the top, and on disabling interrupts and on read-modify-write instructions at the bottom. On SMP, disabling interrupts no longer suffices: it is still needed to synchronize threads on the same processor, but it alone is not enough. Plain read-modify-write instructions? Also no: between the read part of the instruction completing and the write part being carried out, another processor may slip in a read or a write of its own. The cache coherence protocol is sophisticated, but not sophisticated enough to know which instruction issued a given read. So x86 invented lock-prefixed instructions. When such an instruction executes, all cache lines containing the addresses it reads or writes are invalidated, and the memory bus is locked. Other processors that want to read or write the same address, or an address on the same cache line, can then do so neither from their caches (the relevant lines have been invalidated) nor over the memory bus (it is locked), which achieves atomic execution.

Starting from the P6 processor, if the address accessed by a lock-prefixed instruction is already in the cache, there is no need to lock the memory bus to complete the atomic operation (I suspect this is because of the shared L2 cache added for multiprocessors).

Because the memory bus is locked, all outstanding reads and writes complete before a lock-prefixed instruction executes, so it also functions as a memory barrier.

Today, synchronization of threads across processors uses spin locks at the top and lock-prefixed read-modify-write instructions at the bottom. In practice, synchronization also involves disabling the processor's task scheduling, disabling interrupts, and wrapping a semaphore around the outside. The spin lock implementation in Linux has gone through four generations of development, becoming ever more efficient and powerful.

Implementation of memory barriers

#ifdef CONFIG_SMP
#define smp_mb()	mb()
#define smp_rmb()	rmb()
#define smp_wmb()	wmb()
#else
#define smp_mb()	barrier()
#define smp_rmb()	barrier()
#define smp_wmb()	barrier()
#endif

CONFIG_SMP enables multiprocessor support. On a UP (uniprocessor) system, these macros all reduce to barrier().

#define barrier() asm volatile("": : :"memory")

The effect of barrier() is to tell the compiler that the values of variables in memory may have changed, so any copies previously held in registers are invalid and the variables must be re-read from memory. On UP, this alone is sufficient for all memory barriers.

#ifdef CONFIG_X86_32
/*
 * Some non-Intel clones support out of order store. wmb() ceases to be a
 * nop for these.
 */
#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define mb()  asm volatile("mfence":::"memory")
#define rmb() asm volatile("lfence":::"memory")
#define wmb() asm volatile("sfence":::"memory")
#endif

On an SMP system, the memory barriers translate into the corresponding mb(), rmb(), and wmb(). Here CONFIG_X86_32 means this is a 32-bit x86 system; otherwise it is 64-bit x86. The current Linux kernel merges 32-bit and 64-bit x86 into the same x86 directory, hence the need for this configuration option.

As you can see, 64-bit x86 is guaranteed to have the mfence, lfence, and sfence instructions, while 32-bit x86 is not, so the kernel must further check whether the CPU supports these three newer instructions and, if not, fall back to a lock-prefixed instruction to provide the memory barrier.

The SFENCE, LFENCE, and MFENCE instructions provide an efficient way to enforce ordering of memory reads and writes between a program that produces weakly ordered data and a program that reads that data.
  SFENCE: serializes the write operations that precede the SFENCE instruction, without affecting reads.
  LFENCE: serializes the read operations that precede the LFENCE instruction, without affecting writes.
  MFENCE: serializes both the read and write operations that precede the MFENCE instruction.
sfence: writes before the sfence instruction must complete before writes after it.
lfence: reads before the lfence instruction must complete before reads after it.
mfence: reads and writes before the mfence instruction must complete before reads and writes after it.

As for lock-prefixed memory operations: before locking the memory bus, they finish all preceding reads and writes, so they are functionally equivalent to mfence, though somewhat less efficient to execute.

Honestly, writing low-level code these days is not easy: you have to watch out for SMP issues, CPU out-of-order reads and writes, cache issues, device DMA issues, and more.

Implementation of inter-processor synchronization

The spin lock implementation used for inter-processor synchronization is covered in a dedicated article.


