
A brief analysis of the Linux file system (File System) architecture

Mar 11, 2024 pm 03:31 PM
Tags: file system, architecture, Linux operating system, VFS


This article focuses on the virtual file system. In the Linux file system architecture, the Virtual File System (VFS) is the abstraction layer between concrete file systems (such as Ext2, Ext3, and XFS) and applications. VFS lets applications work with different file system types without knowing the details of the underlying implementation, which isolates and decouples file system implementations from applications and improves the system's flexibility and maintainability. VFS also allows the Linux kernel to support many file system types behind a single, unified interface: under the VFS framework, each file system plugs into the kernel by implementing a standard set of file system operation interfaces.

[Figure: Linux file system architecture, with VFS at the center between applications and concrete file systems]

As the figure shows, the center of this architecture is the virtual file system, VFS. VFS provides a file system framework on which local file systems can be implemented. It mainly performs the following tasks:

1) As an abstraction layer, VFS provides a unified interface (read, write, chmod, etc.) for the application layer.

2) Some common functions are implemented in VFS, such as inode cache and page cache.

3) Standardizes the interface that a specific file system should implement.

Given this design, a concrete file system only needs to follow the VFS conventions, implement the corresponding interfaces and internal logic, and register itself with the system. Once a user formats and mounts the file system, its hard disk resources can be used through file system operations.
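The register-and-dispatch convention can be sketched in Python. The names here (VFS, register_filesystem, Ext2Like, XfsLike) are illustrative stand-ins, not kernel APIs; the real kernel does this in C with struct file_system_type and function-pointer tables:

```python
# Sketch of VFS-style registration and dispatch.
# All names are invented for illustration, not kernel identifiers.

class VFS:
    def __init__(self):
        self._filesystems = {}           # name -> operations object

    def register_filesystem(self, name, ops):
        """A concrete file system registers its operation table."""
        self._filesystems[name] = ops

    def read(self, fs_name, path):
        """Applications call one uniform API; VFS dispatches."""
        return self._filesystems[fs_name].read(path)

class Ext2Like:
    def read(self, path):
        return f"ext2 read of {path}"

class XfsLike:
    def read(self, path):
        return f"xfs read of {path}"

vfs = VFS()
vfs.register_filesystem("ext2", Ext2Like())
vfs.register_filesystem("xfs", XfsLike())
```

Once registered, every file system is reached through the same `vfs.read` call; that single entry point is the essence of the abstraction layer.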

In the Linux operating system, after formatting a disk you use the mount command to attach it to a directory in the system directory tree; this directory is called the mount point. Once mounting completes, the hard disk space formatted with that file system can be used. The mount point can be almost any directory, but by convention it is usually a subdirectory under /mnt.

The following shows a relatively complex directory structure. Here the root directory lives on the hard disk sda, and there are three subdirectories under /mnt, named ext4, xfs and nfs, which respectively mount an Ext4 file system (built on the sdb disk), an XFS file system (built on the sdc disk), and an NFS file system (mounted over the network).

[Figure: directory tree with root on sda and Ext4, XFS and NFS file systems mounted under /mnt]

The relationships between the file systems in the directory tree are represented by data structures in the kernel. When a file system is mounted, those relationships are established and the concrete file system's API is registered. When user mode calls an API to open a file, the corresponding file system API is found and associated with the file-related structures (such as file and inode).

The above description is rather schematic, and you may still feel a little confused. Don't worry: next we will walk through VFS, and how it supports multiple file systems, in more detail based on the code.

1. From the file system API through VFS to the concrete file system

Linux's VFS was not there from the beginning; the earliest released versions of Linux had no VFS. Nor was VFS invented in Linux: it was first developed by Sun in 1985 for SunOS 2.0, mainly to unify access to its local file system and the NFS file system.

VFS abstracts concrete file systems behind a set of public APIs and data structures. When a user calls a file system API provided by the operating system, a software interrupt transfers control to the corresponding function implemented in the kernel's VFS. The following table shows the correspondence between some file APIs and kernel VFS functions.

| User-mode API | Kernel function | Description |
| --- | --- | --- |
| open | do_sys_open | Open a file |
| close | ksys_close | Close a file |
| read | ksys_read / vfs_read | Read data |
| write | ksys_write / vfs_write | Write data |
| mount | do_mount | Mount a file system |

As the table shows, each user-mode API has a corresponding kernel-mode function; when an application calls a file system API, the matching kernel function is triggered. This is only a small subset of the file system API, listed to illustrate the relationship between the API and VFS. For the other APIs, please read the kernel source code; this article will not go into further detail.
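The user-mode side of this table can be exercised directly with Python's os module, whose open/read/write/close functions are thin wrappers over the corresponding system calls (the file used below is a throwaway temporary file):

```python
import os
import tempfile

# os.open/os.write/os.read/os.close wrap the open/write/read/close
# system calls listed in the table above.
fd = os.open(os.path.join(tempfile.mkdtemp(), "demo.txt"),
             os.O_CREAT | os.O_RDWR, 0o644)
os.write(fd, b"hello vfs")       # enters the kernel via ksys_write/vfs_write
os.lseek(fd, 0, os.SEEK_SET)     # rewind to the start of the file
data = os.read(fd, 32)           # enters the kernel via ksys_read/vfs_read
os.close(fd)
```

Each of these calls crosses into the kernel and ends up in the VFS functions listed above, regardless of which file system holds the temporary directory.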

To give you a concrete feel for the relationship between VFS and a specific file system, this section takes Ext2's write API as an example and shows the calling chain from the API to VFS functions and then to Ext2 file system functions. As shown in the figure below, the API function write triggers the kernel's ksys_write function through a software interrupt. After some processing, this function eventually calls the Ext2 file system's ext2_file_write_iter function through a function pointer (file->f_op->write_iter).

[Figure: call path from the write API through ksys_write and vfs_write to ext2_file_write_iter]

In the figure above, the kernel entry point is the ksys_write function. As its implementation shows, its main job is to obtain the struct fd for the file descriptor and then call vfs_write with the fd's file member as a parameter. Here fd is a structure whose layout is shown in the figure below; its file member is the core data structure, and it is through this member that control is transferred to the Ext2 file system's functions.

[Figure: layout of struct fd and its file member]

This looks very simple: VFS only needs to call the function pointers registered by the concrete file system. But one question remains unresolved: when were those function pointers registered with VFS?

Ext2's function pointers are initialized when the file is opened (for details, see Section 3.1.2.2 of "File System Technology Insider"). A user-mode program gets back a file descriptor when it opens a file, while inside the kernel the file is represented by the corresponding struct file. The more important members of this structure include f_inode, f_op and f_mapping, as shown in the figure below.

[Figure: key members of struct file: f_inode, f_op, f_mapping]

In the figure above, f_inode is the inode corresponding to the file. f_op is the collection of function pointers for file operations in a concrete file system (such as Ext2), and it is initialized when the file is opened. It is through this set of function pointers that VFS accesses the concrete file system.
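The role of f_op can be mimicked in a few lines of Python. This is a loose sketch with made-up operation tables, not kernel code: each open file captures the operation table of its file system, and VFS-level code only ever calls through that table.

```python
# Each file system provides an operation table; the "file" object
# captures it at open time, mirroring file->f_op in the kernel.
ext2_file_ops = {"write_iter": lambda buf: ("ext2_file_write_iter", len(buf))}
xfs_file_ops  = {"write_iter": lambda buf: ("xfs_file_write_iter", len(buf))}

class File:
    def __init__(self, f_op):
        self.f_op = f_op                 # set once, when the file is opened

def vfs_write(file, buf):
    # VFS never hard-codes a file system: it calls through file.f_op,
    # just as the kernel calls file->f_op->write_iter.
    return file.f_op["write_iter"](buf)

f = File(ext2_file_ops)                  # "opened" on an Ext2 volume
```

The same `vfs_write` reaches ext2_file_write_iter or xfs_file_write_iter purely depending on which table the file was opened with.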

This touches on another VFS concept: the inode. In Linux, inode is short for index node, and it represents an object in the file system (such as a file or directory). VFS has a data structure named inode, which is an abstraction over the concrete file systems' inodes: in Ext2 the concrete definition is ext2_inode_info, while XFS uses the data structure xfs_inode. Each concrete file system's inode structure is intrinsically linked to the VFS inode; you can explore the details in the code.

2. Inode cache and dentry cache

In the architecture diagram we saw that VFS contains several caches, including the page cache, the inode cache and the dentry cache. The inode cache and the dentry cache are implemented in the same way and are relatively simple, so this article introduces them first.

Both caches are implemented with hash tables; the concept of a hash table is familiar enough that this article will not elaborate on it. Taking the inode cache as an example, the following figure shows its initialization. Through the ihash_entries parameter you can see that its size is dynamic: it depends on the amount of system memory, and the more memory the system has, the larger the inode cache.
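A hash-table cache in this spirit can be sketched in Python. HashCache and its (device, inode number) key format are invented for illustration; the kernel's table is an array of hlist buckets whose count comes from ihash_entries:

```python
# Sketch of a hash-table cache in the spirit of the inode/dentry caches.
# The kernel sizes its table from total memory; here the size is just
# a constructor argument.

class HashCache:
    def __init__(self, nbuckets):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        self._bucket(key).append((key, value))

    def lookup(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v             # cache hit: no disk access needed
        return None                  # miss: caller must read from disk

icache = HashCache(nbuckets=64)
# Hypothetical key: (device, inode number) -> cached inode attributes.
icache.insert(("sda1", 1234), {"size": 4096, "mode": 0o644})
```

A lookup costs one bucket scan instead of a disk read, which is exactly why these caches pay off for hot inodes and dentries.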

[Figure: initialization of the inode cache hash table, sized via ihash_entries]

Since inodes and dentries are accessed frequently during file access, caching them avoids the performance cost of reading this data from the hard disk every time.

3. Page cache

The VFS page cache exists mainly to improve file system performance. Caching keeps part of the file system's data and metadata in memory; because memory access latency is roughly one hundred-thousandth of a mechanical hard disk's (as shown in the figure below, normalized so that a register access takes 1s), caching can improve file system performance dramatically.

[Figure: storage hierarchy access latencies, normalized to a register access of 1s]

Cache improves file system performance through three kinds of IO optimization: exploiting hot data, read-ahead, and IO merging. Many applications have hot data: when an author edits a document, the current data block and nearby blocks are hot, and when an article becomes popular, its content is hot. Underlying storage devices generally perform better on large reads and writes. Read-ahead reads large blocks of data from the underlying device in advance and caches them, so that subsequent application requests can be served from the cache. IO merging targets write requests: writes are not persisted to the backing device immediately but are cached and combined into large IOs before being written out.
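IO merging, the third optimization, can be sketched as a small buffer that accumulates writes and only issues a device write once a full block is available. MergingWriteBuffer and flush_fn are illustrative names, not a real kernel interface:

```python
# Sketch of write-request merging: small writes are buffered and only
# persisted as one large block once enough data accumulates.
# flush_fn stands in for the real "write to the backing device" step.

class MergingWriteBuffer:
    def __init__(self, block_size, flush_fn):
        self.block_size = block_size
        self.flush_fn = flush_fn
        self.pending = bytearray()

    def write(self, data):
        self.pending += data
        while len(self.pending) >= self.block_size:
            # One large IO instead of many small ones.
            self.flush_fn(bytes(self.pending[:self.block_size]))
            del self.pending[:self.block_size]

flushed = []
buf = MergingWriteBuffer(block_size=8, flush_fn=flushed.append)
for chunk in (b"ab", b"cd", b"efgh", b"ij"):
    buf.write(chunk)                 # four small writes, one device IO
```

Four application writes produce a single 8-byte device write, with the remainder held back until more data arrives (or until a flush forces it out).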

Since memory capacity is much smaller than disk capacity, the page cache naturally cannot hold all of the disk's data; only a subset of the file system's data can be cached. As users keep writing data, the cache will eventually fill up, which raises the question of how to flush cached data to disk to make room for new data.

The process of flushing the cache to disk and storing new data is called cache replacement. There are many algorithms for cache replacement, each used to solve different problems. Next we introduce several common cache replacement algorithms.

LRU algorithm. LRU stands for Least Recently Used. The algorithm relies on the principle of temporal locality: if a piece of data has been used recently, it is likely to be used again soon. The algorithm therefore releases the caches that have gone unused for the longest time.

The LRU algorithm is usually implemented with a linked list: a cache block that has just been used is moved to the head of the list, while blocks that are rarely used gradually drift toward the tail. To make the principle clearer, we illustrate it with the figure below.

[Figure: LRU list updates over several accesses]

In this example we first consider the all-hit case. Suppose the cache holds 6 data blocks, as shown in the first row of the figure; the number in each box is the block number. Suppose the first access (a read or a write) is to block 3; since it was just accessed, it is moved to the head of the list.

The second access is to the data block No. 4. According to the same principle, the data block is also moved to the head of the linked list. The details are shown in line 2 of the above figure.

By analogy, after 4 rounds of access, the accessed data are moved forward, while the unaccessed data blocks (such as 1 and 2) are slowly squeezed to the back of the linked list. This indicates to a certain extent that the possibility of these two data blocks being accessed later is relatively small.

With all hits there is no cache replacement. In practice, though, the cache is often insufficient, and data in it must be released (flushed to disk first, if necessary) to make room for new data. This is where LRU eviction comes in: the algorithm reuses the tail block for the new data and then moves it to the head of the list, as shown in the figure below. If the tail block contains dirty data it must be flushed to disk first; otherwise it can be released directly.

picture

The principle and implementation of the LRU algorithm are relatively simple, but its uses are very wide. However, the LRU algorithm has a shortcoming, that is, when a large amount of continuous data is suddenly written, all cache blocks will be replaced, causing all previous cache usage statistics to become invalid. This phenomenon is called cache pollution. In order to solve the cache pollution problem, there are many improved LRU algorithms, among which the more common ones are LRU-K, 2Q and LIRS.

LFU algorithm, the full name of LFU is Least Frequently Used, which is the least frequently used recently. This algorithm decides which cache block to release based on the frequency of data access. The cache block with the lowest access frequency will be released first.

As shown below is a schematic diagram of the LFU algorithm. Line 1 is the original state, and the number in the box represents the number of times the cache block has been accessed. The addition of new data and the elimination of cache blocks are performed from the tail. Assume that a certain piece of data (dotted box) has been accessed 4 times, and the number of accesses has changed from 12 to 16, so it needs to be moved to a new location, which is what line 2 in the figure looks like.

picture

This book uses a linked list as an example to illustrate the principle of LFU for ease of understanding, but it will never be implemented using a linked list during project implementation. Because a new location needs to be found when the number of accesses to a data block changes, the linked list search operation is very time-consuming. In order to achieve fast search, a search tree is generally used.​

LFU also has its shortcomings. If a data block was accessed frequently in a certain period of time long ago and is no longer accessed in the future, the data will remain in the cache. However, since the data will no longer be accessed, the effective capacity of the cache is reduced. In other words, the LFU algorithm does not consider the recent situation.

This article mainly introduces two very basic replacement algorithms such as LRU and LFU. In addition to the above algorithms, there are many replacement algorithms, most of which are based on the theory of LRU and LFU, such as 2Q, MQ, LRFU, TinyLFU and ARC, etc. Due to space limitations, this book will not go into details. You can read the relevant papers by yourself.

There is also a certain algorithm for data pre-reading. The pre-reading algorithm reads data from disk to cache in advance by identifying IO patterns. In this way, when the application reads data, it can read the data directly from the cache, thereby greatly improving the performance of reading data.

The most important thing in the pre-reading algorithm is the trigger condition, that is, under what circumstances the pre-reading operation is initiated. There are usually two situations that trigger read-ahead: one is when there are consecutive read requests from multiple addresses, which triggers a read-ahead operation; the other is when the application accesses a cache with a read-ahead mark. Here, the cache of the read-ahead mark is a mark made on the cache page when the read-ahead operation is completed. When the application reads the cache with this mark, the next read-ahead will be triggered, thereby omitting the identification of the IO mode.

picture

In order to explain the logic of pre-reading more clearly, we will introduce the entire process through the picture above. When the file system recognizes that the IO mode requires pre-reading, it will read out an additional part of the content (called synchronous pre-reading), as shown in time 1 (first line). At the same time, for synchronous read-ahead data, the file system will mark one of the blocks. The purpose of this mark is to trigger the next pre-read as early as possible before the end of the cache.​

At the second point in time, when the application continues to read data, because the marked cache block is read, the next pre-read is triggered at the same time. At this time, the data will be read from the disk in one step, and you can see from the figure that the cache increases.

At the next time points 3 and 4, the application can read data directly from the cache. Since no marked cache block has been read, the next read-ahead will not be triggered. At time point 5, because there is a pre-read mark, the pre-read process will be triggered again.

It can be seen from the above analysis that due to the pre-reading feature, the data is read into the cache in advance. Applications can read data directly from the cache without accessing the disk, so the overall access performance will be greatly improved.

The above is the detailed content of A brief analysis of the Linux file system (File System) architecture. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Fix event ID 55, 50, 98, 140 disk error in event viewer Fix event ID 55, 50, 98, 140 disk error in event viewer Mar 19, 2024 am 09:43 AM

If you find event ID 55, 50, 140 or 98 in the Event Viewer of Windows 11/10, or encounter an error that the disk file system structure is damaged and cannot be used, please follow the guide below to resolve the issue. What does Event 55, File system structure on disk corrupted and unusable mean? At session 55, the file system structure on the Ntfs disk is corrupted and unusable. Please run the chkMSK utility on the volume. When NTFS is unable to write data to the transaction log, an error with event ID 55 is triggered, which will cause NTFS to fail to complete the operation unable to write the transaction data. This error usually occurs when the file system is corrupted, possibly due to the presence of bad sectors on the disk or the file system's inadequacy of the disk subsystem.

Complete guide to uninstalling Kali Linux software to solve system stability problems Complete guide to uninstalling Kali Linux software to solve system stability problems Mar 23, 2024 am 10:50 AM

This study provides a comprehensive and in-depth analysis of software uninstallation problems that may arise during the penetration testing and security audit process of KaliLinux, and contributes solutions to ensure system stability and reliability. 1. Understand the installation method of the software. Before uninstalling the software from kalilinux, it is a crucial step to first determine its installation path. Then, the appropriate offloading solution is selected accordingly based on the selected path. Common installation methods include apt-get, dpkg, source code compilation and other forms. Each strategy has its own characteristics and corresponding offloading measures. 2. Use the apt-get command to uninstall software. In the KaliLinux system, the apt-get functional component is widely used to execute software packages efficiently and conveniently.

A complete guide to installing the domestic operating system Kirin Linux, completed in 15 minutes A complete guide to installing the domestic operating system Kirin Linux, completed in 15 minutes Mar 21, 2024 pm 02:36 PM

Recently, the domestic operating system Kirin Linux has attracted much attention. As a senior computer engineer, I have a strong interest in technological innovation, so I have personally experienced the installation process of this system, and now I will share my experience with you. Before executing the installation procedure, I was fully prepared for the relevant steps. The first task is to download and copy the latest Kirin Linux operating system image to a USB flash drive; secondly, for 64-bit Linux, ensure that important data in personal devices have been backed up to deal with potential installation problems; finally, shut down the computer and insert the USB flash drive. After entering the installation interface and restarting the computer, press the F12 function key promptly, enter the system boot menu and select the USB priority boot option. With a beautiful and simple startup screen appearing in front of you

puppylinux installation usb disk puppylinux installation usb disk Mar 18, 2024 pm 06:31 PM

In fact, after a computer is used for a long period of time, the overall performance will show a downward trend, and the adaptability to the Windows system will continue to decline. In addition to the reasons of the computer itself, the Windows system continues to be enhanced and expanded, and the hardware requirements are also getting higher and higher. Therefore, it is not surprising that old computers experience lag after installing Windows system. Previously, many friends were asking in the background about system lags, what to do with old computers? If you find that installing the new Windows 10 system on your old computer causes lags and operational problems, it may be a good choice to consider switching to Linux. Dabaicai has compiled 5 micro-Linux systems, which are suitable for old computers and can effectively reduce CPU usage and make your

How to handle file system error 2147416359 in WIN10 How to handle file system error 2147416359 in WIN10 Mar 27, 2024 am 11:31 AM

1. Press win+r to enter the run window, enter [services.msc] and press Enter. 2. In the service window, find [windows license manager service] and double-click to open it. 3. In the interface, change the startup type to [Automatic], and then click [Apply → OK]. 4. Complete the above settings and restart the computer.

What is the architecture and working principle of Spring Data JPA? What is the architecture and working principle of Spring Data JPA? Apr 17, 2024 pm 02:48 PM

SpringDataJPA is based on the JPA architecture and interacts with the database through mapping, ORM and transaction management. Its repository provides CRUD operations, and derived queries simplify database access. Additionally, it uses lazy loading to only retrieve data when necessary, thus improving performance.

How to solve the problem of garbled characters displayed on the Linux command line How to solve the problem of garbled characters displayed on the Linux command line Mar 21, 2024 am 08:30 AM

Methods to solve the problem of garbled characters displayed on the Linux command line. In the Linux operating system, sometimes we will encounter garbled characters displayed when using the command line interface, which will affect our normal viewing and understanding of the command output results or file contents. The causes of garbled characters may be due to incorrect system character set settings, terminal software not supporting the display of specific character sets, inconsistent file encoding formats, etc. This article will introduce some methods to solve the problem of garbled characters displayed on the Linux command line, and provide specific code examples to help readers solve similar problems.

Linux System Administrator Reveals: A Complete Guide to Analysis of Red Hat Linux Versions Linux System Administrator Reveals: A Complete Guide to Analysis of Red Hat Linux Versions Mar 29, 2024 am 09:16 AM

As a senior Linux system administrator, I already have a deep knowledge base and unique perspective on the analysis, diagnosis and treatment of RedHat version of Linux systems. This article will provide an in-depth analysis of all aspects of the RedHat version of the Linux system, including identifying its version characteristics, decoding the version number, and the actual steps for transmitting test version updates, etc., in order to help you fully grasp and efficiently utilize the features of the RedHat operating system. 1. Understand RedHat One of the Internet companies with the highest market value in the United States, RedHat has won a leading position in the global software market through its operating system products developed under the framework of open source technology. Its Linux distribution RedHat EnterpriseLinux (referred to as

See all articles