
Basic method of decompressing file formats in python

Jun 14, 2019 pm 02:29 PM
python Unzip

patool is a Python library that handles many archive formats. If you only need basic operations such as extracting and creating archives, and do not want to learn the separate Python library for each compression format, patool is a good choice.


The formats supported by the patool library include:

7z (.7z, .cb7), ACE (.ace, .cba), ADF (.adf), ALZIP (.alz), APE (.ape), AR (.a), ARC (.arc), ARJ (.arj), BZIP2 (.bz2), CAB (.cab), COMPRESS (.Z), CPIO (.cpio), DEB (.deb), DMS (.dms), FLAC (.flac), GZIP (.gz), ISO (.iso), LRZIP (.lrz), LZH (.lha, .lzh), LZIP (.lz), LZMA (.lzma), LZOP (.lzo), RPM (.rpm), RAR (.rar, .cbr), RZIP (.rz), SHN (.shn), TAR (.tar, .cbt), XZ (.xz), ZIP (.zip, .jar, .cbz) and ZOO (.zoo)

Basic usage of patool:

import patoolib
# Extract an archive
patoolib.extract_archive("archive.zip", outdir="/tmp")
# Test whether an archive is intact
patoolib.test_archive("dist.tar.gz", verbosity=1)
# List the files inside an archive
patoolib.list_archive("package.deb")
# Create an archive
patoolib.create_archive("/path/to/myfiles.zip", ("file1.txt", "dir/"))
# Show the differences between the files in two archives
patoolib.diff_archives("release1.0.tar.gz", "release2.0.zip")
# Search the files inside an archive
patoolib.search_archive("def urlopen", "python3.3.tar.gz")
# Repack an archive in a different compression format
patoolib.repack_archive("linux-2.6.33.tar.gz", "linux-2.6.33.tar.bz2")

However, patool depends on other decompression software to do its work. For example, when I use patool to decompress files, it mainly calls two programs installed on my computer, 7z and Rtools. If no software on the computer can process the archive format in question, an error is reported:

patoolib.util.PatoolError: could not find an executable program to extract format rar; candidates are (rar,unrar,7z)
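Because patool delegates to external programs, it can help to check whether any candidate program is on the PATH before calling it. The following is a minimal sketch using only the standard library. The helper name find_extractor is hypothetical, and the default candidate names are simply the ones listed in the error message above.

```python
import shutil


def find_extractor(candidates=("rar", "unrar", "7z")):
    """Return the path of the first candidate program found on PATH, or None."""
    for program in candidates:
        path = shutil.which(program)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_extractor()
    if found is None:
        print("No rar-capable program found on PATH.")
    else:
        print("Found extractor:", found)
```

If the function returns None, extracting a rar file through patool would most likely fail with the error shown above.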

In addition, at the time of writing patool cannot process password-protected archives.
Libraries similar to patool include pyunpack and easy-extract. pyunpack relies on zipfile and patool (patool must be installed in advance) and supports every format those two libraries support. easy-extract relies on the external programs unrar, 7z, and par2, which must also be installed in advance, and likewise supports a variety of archive formats.

Processing of common compression formats

If the corresponding compression software is not installed on the computer and you just want to use Python for compression and decompression, you can use the libraries introduced below for several common compression formats.

zip format

Python libraries that can handle the zip format include the standard library module zipfile and third-party libraries such as python-archive. The following mainly introduces the basic usage of the zipfile library.
First create a ZipFile object:

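# Import the ZipFile class
from zipfile import ZipFile
# ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, compresslevel=None)
# The default mode is read ('r'). In that mode, the file-like object returned by
# ZipFile.open() provides read(), readline(), readlines(), __iter__(), __next__() and other methods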

Decompress an archive. There are two extraction methods: extract() and extractall(). The former extracts a single file, by default to the current directory. The latter extracts several files in one call, and by default extracts all of them. Both extract() and extractall() accept a pwd parameter for password-protected archives; the password must be passed as bytes.

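with ZipFile('test.zip') as myzip:
    # Extract a single member into the directory 'tmp'
    myzip.extract(member='1.txt', path='tmp')
    # Extract selected members; pwd must be bytes, not str
    myzip.extractall(path='tmp', members=['1.txt', '2.txt'], pwd=b'password')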
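To try the extraction calls without an existing archive, the sketch below first builds a small unencrypted zip in a temporary directory and then extracts one member from it. The helper names build_sample_zip and extract_one are hypothetical; the archive is unencrypted because zipfile can read password-protected archives but cannot create them.

```python
import os
import tempfile
from zipfile import ZipFile


def build_sample_zip(zip_path):
    """Create a tiny unencrypted archive so the example is self-contained."""
    with ZipFile(zip_path, mode='w') as myzip:
        myzip.writestr('1.txt', 'first file')
        myzip.writestr('2.txt', 'second file')


def extract_one(zip_path, member, dest):
    """Extract a single member and return the path it was written to."""
    with ZipFile(zip_path) as myzip:
        # For an encrypted archive, also pass pwd=b'...' (bytes, not str).
        return myzip.extract(member, path=dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, 'test.zip')
        build_sample_zip(archive)
        out = extract_one(archive, '1.txt', os.path.join(tmp, 'out'))
        with open(out) as f:
            print(f.read())
```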

Create an archive. zipfile offers four compression methods: zipfile.ZIP_STORED (the default, no compression), zipfile.ZIP_DEFLATED, zipfile.ZIP_BZIP2, and zipfile.ZIP_LZMA.

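import zipfile
from zipfile import ZipFile
# The modes for adding files are 'w', 'a', and 'x'.
# 'w' truncates and writes a new file; 'a' appends to an existing file; 'x' exclusively creates and writes a new file.
# In all three modes, if the file is closed without any data being written, an empty ZIP file is produced.
with ZipFile('test.zip', mode='w') as myzip:
    for file in ['1.txt', '2.txt']:  # list of files to compress
        myzip.write(file, compress_type=zipfile.ZIP_DEFLATED)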
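The effect of the four compression methods can be seen by writing the same data with each of them and comparing the archive sizes. This is a minimal sketch: the helper name zipped_size and the sample payload are made up for illustration, and the archive is built in memory so that no files are left behind.

```python
import io
import zipfile


def zipped_size(data, method):
    """Return the size in bytes of an in-memory zip holding data, written with method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode='w', compression=method) as zf:
        zf.writestr('data.txt', data)
    return len(buf.getvalue())


if __name__ == "__main__":
    payload = b'hello zipfile ' * 1000
    for name in ('ZIP_STORED', 'ZIP_DEFLATED', 'ZIP_BZIP2', 'ZIP_LZMA'):
        print(name, zipped_size(payload, getattr(zipfile, name)))
```

With highly repetitive data like this, ZIP_STORED should produce the largest archive because it does not compress at all.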

Compress an entire folder

import os
import zipfile

# Method 1
def addToZip(zf, path, zippath):
    if os.path.isfile(path):
        zf.write(path, zippath, zipfile.ZIP_DEFLATED)  # write the file using zlib (deflate) compression
    elif os.path.isdir(path):
        if zippath:
            zf.write(path, zippath)  # store the directory entry itself
        for nm in os.listdir(path):
            addToZip(zf, os.path.join(path, nm), os.path.join(zippath, nm))
with zipfile.ZipFile('tmp4.zip', 'w') as zip_file:
    addToZip(zip_file, 'tmp', 'tmp')
# Method 2
class ZipFolder:
    def toZip(self, file, zipfilename):
        # First create the ZipFile object
        with zipfile.ZipFile(zipfilename, 'w') as zip_file:
            if os.path.isfile(file):  # a plain file is written directly
                zip_file.write(file)
            else:  # otherwise call addFolderToZip() to write the folder
                self.addFolderToZip(zip_file, file)
    def addFolderToZip(self, zip_file, folder):
        for file in os.listdir(folder):  # iterate over the entries in the folder
            full_path = os.path.join(folder, file)
            if os.path.isfile(full_path):  # a plain file is written directly
                print('File added: ', str(full_path))
                zip_file.write(full_path)
            elif os.path.isdir(full_path):
                # For a subfolder, call addFolderToZip() again to write its contents
                print('Entering folder: ', str(full_path))
                self.addFolderToZip(zip_file, full_path)
directory = 'tmp'   # directory to compress
zipfilename = 'tmp1.zip'    # name of the resulting archive
ZipFolder().toZip(directory, zipfilename)
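A third variant, sketched here as an alternative rather than as part of the original two methods, walks the folder with os.walk() and stores each file under a path relative to the folder's parent. The helper name zip_folder is hypothetical. Note that this version stores files only, so empty directories are not recorded in the archive.

```python
import os
import zipfile


def zip_folder(folder, zip_path):
    """Zip every file under folder, storing paths relative to folder's parent."""
    base = os.path.dirname(os.path.abspath(folder))
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # The second argument is the name stored inside the archive.
                zf.write(full_path, os.path.relpath(full_path, base))
    return zip_path


if __name__ == "__main__":
    # Assumes a folder named 'tmp' exists in the current directory.
    if os.path.isdir('tmp'):
        print(zip_folder('tmp', 'tmp_walk.zip'))
```

Keep the archive itself outside the folder being compressed, otherwise the partially written archive may be picked up by the walk.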

rar format

The rar format has no corresponding module in the Python standard library, so you need a third-party library such as rarfile, python-unrar, or pyUnRAR2. What these libraries have in common is that they depend on RARLAB's UnRAR library. The following mainly introduces the rarfile library:

Installation and configuration
Installation command:

pip install rarfile

But the configuration takes some time. First you need to download and install UnRAR. Because my operating system is Windows, I went to the RARLAB official website, downloaded UnRarDLL, and installed it to the default path C:\Program Files (x86)\UnrarDLL.
Then add environment variables. I first added C:\Program Files (x86)\UnrarDLL\x64 (my system is 64-bit) to the Path variable in the system variables (right-click Computer > Properties > Advanced system settings > Advanced > Environment Variables), but the following error was still reported after restarting PyCharm:

LookupError: Couldn't find path to unrar library.

I then created a new variable in the system variables, with the name UNRAR_LIB_PATH and the value C:\Program Files (x86)\UnrarDLL\x64\UnRAR64.dll (on a 32-bit system the value is C:\Program Files (x86)\UnrarDLL\UnRAR.dll). After restarting PyCharm the problem was solved.
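Before importing the library, you can confirm that the variable is visible to Python and points at a file that actually exists. This is only a diagnostic sketch using the standard library; the helper name unrar_lib_path is hypothetical, and the check assumes nothing about how the rar library itself resolves the path.

```python
import os


def unrar_lib_path(env=None):
    """Return the UNRAR_LIB_PATH value if it names an existing file, else None."""
    env = os.environ if env is None else env
    path = env.get('UNRAR_LIB_PATH')
    if path and os.path.isfile(path):
        return path
    return None


if __name__ == "__main__":
    found = unrar_lib_path()
    if found is None:
        print("UNRAR_LIB_PATH is unset or does not point to a file.")
    else:
        print("UnRAR library:", found)
```

If the variable was set while the IDE was already running, restart the IDE so the new environment is picked up.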

Basic usage

The rarfile library is used in much the same way as zipfile and likewise provides extract(), extractall(), namelist(), infolist(), getinfo(), open(), read(), printdir() and other methods. The main difference is that a RarFile object only supports read mode and cannot write files.

# mode can only be 'r'
class rarfile.RarFile(rarfile, mode='r', charset=None, info_callback=None, crc_check=True, errors='stop')

Extracting a rar archive with the rarfile library works the same way as extracting a zip archive with the zipfile library, so refer to the zipfile usage above.

In addition, the installation, setup, and use of the python-unrar library are very similar to those of the rarfile library, but python-unrar does not support the with statement. If you want to use the with statement, go to the python-unrar installation directory and add the following methods to the RarFile class in the rarfile.py file:

def __enter__(self):
    """Open context."""    
    return self
def __exit__(self, typ, value, traceback):    
    """Exit context"""    
    self.close()
def close(self):    
    """Release open resources."""    
    pass

tar format

The tar format is a common archive format on Unix systems. It can be combined with different compression methods to form different compressed file formats, such as .tar.gz (.tgz), .tar.bz2 (.tbz, .tb2), .tar.Z (.taz), .tar.lzma (.tlz), and .tar.xz (.txz). The tar format is handled by the Python standard library module tarfile, whose supported formats include tar, tar.gz, tar.bz2, and tar.xz.
Basic usage of the tarfile library:

Create tarfile objects

The tarfile library creates objects using tarfile.open() instead of tarfile.TarFile().

tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

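Here mode can take many values: the four basic modes 'r', 'w', 'a', and 'x' (introduced briefly in the zipfile section above) and their combinations with the three compression methods 'gz', 'bz2', and 'xz'. The possible values are listed in the table below:

Mode              Meaning
'r' or 'r:*'      Open for reading with automatic detection of the compression (recommended)
'r:'              Open for reading an uncompressed archive only
'r:gz'            Open for reading with gzip decompression
'r:bz2'           Open for reading with bzip2 decompression
'r:xz'            Open for reading with lzma decompression
'x' or 'x:'       Create an archive without compression; fails if the file already exists
'x:gz'            Create an archive with gzip compression
'x:bz2'           Create an archive with bzip2 compression
'x:xz'            Create an archive with lzma compression
'a' or 'a:'       Open for appending without compression; the file is created if it does not exist
'w' or 'w:'       Open for writing without compression
'w:gz'            Open for writing with gzip compression
'w:bz2'           Open for writing with bzip2 compression
'w:xz'            Open for writing with lzma compression

However, combining 'a' with any of the three compression methods ('a:gz', 'a:bz2', 'a:xz') is not supported.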
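The mode strings above can be exercised with a short round trip: write one small file with a given write mode, reopen the archive with 'r:*', and list its contents. The sketch also shows that 'a:gz' is rejected with a ValueError. The helper names write_and_list and append_compressed_fails are hypothetical, and the member is built in memory with TarInfo so that no input files are needed.

```python
import io
import os
import tarfile
import tempfile


def write_and_list(path, write_mode):
    """Write one in-memory file using write_mode, reopen with 'r:*', return names."""
    data = b'hello tar'
    info = tarfile.TarInfo(name='hello.txt')
    info.size = len(data)
    with tarfile.open(path, write_mode) as tar:
        tar.addfile(info, io.BytesIO(data))
    with tarfile.open(path, 'r:*') as tar:
        return tar.getnames()


def append_compressed_fails(path):
    """Return True if opening path with 'a:gz' raises ValueError."""
    try:
        tarfile.open(path, 'a:gz')
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for mode in ('w', 'w:gz', 'w:bz2', 'w:xz'):
            print(mode, write_and_list(os.path.join(tmp, 'sample.tar'), mode))
        print('a:gz rejected:', append_compressed_fails(os.path.join(tmp, 'sample.tar')))
```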

Basic usage
Extract to a specified directory

import tarfile
with tarfile.open("test.tar.gz") as tar:
    tar.extractall(path='.')

Extract only the files that meet certain conditions

import os
import tarfile
# Extract the files whose extension is .py
def py_files(members):
    for tarinfo in members:
        if os.path.splitext(tarinfo.name)[1] == ".py":
            yield tarinfo
with tarfile.open("sample.tar.gz") as tar:
    tar.extractall(members=py_files(tar))

Create an uncompressed archive

import tarfile
with tarfile.open("sample.tar", "w") as tar:
    for name in ["foo", "bar", "quux"]:
        tar.add(name)

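Create a compressed archive

import tarfile
# The .tar.gz extension reflects the gzip compression selected by "w:gz"
with tarfile.open("sample.tar.gz", "w:gz") as tar:
    for name in ["foo", "bar", "quux"]:
        tar.add(name)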

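Compressing and archiving an entire folder is much simpler than with the zipfile library; the files can be added with the add() function

import os
import tarfile
# The with statement closes the archive so that all data is flushed to disk
with tarfile.open('test.tar.gz', 'w:gz') as tar:
    for root, dirs, files in os.walk(os.getcwd()):
        for file in files:
            fullpath = os.path.join(root, file)
            tar.add(fullpath)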
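Since add() recurses into directories by default, a folder can also be archived with a single call and no explicit walk. The sketch below takes that route and uses arcname so that the archive stores only the folder's own name rather than its full path. The helper name tar_folder is hypothetical.

```python
import os
import tarfile


def tar_folder(folder, tar_path):
    """Pack folder recursively into a gzip-compressed tar archive."""
    with tarfile.open(tar_path, 'w:gz') as tar:
        # add() descends into subdirectories on its own; arcname replaces
        # the full path with just the folder's name inside the archive.
        tar.add(folder, arcname=os.path.basename(os.path.abspath(folder)))
    return tar_path


if __name__ == "__main__":
    # Assumes a folder named 'tmp' exists in the current directory.
    if os.path.isdir('tmp'):
        print(tar_folder('tmp', 'tmp.tar.gz'))
```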

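Other compression formats

Python's standard library also includes other data compression and archiving modules: bz2, gzip, zlib, and lzma, as well as shutil, whose archiving functions build on zipfile and tarfile. I will cover them in detail when I get the chance.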
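As a brief preview, each of zlib, gzip, bz2, and lzma exposes module-level compress() and decompress() functions for in-memory data. The sketch below runs the same bytes through all four and reports the compressed sizes. The helper name roundtrip_sizes and the sample payload are made up for illustration.

```python
import bz2
import gzip
import lzma
import zlib


def roundtrip_sizes(data):
    """Compress data with each codec, check the round trip, return the sizes."""
    codecs = {'zlib': zlib, 'gzip': gzip, 'bz2': bz2, 'lzma': lzma}
    sizes = {}
    for name, module in codecs.items():
        packed = module.compress(data)
        if module.decompress(packed) != data:
            raise ValueError('round trip failed for ' + name)
        sizes[name] = len(packed)
    return sizes


if __name__ == "__main__":
    payload = b'standard library compression ' * 500
    for name, size in roundtrip_sizes(payload).items():
        print(name, size, 'bytes from', len(payload))
```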
