zip文件的创建和提取


=Start=

缘由:

简单记录一下 zip 文件的小知识点,方便以后有需要的时候参考。

正文:

参考解答:
  1. 什么是 ZIP 文件/格式?
ZIP文件格式是一种数据压缩和文档储存的文件格式,原名Deflate,发明者为菲尔·卡茨(Phil Katz),他于1989年1月公布了该格式的资料。ZIP通常使用后缀名“.zip”,它的MIME格式为application/zip。目前,ZIP格式属于几种主流的压缩格式之一,其竞争者包括RAR格式以及开放源码的7z格式。从性能上比较,RAR及7z格式较ZIP格式压缩率较高,而7-Zip由于提供了免费的压缩工具而逐渐在更多的领域得到应用。Microsoft从Windows ME操作系统开始内置对zip格式的支持,即使用户的电脑上没有安装解压缩软件,也能打开和制作zip格式的压缩文件,OS X和流行的Linux操作系统也对zip格式提供了类似的支持。因此如果在网络上传播和分发文件,zip格式往往是最常用的选择。
  1. 如何在macOS/Windows系统上操作 zip 文件?

GUI图形界面的操作比较简单直观,这里就不赘述,主要说一下在终端命令行上如何操作。

$ man zip

       zip is a compression and file packaging utility for Unix, VMS, MSDOS, OS/2, Windows 9x/NT/XP, Minix, Atari, Macintosh, Amiga, and Acorn RISC OS.  It is analogous to a combination of the Unix commands tar(1) and compress(1) and is compatible with PKZIP (Phil Katz's ZIP for MSDOS systems).

       A companion program (unzip(1L)) unpacks zip archives.  The zip and unzip(1L) programs can work with archives produced by PKZIP (supporting most PKZIP features up to PKZIP version 4.6), and PKZIP and PKUNZIP can work with archives produced by zip (with some exceptions, notably streamed archives, but recent changes in the zip file standard may facilitate better compatibility).  zip version 3.0 is compatible with PKZIP 2.04 and also supports the Zip64 extensions of PKZIP 4.5 which allow archives as well as files to exceed the previous 2 GB limit (4 GB in some cases).  zip also now supports bzip2 compression if the bzip2 library is included when zip is compiled.  Note that PKUNZIP 1.10 cannot extract files produced by PKZIP 2.04 or zip 3.0. You must use PKUNZIP 2.04g or unzip 5.0p1 (or later versions) to extract them.

The basic command format is:

       zip options archive inpath inpath ...

macOS

压缩
zip enc-zip-test.zip *.txt
zip -er enc-zip-test.zip *.txt #-e选项表示加密,-r选项表示递归
zip -er -P password11 enc-zip-test.zip *.txt #不推荐,因为此种方式不安全
解压
unzip -l filename.zip
unzip filename.zip
unzip -P password11 filename.zip #不推荐,因为此种方式不安全

Windows

压缩
PS C:\> Compress-Archive 1.txt test-ps-archive.zip #压缩


### tar.exe 从 Win10 开始有
### -c 选项用于说明此次为【创建】行为
### -a 选项用于进行zip压缩,若只有 cf 则表明此次为【tar格式的归档】而非【zip格式的压缩】
### -f 选项用于指定文件名
C:\>tar.exe -caf test-caf.zip 1.txt
提取
PS C:\> Expand-Archive .\test-ps-archive.zip #提取
PS C:\> Expand-Archive -Force .\test-ps-archive.zip #提取(避免文件已存在的报错)


C:\>tar.exe -tvf test-czf.zip
-rw-rw-rw-  0 0      0          11 8月 05 16:52 1.txt
C:\>tar.exe -xf test-czf.zip
C:\>
  1. 如何用 Python 操作 zip 文件?

说明:内置的 zipfile 模块当前还有一些限制,比如此模块目前不处理多磁盘ZIP文件。支持对ZIP归档文件中的加密文件进行解密(仅CRC32加密方式),但目前无法创建加密文件且解密极其缓慢,因为它是用原生Python而不是C实现的。

如果希望在Python中支持更多格式更快速的加解密,最好的办法是通过 subprocess 模块调用外部命令(比如7z等),但这个需要有额外的依赖,具体选择就要看实际的需求和具备的条件了。

一个简单的场景就是:一个文件比较敏感但内容又是必要的,一个临时对解决方案就是——将文件先用zip进行加密压缩后放到特定目录下,然后在代码里面用密码进行解压缩,读取文件内容之后,再把文件进行删除,此时内容已经到了内存里面,但是明文文件已经不在了,可以简单应急用。

# 引入 zipfile 模块
>>> import zipfile

# 打印/列出 zip 压缩包中包含的文件列表
>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.printdir()
...

>>> try:
...     with zipfile.ZipFile("sample.zip") as archive:
...         archive.printdir()
... except zipfile.BadZipFile as error:
...     print(error)
...

# 创建 zip 压缩包 (archive.write)
>>> filenames = ["hello.txt", "lorem.md", "realpython.md"]
>>> with zipfile.ZipFile("multiple_files.zip", mode="w") as archive:
...     for filename in filenames:
...         archive.write(filename)
...

>>> import pathlib
>>> directory = pathlib.Path("source_dir/")
>>> with zipfile.ZipFile("directory.zip", mode="w") as archive:
...    for file_path in directory.iterdir():
...        archive.write(file_path, arcname=file_path.name)
...

# 往 zip 压缩包中添加文件
>>> def append_member(zip_file, member):
...     with zipfile.ZipFile(zip_file, mode="a") as archive:
...         archive.write(member)
...

# 读取 zip 压缩包中指定文件的内容(archive.read / archive.open)
>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for line in archive.read("hello.txt").split(b"\n"):
...         print(line)
...

>>> with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
...     for line in archive.read("hello.txt", pwd=b"secret").split(b"\n"):
...         print(line)
...

>>> with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
...     archive.setpassword(b"secret")
...     for file in archive.namelist():
...         print(file)
...         print("-" * 20)
...         for line in archive.read(file).split(b"\n"):
...             print(line)
...

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     with archive.open("hello.txt", mode="r") as hello:
...         for line in hello:
...             print(line)
...

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     text = archive.read("hello.txt").decode(encoding="utf-8")
...

# 解压 zip 压缩包
>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.extractall("output_dir/")
...

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.extract("new_hello.txt", path="output_dir/")
...

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for file in archive.namelist():
...         if file.endswith(".md"):
...             archive.extract(file, "output_dir/")
...

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.extractall(pwd=bytes("secret",'utf-8'))
...
参考链接:

什么是 ZIP 文件?
https://experience.dropbox.com/zh-cn/resources/what-is-a-zip-file

ZIP格式
https://zh.wikipedia.org/zh-cn/ZIP%E6%A0%BC%E5%BC%8F

Python unzip AES-128 encrypted file – 如何用Python解压加密文件
https://stackoverflow.com/questions/15553150/python-unzip-aes-128-encrypted-file

zipfile — Work with ZIP archives
https://docs.python.org/3/library/zipfile.html

Python’s zipfile: Manipulate Your ZIP Files Efficiently
https://realpython.com/python-zipfile/#creating-populating-and-extracting-your-own-zip-files

Extract files from an encrpyted zip file with python3
https://gist.github.com/colmcoughlan/db1384156b8efe6676c9a6cc47756933

The best ways to password protect a ZIP file on Mac
https://setapp.com/how-to/password-protect-zip

Creating Password Protected Zip Files in Mac
https://www.canr.msu.edu/news/encrypted-zip-mac

Create .zip folder from the command line – (Windows)
https://superuser.com/questions/201371/create-zip-folder-from-the-command-line-windows

Tar and Curl Come to Windows!
https://techcommunity.microsoft.com/t5/containers/tar-and-curl-come-to-windows/ba-p/382409

Microsoft.PowerShell.Archive
https://docs.microsoft.com/zh-cn/powershell/module/microsoft.powershell.archive/?view=powershell-7.2

=END=


《 “zip文件的创建和提取” 》 有 4 条评论

  1. 如何在Windows系统上不借助任何外部工具进行压缩和解压缩zip文件?
    How can I compress (/ zip ) and uncompress (/ unzip ) files and folders with batch file without using any external tools?
    https://stackoverflow.com/questions/28043589/how-can-i-compress-zip-and-uncompress-unzip-files-and-folders-with-bat/
    `
    # TAR (only for the newest windows builds)
    //compress directory
    tar -cvf archive.tar c:\my_dir
    //extract to dir
    tar -xvf archive.tar.gz -C c:\data
    //compres to zip format
    tar -caf archive.zip c:\my_dir

    # PowerShell

    //zipping folder or file:
    powershell “Compress-Archive -Path “””C:\some_folder””” -DestinationPath “””zippedFolder.zip””””

    //unzipping folder:
    powershell “Expand-Archive -Path “””Draftv2.Zip””” -DestinationPath “””C:\Reference””””

    //zip directory
    powershell “Add-Type -Assembly “””System.IO.Compression.FileSystem””” ;[System.IO.Compression.ZipFile]::CreateFromDirectory(“””C:\some_dir”””, “””some.zip”””);”

    //unzip directory
    powershell “Add-Type -Assembly “System.IO.Compression.FileSystem” ;[System.IO.Compression.ZipFile]::ExtractToDirectory(“””yourfile.zip”””, “””c:\your\destination”””);”
    `

  2. How to create a zip archive of a directory?
    https://stackoverflow.com/questions/1855095/how-to-create-a-zip-archive-of-a-directory
    `
    # 如果希望对整个目录/文件生成压缩包,最简单的方法是用 shutil 这个模块,一方面是标准库自带,另一方面是用起来简单
    The easiest way is to use shutil.make_archive. It supports both zip and tar formats.

    import shutil
    shutil.make_archive(output_filename, ‘zip’, dir_name)

    If you need to do something more complicated than zipping the whole directory (such as skipping certain files), then you’ll need to dig into the zipfile module as others have suggested.

    # 如果想要稍微复杂点的,那就用 zipfile 模块吧
    `

    Python ZIP file with Example
    https://www.guru99.com/python-zip-file.html

  3. zip的-x选项表示排除某些特定文件不进行压缩打包
    `
    $ man zip
    -x files
    –exclude files
    Explicitly exclude the specified files, as in:

    zip -r foo foo -x \*.o

    which will include the contents of foo in foo.zip while excluding all the files that end in .o. The backslash avoids the shell filename substitution, so that the name matching is performed by zip at all directory
    levels.

    Also possible:

    zip -r foo foo [email protected]

    which will include the contents of foo in foo.zip while excluding all the files that match the patterns in the file exclude.lst.

    The long option forms of the above are

    zip -r foo foo –exclude \*.o

    and

    zip -r foo foo –exclude @exclude.lst

    Multiple patterns can be specified, as in:

    zip -r foo foo -x \*.o \*.c

    If there is no space between -x and the pattern, just one value is assumed (no list):

    zip -r foo foo -x\*.o

    See -i for more on include and exclude.
    `

  4. 在macOS系统上使用unzip命令解压zip压缩包文件时报错“Illegal byte sequence”,应该是和文件名的字符编码有关,解决办法就是使用open命令让macOS系统自动选择合适的应用进行处理
    “Illegal byte sequence” when using unzip on macOS High Sierra to extract a file with Cyrillic characters #315
    https://github.com/adamhathcock/sharpcompress/issues/315
    `
    $ unzip fileWithUnicodeCharacters.zip #error

    Use open, as in open fileWithUnicodeCharacters.zip. It looks like open will call an internal OS X program that has no problem open these type of .zip files.

    $ open fileWithUnicodeCharacters.zip #ok
    `

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注