LZMA compression settings details

Posted on 2024-09-06 02:48:25

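I really need to know what each LZMA parameter (mf, fb, lp, ...) means. I can't find any good documentation on the internet, and I need detailed information about this algorithm. The most detailed source I have found is http://www.bugaco.com/7zip/MANUAL/switches/method.htm. I would appreciate any help.

Best wishes, Shadi.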

狂之美人 2024-09-13 02:48:25

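According to Wikipedia, there seems to be no complete natural-language specification of the compressed format. The configuration settings, however, are specified.

While working with the LZMA SDK I found the following compression settings in the CLzmaEncProps and CLzma2EncProps struct types: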

LZMA Options:

level

Description: The compression level.
Range: [0;9].
Default: 5.

dictSize

Description: The dictionary size.
Range: [1<<12;1<<27] for 32-bit version or [1<<12;1<<30] for 64-bit version.
Default: 1<<24.
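
To get a feel for level and dictSize without building the SDK, here is a minimal sketch using Python's standard lzma module. Note the assumptions: that module wraps liblzma (xz), not the LZMA SDK, so its "preset" and "dict_size" filter options are only the closest counterparts of level and dictSize, and the preset numbering is not guaranteed to match the SDK levels. The sample data and the helper name compress_with are made up for illustration.

```python
import lzma

# Repetitive sample data; any bytes object works here.
data = b"LZMA dictionary size demo. " * 4000


def compress_with(level, dict_size):
    """Compress `data` with an explicit preset (level) and dictionary size."""
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": level,         # source of defaults for the other options
        "dict_size": dict_size,  # dictionary size in bytes
    }]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


small_dict = compress_with(5, 1 << 12)    # smallest size in the range above
default_dict = compress_with(5, 1 << 24)  # the SDK default quoted above
print(len(data), len(small_dict), len(default_dict))
```

The printed sizes depend on the input; the point is only to show where the two settings go.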

lc

Description: The number of high bits of the previous byte to use as a context for literal encoding.
Range: [0;8].
Default: 3.
Sometimes lc = 4 improves compression for large files.

lp

Description: The number of low bits of the dictionary position to include in literal_pos_state.
Range: [0;4].
Default: 0.
It is intended for periodic data whose period equals 2^value bytes (where lp=value). For example, for 32-bit (4-byte) periodic data you can use lp=2. If you change lp, it is often better to set lc=0.
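
As far as I understand the decoder, lc and lp together select which literal probability table is used for the next byte. The sketch below shows that index computation; the function name literal_context is hypothetical, and the formula reflects my reading of the LZMA specification text shipped with the SDK, so treat it as an illustration rather than the SDK's code.

```python
def literal_context(pos, prev_byte, lc=3, lp=0):
    """Return the index of the literal probability table for the next literal.

    pos       -- position of the literal in the uncompressed stream
    prev_byte -- value (0..255) of the previously decoded byte
    """
    pos_bits = pos & ((1 << lp) - 1)    # low lp bits of the position
    high_bits = prev_byte >> (8 - lc)   # high lc bits of the previous byte
    return (pos_bits << lc) + high_bits


# Defaults (lc=3, lp=0): only the top 3 bits of the previous byte matter.
print(literal_context(pos=5, prev_byte=0xE5))              # 7
# lc=0, lp=2: only the position within a 4-byte period matters.
print(literal_context(pos=6, prev_byte=0xE5, lc=0, lp=2))  # 2
```

With this indexing there are 2^(lc+lp) literal tables, which is why raising both settings at once gets expensive.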

pb

Description: pb is the number of low bits of the dictionary position to include in pos_state.
Range: [0;4].
Default: 2.
It is intended for periodic data whose period equals 2^value bytes (where pb=value).
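
A quick way to see whether lc, lp and pb matter for your data is to try a few combinations. The sketch below uses Python's standard lzma module (liblzma, not the LZMA SDK; I am assuming its "lc", "lp" and "pb" filter options have the same meaning) on synthetic 4-byte periodic data. The pos_state helper and the sample data are illustrative assumptions, and no particular size ordering is promised.

```python
import lzma
import struct


def pos_state(pos, pb=2):
    """Low pb bits of the stream position, as described for pb above."""
    return pos & ((1 << pb) - 1)


# Synthetic periodic data: a sequence of little-endian 32-bit integers.
data = b"".join(struct.pack("<I", i * 7) for i in range(20000))


def compress(lc, lp, pb):
    """Compress `data` with explicit lc/lp/pb, other options from preset 5."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 5,
                "lc": lc, "lp": lp, "pb": pb}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


default_props = compress(lc=3, lp=0, pb=2)  # the defaults listed above
tuned_props = compress(lc=0, lp=2, pb=2)    # suggestion for 4-byte periods
print(len(data), len(default_props), len(tuned_props))
```

Compare the two printed sizes on your own data before committing to non-default values.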

algo

Description: Sets compression mode.
Options: 0 = fast, 1 = normal.
Default: 1.

fb

Description: Sets the number of fast bytes for the LZMA encoder.
Range: [5;273].
Default: 32.
Usually, a larger value gives a slightly better compression ratio at the cost of slower compression. A large fast bytes value can significantly improve the compression ratio for files that contain long identical sequences of bytes.

btMode

Description: Sets Match Finder for LZMA.
Options: 0 = hashChain mode, 1 = binTree mode.
Default: 1.
The default match finder is bt4. Algorithms from the hc* group don't provide a good compression ratio, but they are often quite fast in combination with fast mode.

numHashBytes

Description: Number of hash bytes. See the mf={MF_ID} section of the 7-Zip method switch documentation for details.
Options: 2, 3 or 4.
Default: 4.

mc

Description: Sets number of cycles (passes) for match finder.
Range: [1;1<<30].
Default: 32.
If you specify mc = 0, LZMA uses the default value. Usually, a larger value gives a slightly better compression ratio at the cost of slower compression. For example, mf=HC4 with mc=10000 can provide almost the same compression ratio as mf=BT4.
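
The match finder settings can also be tried from Python's standard lzma module. This is a sketch under assumptions: the module wraps liblzma rather than the LZMA SDK, and I am mapping algo to "mode", fb to "nice_len", btMode plus numHashBytes to "mf", and mc to "depth" based on my understanding of the two libraries. The make_filters helper and the sample data are hypothetical.

```python
import lzma

# Sample data with a long run of identical bytes and a repeated phrase.
data = (b"A" * 5000 + b"match finder demo " * 500) * 4

# (btMode, numHashBytes) -> liblzma match finder constant.
# liblzma offers no hash-chain finder with 2 hash bytes, so (0, 2) is absent.
MF_TABLE = {
    (0, 3): lzma.MF_HC3, (0, 4): lzma.MF_HC4,
    (1, 2): lzma.MF_BT2, (1, 3): lzma.MF_BT3, (1, 4): lzma.MF_BT4,
}


def make_filters(algo, fb, bt_mode, num_hash_bytes, mc):
    """Build a liblzma filter chain from SDK-style encoder settings."""
    return [{
        "id": lzma.FILTER_LZMA2,
        "mode": lzma.MODE_NORMAL if algo else lzma.MODE_FAST,
        "nice_len": fb,
        "mf": MF_TABLE[(bt_mode, num_hash_bytes)],
        "depth": mc,
    }]


# bt4 with the default cycle count versus hc4 with many cycles.
bt4 = lzma.compress(data, filters=make_filters(1, 32, 1, 4, 32))
hc4 = lzma.compress(data, filters=make_filters(1, 32, 0, 4, 10000))
print(len(data), len(bt4), len(hc4))
```

Timing both calls on a realistic file is the easiest way to judge the speed versus ratio trade-off described above.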

writeEndMark

Description: Option for writing or not writing the end mark.
Options: 0 - do not write EOPM, 1 - write EOPM.
Default: 0.

numThreads

Description: Number of threads.
Options: 1 or 2.
Default: 2.

LZMA2 Options:

LZMA2 is a modified version of LZMA. It provides the following advantages over LZMA:

Better compression ratio for data that can't be compressed. LZMA2 can store such blocks of data in uncompressed form. It also decompresses such data faster.
Better multithreading support. If you compress a big file, LZMA2 can split that file into chunks and compress these chunks in multiple threads.

Note: LZMA2 also supports all LZMA parameters, but lp + lc cannot be larger than 4.
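
If you build LZMA2 settings programmatically, it may help to validate them up front. The helper below is a hypothetical sketch (check_lzma2_props is not an SDK function) that applies the ranges listed above plus the lc + lp limit from the note.

```python
def check_lzma2_props(lc=3, lp=0, pb=2):
    """Raise ValueError if the settings violate the ranges quoted above."""
    if not 0 <= lc <= 8:
        raise ValueError("lc must be in [0;8]")
    if not 0 <= lp <= 4:
        raise ValueError("lp must be in [0;4]")
    if not 0 <= pb <= 4:
        raise ValueError("pb must be in [0;4]")
    if lc + lp > 4:
        # Extra LZMA2 restriction from the note above.
        raise ValueError("LZMA2 requires lc + lp <= 4")
    return True


check_lzma2_props(lc=3, lp=0, pb=2)      # defaults are fine
check_lzma2_props(lc=0, lp=2, pb=2)      # 4-byte periodic data
try:
    check_lzma2_props(lc=4, lp=2, pb=2)  # valid for LZMA, not for LZMA2
except ValueError as exc:
    print(exc)
```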

blockSize

Description: Sets chunk size.
Default: dictSize * 4.

numBlockThreads

Description: Sets the number of threads per chunk (block).

numTotalThreads

Description: The maximum number of threads LZMA2 can use.

Note: LZMA2 uses 1 thread for each chunk in x1 and x3 modes, and 2 threads for each chunk in x5, x7 and x9 modes. If LZMA2 is set to use only as many threads as one chunk requires, it doesn't split the stream into chunks. So you can get a different compression ratio for a different number of threads.
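
The note above can be turned into a small piece of arithmetic. This is only a sketch of the rule as stated here, not the SDK's logic: plan_lzma2 and its result fields are hypothetical, only the five listed modes are covered, and the number of chunks compressed in parallel is my own inference (total threads divided by threads per chunk).

```python
# Threads per chunk for the modes named in the note above.
THREADS_PER_CHUNK = {1: 1, 3: 1, 5: 2, 7: 2, 9: 2}


def plan_lzma2(level, total_threads, dict_size=1 << 24, block_size=None):
    """Estimate how LZMA2 would use threads for a given mode (x1..x9)."""
    per_chunk = THREADS_PER_CHUNK[level]
    if block_size is None:
        block_size = dict_size * 4  # blockSize default quoted above
    # The stream is split only when more threads are available than a
    # single chunk needs.
    splits = total_threads > per_chunk
    parallel_chunks = total_threads // per_chunk if splits else 1
    return {
        "block_size": block_size,
        "threads_per_chunk": per_chunk,
        "splits_stream": splits,
        "parallel_chunks": parallel_chunks,
    }


print(plan_lzma2(level=5, total_threads=2))  # no split: one chunk uses both
print(plan_lzma2(level=5, total_threads=8))  # split: 4 chunks in parallel
```

Because splitting changes how much history each chunk can reference, the two plans can produce different compressed sizes for the same input.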

I think that to learn more about this subject you will have to study LZMA in more depth. There are very few examples on the internet, and the documentation is quite incomplete.