Which files don't get smaller after compression?

Published 2024-07-26 22:46:38

Comments (16)

夜巴黎 2024-08-02 22:46:38

File compression works by removing redundancy. Therefore, files that contain little redundancy compress badly or not at all.

The files with no redundancy that you're most likely to encounter are files that have already been compressed. In the case of PDF, that would specifically be PDFs consisting mainly of images which are themselves stored in a compressed image format such as JPEG.
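That lack of redundancy can be measured directly. A minimal sketch in Python (standard library only; the sample text, and the zlib stream standing in for an embedded JPEG, are illustrative choices):

```python
import math
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; 8.0 means no byte-level redundancy."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

# Redundant English-like text ...
text = " ".join(f"the quick brown fox number {i}" for i in range(2000)).encode()
# ... versus a compressed stream (standing in for a JPEG inside a PDF).
packed = zlib.compress(text, 9)

print(byte_entropy(text))    # low: plenty of redundancy left to exploit
print(byte_entropy(packed))  # close to 8: almost nothing left to squeeze out
```

Compressors exploit the gap between the first number and 8 bits per byte; once data sits near 8, a second compressor has nothing to work with.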

め七分饶幸 2024-08-02 22:46:38

JPEG/GIF/AVI/MPEG/MP3 files and other already-compressed files won't change much after compression. You may see a small decrease in file size.

蒲公英的约定 2024-08-02 22:46:38

Files that are already compressed will not get any smaller when compressed again.

兰花执着 2024-08-02 22:46:38

Five years later, I have at least some real statistics to show for this.

I've generated 17439 multi-page PDF files with PrinceXML, totalling 4858 MB. Running zip -r archive pdf_folder gives me an archive.zip of 4542 MB. That's 93.5% of the original size, so it's not worth doing just to save space.
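The arithmetic checks out (a trivial sketch, treating the reported "Mb" figures as megabytes):

```python
original_mb = 4858  # total size of the 17439 PDF files
zipped_mb = 4542    # size of the resulting archive.zip

ratio = zipped_mb / original_mb * 100
print(f"{ratio:.1f}% of the original size")  # -> 93.5%
```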

初见 2024-08-02 22:46:38

The only files that cannot be compressed are random ones - truly random bits, or as approximated by the output of a compressor.

However, for any given algorithm, there are many files that it cannot compress but that a different algorithm could compress well.
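The first claim is easy to check with zlib (a sketch; the 64 KB size and the 8-byte pattern are arbitrary choices):

```python
import os
import zlib

random_data = os.urandom(64_000)   # truly random bytes
patterned = b"abcdefgh" * 8_000    # same length, highly redundant

packed_random = zlib.compress(random_data, 9)
packed_patterned = zlib.compress(patterned, 9)

# Random input doesn't shrink at all (zlib falls back to stored blocks, so it
# actually grows by a few header bytes); the patterned input collapses.
print(len(packed_random) / len(random_data))
print(len(packed_patterned) / len(patterned))
```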

等风也等你 2024-08-02 22:46:38

PDF files are already compressed. They use the following compression algorithms:

  • LZW (Lempel-Ziv-Welch)
  • FLATE (ZIP, in PDF 1.2)
  • JPEG and JPEG2000 (PDF version 1.5)
  • CCITT (the facsimile standard, Group 3 or 4)
  • JBIG2 compression (PDF version 1.4)
  • RLE (Run Length Encoding)

Depending on which tool created the PDF and which PDF version it targets, different types of compression are used. You can compress it further using a more efficient algorithm, or lose some quality by converting images to low-quality JPEGs.

There is a great link on this here

http://www.verypdf.com/pdfinfoeditor/compression.htm
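For a feel of the simplest scheme in that list, here is a toy run-length encoder (a minimal (count, byte) sketch for illustration, not PDF's actual RunLengthDecode format):

```python
def rle_encode(data: bytes) -> bytes:
    """Toy run-length encoding: each run becomes a (count, byte) pair."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    for count, value in zip(encoded[::2], encoded[1::2]):
        out += bytes([value]) * count
    return bytes(out)

blank_scanline = b"\x00" * 1000  # fax-like input: collapses to 8 bytes
no_runs = bytes(range(256))      # no repetition: output doubles to 512 bytes

print(len(rle_encode(blank_scanline)), len(rle_encode(no_runs)))
```

Running it a second time over its own output gains nothing, which is the same effect, in miniature, as zipping a PDF whose streams are already FLATE- or JPEG-encoded.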

涫野音 2024-08-02 22:46:38

Files encrypted with a good algorithm like IDEA or DES in CBC mode no longer compress, regardless of their original content. That's why encryption programs compress first and only then run the encryption.
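The ordering matters because ciphertext looks random to a compressor. A sketch with a toy XOR keystream standing in for a real cipher such as IDEA, DES, or AES (an illustration only; do not use it for actual security):

```python
import hashlib
import zlib

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream; a stand-in for a real cipher."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

plaintext = b"attack at dawn " * 4000  # 60 KB of very redundant data

compress_then_encrypt = toy_encrypt(b"key", zlib.compress(plaintext, 9))
encrypt_then_compress = zlib.compress(toy_encrypt(b"key", plaintext), 9)

print(len(compress_then_encrypt))  # tiny: redundancy was removed first
print(len(encrypt_then_compress))  # ~60 KB: encryption hid the redundancy
```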

滿滿的愛 2024-08-02 22:46:38

Generally you cannot compress data that has already been compressed. You might even end up with a compressed size that is larger than the input.

路弥 2024-08-02 22:46:38

You will probably have difficulty compressing encrypted files too as they are essentially random and will (typically) have few repeating blocks.

离旧人 2024-08-02 22:46:38

Media files tend not to compress well. JPEG and MPEG files won't compress, though you may be able to compress .png files.

桜花祭 2024-08-02 22:46:38

Files that are already compressed usually can't be compressed any further, for example MP3, JPG, FLAC, and so on.
You could even end up with bigger files because of the extra headers added by re-compression.

我要还你自由 2024-08-02 22:46:38

Really, it all depends on the algorithm that is used. An algorithm that is specifically tailored to use the frequency of letters found in common English words will do fairly poorly when the input file does not match that assumption.

In general, PDFs contain images and the like that are already compressed, so they will not compress much further. Your algorithm can probably eke out only meagre savings, if any, from the text strings contained in the PDF.
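To illustrate the algorithm-dependence point, here is a static Huffman code built from rough English letter frequencies (the frequency table and the test strings are assumptions for this demo). Text matching the model encodes in few bits per letter; text full of rare letters does much worse:

```python
import heapq
import itertools

# Rough English letter frequencies in percent (assumed values, for the demo).
FREQ = {"e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
        "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8,
        "u": 2.8, "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0,
        "p": 1.9, "b": 1.5, "v": 1.0, "k": 0.8, "j": 0.15, "x": 0.15,
        "q": 0.1, "z": 0.07}

def huffman_code(freq):
    """Build a static Huffman code (symbol -> bit string) from frequencies."""
    tiebreak = itertools.count()  # keeps heap entries comparable
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

CODE = huffman_code(FREQ)

def encoded_bits(text: str) -> int:
    """Total bits needed; characters outside a-z (e.g. spaces) are skipped."""
    return sum(len(CODE[ch]) for ch in text if ch in CODE)

english = "the rain in spain stays mainly in the plain"
mismatch = "zq xj zq xj zq xj zq xj zq xj zq xj zq xj zq"

print(encoded_bits(english), encoded_bits(mismatch))
```

Common letters like "e" get short codes, while "z" and "q" get long ones, so input that violates the frequency assumption compresses far worse under the same code.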

好倦 2024-08-02 22:46:38

Simple answer: already-compressed files (otherwise we could reduce any file to 0 bytes by compressing it repeatedly :). Many file formats already apply compression, and you might find that the file size shrinks by less than 1% when compressing movies, MP3s, JPEGs, etc.

稀香 2024-08-02 22:46:38

You can add all Office 2007 file formats to the list (of @waqasahmed):

Since the Office 2007 .docx and .xlsx (etc) are actually zipped .xml files, you also might not see a lot of size reduction in them either.

浅忆流年 2024-08-02 22:46:38

Two kinds of data that won't compress:
  1. Truly random

  2. Approximation thereof, made by cryptographically strong hash function or cipher, e.g.:

    AES-CBC (of any input)

    "".join(b2a_hex(md5(str(i).encode()).digest()).decode() for i in range(...))
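A runnable variant of that second example (using raw digest bytes rather than hex; the hex form uses only 16 symbols and would itself compress roughly 2:1 even though the underlying bits are random):

```python
import hashlib
import zlib

# 4000 MD5 digests concatenated: deterministic, but statistically random.
stream = b"".join(hashlib.md5(str(i).encode()).digest() for i in range(4000))

packed = zlib.compress(stream, 9)
print(len(stream), len(packed))  # the "compressed" version is no smaller
```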

悲欢浪云 2024-08-02 22:46:38

Any lossless compression algorithm, provided it makes some inputs smaller (as the name "compression" suggests), must also make some other inputs larger.

Otherwise, the set of all input sequences up to a given length L could be mapped to the (much) smaller set of all sequences of length less than L without collisions (because compression must be lossless and reversible), and the pigeonhole principle rules that possibility out.

So there are infinitely many files that do NOT get smaller after compression; moreover, such a file doesn't need to be a high-entropy file :)
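The counting behind the pigeonhole step, as a quick check (L = 20 is an arbitrary choice):

```python
L = 20
shorter = sum(2**n for n in range(L))  # bit strings of length 0 .. L-1
exactly_L = 2**L                       # bit strings of length exactly L

# Only 2**L - 1 shorter strings exist, so they can never hold all 2**L
# length-L inputs without a collision.
print(shorter, exactly_L)
```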
