How can I reduce the size of an RTF with embedded images?

Posted 2024-08-03 22:47:39

We have some code which produces an RTF document from an RTF template. It basically does a string search and replace of special tags within the RTF file. This is accessible via a web page.

Typically, the processing time for this is really quick.

However, we need to embed an image within a template. We've been embedding these as JPEG images using Word's "Insert/Picture/From File..." functionality. But we've found that the resultant RTF file size is massively dependent upon the image.

For example, I've inserted a 20k JPEG logo (which is basically a solid background with some text). The RTF file increased in size from around 390k (without the image) to 510k (with the image).

Then we inserted a JPEG containing a screenshot, i.e. the image contains text, multiple colours, etc. The JPEG is around 150k. Using this image, the RTF file increased in size from 390k to 3.5MB.

So the encoding that Word uses for storing images in an RTF doesn't scale linearly with the size of the JPEG. I'm guessing it is dependent upon what is in the JPEG image.

I need to keep the size of the RTF templates to a minimum to try and keep our file processing times to a minimum.

  • Does anyone have any ideas on how to minimize the size of the RTF files with embedded images?
  • Is there any way of controlling the encoding that Word uses? I can't see any options anywhere.
  • Does anyone know what type of binary encoding Word/RTF uses?

Thanks in advance.

深者入戏 2024-08-10 22:47:39

Here is the best solution

http://support.microsoft.com/kb/224663

Excerpt:

SYMPTOMS

When you save a Microsoft Word document that contains an EMF,
PNG, GIF, or JPEG graphic as a different file format (for example,
Word 6.0/95 (.doc) or Rich Text Format (.rtf)), the file size of the
document may dramatically increase.

For example, a Microsoft Word 2000 document that contains a JPEG
graphic that is saved as a Word 2000 document may have a file size of
45,568 bytes (44.5KB). However, when you save this file as Word 6.0/95
(.doc) or as Rich Text Format (.rtf), the file size may grow to
1,289,728 bytes (1.22MB).

CAUSE

This functionality is by design in Microsoft Word. If an
EMF, a PNG, a GIF, or a JPEG graphic is inserted into a Word document,
when the document is saved, two copies of the graphic are saved in the
document. Graphics are saved in the applicable EMF, PNG, GIF, or JPEG
format and are also converted to WMF (Windows Metafile) format.

RESOLUTION

Warning If you use
Registry Editor incorrectly, you may cause serious problems that may
require you to reinstall your operating system. Microsoft cannot
guarantee that you can solve problems that result from using Registry
Editor incorrectly. Use Registry Editor at your own risk.

To prevent Word from saving two copies of the graphic in the document,
and to reduce the file size of the document, add the
ExportPictureWithMetafile=0 string value to the Microsoft Windows
registry.
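
The excerpt above names the value but not the key it belongs under. As a rough sketch only: the snippet below builds the text of a `.reg` file for that value, assuming it goes under `HKEY_CURRENT_USER\Software\Microsoft\Office\<version>\Word\Options` (with `9.0` for Word 2000). That key path is an assumption that is not stated in the excerpt, and `build_reg_file` is a hypothetical helper name, so check the KB article for your Word version before importing anything.

```python
def build_reg_file(office_version: str = "9.0") -> str:
    """Return the text of a .reg file that sets ExportPictureWithMetafile to "0".

    ASSUMPTION: the value lives under the Word Options key for the given
    Office version. The excerpt does not state the key path; verify it
    against the KB article before use.
    """
    key = (
        "HKEY_CURRENT_USER\\Software\\Microsoft\\Office\\"
        + office_version
        + "\\Word\\Options"
    )
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        # A string (REG_SZ) value, as the excerpt says "string value".
        '"ExportPictureWithMetafile"="0"',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file("9.0"))
```

The script only prints the text; saving it to a file and importing it is left as a manual step so that nothing touches the registry by accident.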

天煞孤星 2024-08-10 22:47:39

An image in an RTF file gets stored as a WMF, uncompressed. On a Mac, it would be macpict. Your best bet to keep the file size down is to link the image to the document rather than insert a copy in the document. The trade-off is that you have to keep the files together.

EDIT
Is compressing the RTF an option? Using zip/rar, you'll get your file size back, but obviously you'll have to uncompress it first. There are supposed to be tools that do RTF compression, but I have never used them.
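
If compression is an option, a minimal sketch of the idea with Python's standard `gzip` module might look like the following. The function names and the sample data are made up for illustration; the sample only imitates the hex-text payload of an embedded picture and is not a real image.

```python
import gzip


def compress_rtf(rtf_bytes: bytes) -> bytes:
    """Compress the raw bytes of an RTF file for storage or transfer."""
    return gzip.compress(rtf_bytes)


def decompress_rtf(blob: bytes) -> bytes:
    """Restore the original RTF bytes before string-replacing tags in it."""
    return gzip.decompress(blob)


if __name__ == "__main__":
    # Hypothetical sample: an RTF-like wrapper around repetitive hex text.
    fake_picture_hex = (bytes(range(256)).hex() * 50).encode("ascii")
    sample = b"{\\rtf1\\ansi {\\pict " + fake_picture_hex + b"}}"

    packed = compress_rtf(sample)
    assert decompress_rtf(packed) == sample
    print(len(sample), "bytes ->", len(packed), "bytes")
```

Because picture data in RTF is written as hex text, two characters per byte, a general-purpose compressor has a lot of redundancy to work with. The cost is the extra decompression step before the template can be processed.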

意犹 2024-08-10 22:47:39

We did a similar project at work, except that we don't use the "Insert/Picture/From File..." functionality. Our template has a tag named [photos], as I presume yours does too. When we process the document, we replace the tag with the RTF codes needed to display the images. We put them in a table with two images per row, plus a row on top for the title.

So, you might place a [photos] tag in your template and then replace it with the RTF codes. You can find some good references for these codes on the web.

Now, my code looks something like this:

\par {\rtf1\ansi\deff0{\trowd\cellx8810 {title}\intbl\qc\cell\row}{\trowd\cellx4405\cellx8810{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex
<your image as an array of bytes in hexadecimal>}\intbl\cell{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex
<your other image in hexadecimal>}\intbl\cell\row}}

If you get your image into a byte array, you can use BitConverter.ToString(array) to get the hex code. You just need to remove the dashes afterwards, i.e. replace "-" with "".
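
The same idea can be sketched in Python, where `bytes.hex()` already returns the hex digits without separators. This is an illustration under assumptions rather than the code described above: the function names are hypothetical, the `[photos]` tag and the `\picwgoal`/`\pichgoal` values are taken from the snippet, and the table and cropping codes are left out for brevity.

```python
def jpeg_to_pict_group(jpeg_bytes: bytes,
                       width_twips: int = 4000,
                       height_twips: int = 3000) -> str:
    """Wrap raw JPEG bytes in an RTF \\pict group, hex-encoded exactly once."""
    hex_data = jpeg_bytes.hex()  # two hex characters per byte, no dashes
    return (
        "{\\pict\\jpegblip"
        + "\\picwgoal%d\\pichgoal%d" % (width_twips, height_twips)
        + "\n"
        + hex_data
        + "}"
    )


def fill_template(template: str, tag: str, jpeg_bytes: bytes) -> str:
    """Replace a placeholder tag in the RTF template with the picture group."""
    return template.replace(tag, jpeg_to_pict_group(jpeg_bytes))


if __name__ == "__main__":
    # Hypothetical stand-in for real file contents: just a few bytes.
    fake_jpeg = bytes([0xFF, 0xD8, 0xFF, 0xE0])
    template = "{\\rtf1\\ansi\\deff0 Report\\par [photos]\\par}"
    print(fill_template(template, "[photos]", fake_jpeg))
```

Written this way, each image costs about two characters per byte of the original JPEG plus a few control words, which fits the size reduction reported below.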

Our files take up less than a tenth of the space a "normal" RTF does. If we open the document's code in an editor such as Notepad++, we can see the RTF codes, but if we open the document and save it as RTF (under a different name), it goes from 1.5 MB to 50 MB!
I'm guessing DaveParillo's reply explains it: I'm only writing each image once.

Hope it helps.
Cheers mate

纵情客 2024-08-10 22:47:39

First, keep in mind that each byte of the image is stored as two characters (two bytes), which means the file grows by at least twice the size of the original picture.
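
Applied to the numbers in the question, that lower bound can be worked out as below. This is only arithmetic on the figures quoted there, reading 3.5MB as roughly 3500k; the function names are made up for the example.

```python
def min_hex_growth_kb(image_kb: float) -> float:
    """Smallest possible growth when every image byte becomes two hex characters."""
    return 2 * image_kb


def growth_ratio(image_kb: float, observed_growth_kb: float) -> float:
    """How many times the original image size the RTF actually grew by."""
    return observed_growth_kb / image_kb


if __name__ == "__main__":
    # (image size, observed RTF growth), both in k, taken from the question.
    cases = {
        "logo": (20, 510 - 390),
        "screenshot": (150, 3500 - 390),  # assumes 3.5MB is about 3500k
    }
    for name, (image_kb, observed_kb) in cases.items():
        print(name,
              "minimum growth:", min_hex_growth_kb(image_kb), "k,",
              "observed:", observed_kb, "k,",
              "ratio: %.1f" % growth_ratio(image_kb, observed_kb))
```

The logo grew the file by about 6 times its own size and the screenshot by about 21 times, both well above the factor of 2 that hex encoding alone would explain. Something besides the JPEG data is evidently being written.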

Another thing to know is that Word and WordPad insert different flavors (formats) of the same image, plus other fields that the RTF can be displayed without.

Here are some scripts used to insert images in RTF (https://joseluisbz.wordpress.com/2011/06/22/script-de-clases-rtf-para-jsp-y-php/), and one example of use (https://joseluisbz.wordpress.com/2011/07/16/subiendo-imagenes-png-y-jpg-y-archivos-a-mysql-con-php-y-jsp-y-mostrarlos-en-rtf-usando-clases/)

You may also need to replace the original image with another one (http://joseluisbz.wordpress.com/2013/07/26/exploring-a-wmf-file-0x000900/).

夜血缘 2024-08-10 22:47:39

Swartbees' answer worked perfectly for me. I first reduced the image quality to "0" using GIMP's "Save as JPEG" functionality. After following the Microsoft solution suggested by Swartbees above, I reinserted the picture into the file and the size increase was negligible: from 229k to 279k (as opposed to 29,000k).

Thanks for your suggestions guys.

羁拥 2024-08-10 22:47:39

Yes, by removing the redundant characters. To use the data again, you must then insert them back into your stream.
For instance, if you have more than twenty f characters in a row, you can replace them with f[20] in your stream. It is a start.

-Best of luck.
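
A minimal sketch of that run-length idea is shown below. It assumes the input is pure hex text, so the square brackets used as markers can never occur in the data. The `f[20]` notation comes from the suggestion above; the function names and the threshold parameter are made up for the example. Note that the encoded text is not valid RTF until it has been decoded again.

```python
import re

MIN_RUN = 20  # only runs at least this long are collapsed


def rle_encode(hex_text: str, min_run: int = MIN_RUN) -> str:
    """Collapse long runs of one hex digit, e.g. 25 f's become 'f[25]'."""
    pattern = re.compile(r"([0-9a-fA-F])\1{%d,}" % (min_run - 1))
    return pattern.sub(
        lambda m: "%s[%d]" % (m.group(1), len(m.group(0))), hex_text
    )


def rle_decode(encoded: str) -> str:
    """Expand 'f[25]' markers back into the original run of characters."""
    return re.sub(
        r"([0-9a-fA-F])\[(\d+)\]",
        lambda m: m.group(1) * int(m.group(2)),
        encoded,
    )


if __name__ == "__main__":
    original = "ab" + "f" * 25 + "01" + "0" * 5
    packed = rle_encode(original)
    assert rle_decode(packed) == original
    print(packed)
```

This only pays off when the picture data has long runs of identical digits, which is more likely for an uncompressed bitmap with flat areas than for JPEG data.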
