MS Word: batch-resampling embedded images

Posted 2024-10-29 02:08:45

I have a few thousand Word files that some of my colleagues have put together. They're not very technical people: they've taken photos with their 10-megapixel cameras and embedded them directly into the Word files without resampling. The images are often scaled down to be quite small on the page, roughly 3" by 2".
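To put numbers on how much is being wasted, here is a small worked example in Python. The 150 DPI target is only an assumed value for illustration (the question leaves the DPI up to the user), and `target_pixels` is a made-up helper name.

```python
def target_pixels(width_in, height_in, dpi):
    """Pixel dimensions needed to show an image at width_in x height_in inches at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

# A 3" x 2" placement at an assumed 150 DPI
w, h = target_pixels(3, 2, 150)
print(w, h, w * h)  # 450 300 135000
```

At that setting a 10-megapixel original carries about 74 times more pixels than the page can actually use.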

I need to write some sort of tool that goes through these Word files one by one (each is about 300 MB), downsamples the images, and saves the file again.

We're dealing predominantly with .doc files rather than .docx. There may be some PowerPoint files as well.

I have a few options available to me. I can write a program in C# that gives the user a nice interface for specifying the DPI and JPEG quality to use when saving. Alternatively, I can do it with a VBA macro, but then I will probably need to either write a DLL or use a third-party one for the image resizing.

I've done some importing from .xls and .xlsx files in C# and it was a breeze. However, I suspect that writing downsampled images back into .doc files in such a way that the formatting looks unchanged may be tricky.

Can I get some input on the following? Are there any free libraries (free for commercial use) for accessing .doc files that can do what I need? If I were to write this in VBA, are there any obstacles I would face besides the downsampling problem? Lastly, do you have an alternative suggestion for how to tackle this?

Comments (1)

嘦怹 2024-11-05 02:08:45

Okay, I haven't had any answers or comments in about a week, so I'm going to answer my own question with what I've managed to learn in that time. I hope it will be useful to someone else later down the line.

As I mentioned, we are dealing with thousands of Office (Word and PowerPoint) files that have full-resolution digital camera images in them. The files can be up to several hundred MB each, when they should be a few hundred KB to a few MB at most. This puts a burden on the company network, and it also makes these crucial documents very slow to open.

What I originally did was unpack the .doc files with 7-Zip. I ran its command-line interface in a hidden System.Diagnostics.Process to extract the "WordDocument" stream from the .doc file.

Then I read through WordDocument byte by byte until I found the JPEG SOI marker (0xFF 0xD8), and kept reading until the EOI marker (0xFF 0xD9). I loaded that portion of WordDocument as a stream into an Image and resized it there, then saved the image back into the WordDocument stream at a lower resolution and quality. I can confirm that the images were being read correctly and that they were being written back into WordDocument correctly. We ended up with files much, much smaller than we started with. Unfortunately, while 7-Zip lets you extract these components from a .doc file, it does not appear to let you re-insert them, so all of that work was basically for nothing. I may be wrong about this, but my version (the latest at the moment) will not let me add files to a .doc package.
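The marker scan described above can be sketched roughly as follows. This is an illustration in Python rather than the C# actually used, and `find_jpeg_spans` is a made-up name. It also assumes, as a heuristic not mentioned in the answer, that SOI is immediately followed by another 0xFF marker byte, which cuts down on false matches in unrelated binary data.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def find_jpeg_spans(data: bytes):
    """Yield (start, end) offsets of candidate JPEG byte ranges; end is exclusive."""
    pos = 0
    while True:
        # Assumption: a real SOI is directly followed by the next marker's 0xFF
        start = data.find(SOI + b"\xff", pos)
        if start == -1:
            return
        end = data.find(EOI, start + 2)
        if end == -1:
            return
        yield start, end + 2
        pos = end + 2
```

Note that a plain scan like this stops at the first EOI it sees. Camera JPEGs often carry an embedded thumbnail with its own SOI/EOI pair inside the metadata, so a span may end early and should be validated by actually decoding it before anything is overwritten.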

Next, I rewrote the function so that it uses the MS Office interop library. I open a Word.Application and a Word.Document, run Document.Convert(), and then save the result as a .docx file. A lot of the time this is sufficient, but sometimes we end up with a file that is only slightly smaller. Inspecting the ZIP contents of those .docx files suggests that the creator of the document used Microsoft Photo Editor 3, which has somehow added a few dozen MB of OLE information to the .docx.
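One way to see where the remaining bulk in a converted file sits is to list the largest parts of the package. The sketch below is in Python (not the C# used above) and relies only on the fact that a .docx is a ZIP archive; `largest_parts` is a made-up name, and the file name in the usage comment is hypothetical.

```python
import zipfile

def largest_parts(package, top=10):
    """Return (name, uncompressed_size) for the biggest entries in a .docx/.pptx package."""
    with zipfile.ZipFile(package) as zf:
        infos = sorted(zf.infolist(), key=lambda i: i.file_size, reverse=True)
        return [(i.filename, i.file_size) for i in infos[:top]]

# Usage (hypothetical file name):
# for name, size in largest_parts("report.docx"):
#     print(f"{size:>12}  {name}")
```

Pictures conventionally live under word/media/ and embedded OLE objects under word/embeddings/, so large entries in the latter would be consistent with the OLE overhead described above.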

So that is where I'm up to. I have outlined the two methods I have tried.

The first is a raw .doc editing technique, which will only work if you can find a way to re-package WordDocument into the .doc file. I haven't tested it with PowerPoint files, but I assume the process would be similar.

The second method has the advantage of producing .docx and .pptx files, which can be opened with any zip-compatible packaging library so that the resources can be edited or deleted quite easily. Unfortunately, it means that Office needs to be installed on the machine, and if you don't have a relatively recent version of Office, the Document.Convert() method will throw an exception.
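For the second method, editing the resources of a converted file could look roughly like the sketch below. It is written in Python for brevity, not C#; `rewrite_media` and the `shrink` callback are hypothetical names, and the actual resizing is left to whatever imaging library you prefer.

```python
import zipfile

MEDIA_DIRS = ("word/media/", "ppt/media/")  # conventional picture folders in .docx/.pptx

def rewrite_media(src, dst, shrink):
    """Copy a package from src to dst, passing each media part through shrink(name, data)."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.startswith(MEDIA_DIRS):
                # shrink is a caller-supplied function returning the new image bytes
                data = shrink(info.filename, data)
            zout.writestr(info.filename, data)
```

The callback should return an image in the same format as the one it received, because the part name and its content type are referenced elsewhere in the package. The displayed size on the page is stored in the document XML rather than in the image, so the layout should not change, but that is worth checking on a few sample files first.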

I hope that helps anyone reading this.
