How can I programmatically determine whether a Word document is corrupt?

Posted 2024-07-21 06:58:10

I've got a little C# application that uses Word interop to convert a bunch of Word .doc files into text files, and for the most part this works fine.

However, if a document is corrupt, Word cannot open the file and a dialog box pops up, which means that I cannot fully automate this conversion process: someone has to watch for the dialogs.

Is there a way to test whether a Word .doc is corrupt without opening it? Perhaps through Word interop, or maybe through a 3rd-party tool.

One idea I've had is to spawn a thread that does the conversion and kill it if the process is open for longer than n seconds, but I was wondering if there was a simpler way?

Comments (2)

芯好空 2024-07-28 06:58:10

The only sure-fire way to determine whether Word will think that the file is corrupt is to get Word to open it :-). I don't think any 3rd-party application would be 100% reliable in this regard - after all, the document might in fact not be corrupt, but that doesn't help you if Word thinks that it is. However, clearly there are some situations you could detect, such as the file being zero-sized or suchlike.
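As a rough illustration of the "situations you could detect" mentioned above, here is a minimal C# sketch of a cheap pre-check. It assumes the inputs are binary .doc files, which are OLE2 compound files and start with a fixed 8-byte signature. The class and method names are made up for this example. Passing the check says nothing about whether Word will actually open the file, and a file with a .doc extension that is really RTF or HTML would be flagged even though Word may open it fine.

```csharp
using System;
using System.IO;
using System.Linq;

static class DocPreCheck
{
    // Signature at the start of every OLE2 compound file,
    // the container format used by binary .doc files.
    static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true when the file is missing, empty, too short,
    // or does not start with the OLE2 signature.
    public static bool LooksObviouslyBroken(string path)
    {
        var info = new FileInfo(path);
        if (!info.Exists || info.Length < OleSignature.Length)
            return true;

        var header = new byte[OleSignature.Length];
        using (FileStream fs = info.OpenRead())
        {
            int read = 0;
            while (read < header.Length)
            {
                int n = fs.Read(header, read, header.Length - read);
                if (n == 0) break; // unexpected end of file
                read += n;
            }
            if (read < header.Length)
                return true;
        }
        return !header.SequenceEqual(OleSignature);
    }
}
```

Files that fail this check can be skipped or logged before Word is ever started; everything else still has to go through Word.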

I don't come across many (any?) corrupt documents, so I do wonder if the corruption you're seeing might follow a pattern that you can detect? For example, are these documents downloaded from somewhere and usually missing the latter part of the file or something?

In any case, a corrupt file is not the only reason that Word might pop up a dialog box. Other reasons include:

  • the file is password-protected
  • the file contains links to other files
  • the file contains macros (which may themselves pop up dialog boxes, or which may cause the security warning dialog to appear)
  • etc.

You can circumvent some of these using Application.DisplayAlerts, etc. but not all (especially the security warning).
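For reference, a minimal sketch of what setting Application.DisplayAlerts looks like from C#. It assumes a project reference to Microsoft.Office.Interop.Word and an installed copy of Word. The wrapper class, the method name, and the choice of Open arguments (ConfirmConversions, ReadOnly, AddToRecentFiles) are illustrative choices, not something taken from the question. As noted above, this will not suppress every dialog.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class QuietConvert
{
    // Opens a document with alerts turned off and returns its text.
    // Hypothetical helper: some dialogs can still appear despite these settings.
    public static string ReadText(string path)
    {
        var app = new Word.Application();
        try
        {
            app.Visible = false;
            app.DisplayAlerts = Word.WdAlertLevel.wdAlertsNone;

            Word.Document doc = app.Documents.Open(
                FileName: path,
                ConfirmConversions: false, // do not show the "Convert File" dialog
                ReadOnly: true,
                AddToRecentFiles: false);
            try
            {
                return doc.Content.Text;
            }
            finally
            {
                // Cast avoids the ambiguity with the Close event.
                ((Word._Document)doc).Close(
                    SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
            }
        }
        finally
        {
            // Cast avoids the ambiguity with the Quit event.
            ((Word._Application)app).Quit(
                SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
        }
    }
}
```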

I've had some success with using a 2nd thread that detects dialogs owned by Office and (for those that it recognizes) presses an appropriate button. It's hardly elegant, but it does work. And yes, my 2nd thread will also terminate the application if it takes too long to perform certain operations too.
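The timeout half of that approach could look something like the sketch below. It is an assumption-laden illustration, not the code described above: the Watchdog class and its methods are invented for this example, the work runs on a thread-pool task rather than a dedicated thread, and detecting and clicking dialogs is not covered. Note that KillWord terminates every running WINWORD process, including any instance a user has open interactively.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class Watchdog
{
    // Runs `work` in the background and waits up to `timeout`.
    // Returns true if it finished in time; otherwise calls `onTimeout`
    // (for example, to kill Word) and returns false.
    // Exceptions thrown by `work` surface as an AggregateException.
    public static bool RunWithTimeout(Action work, TimeSpan timeout, Action onTimeout)
    {
        Task task = Task.Run(work);
        bool finished = task.Wait(timeout);
        if (!finished)
            onTimeout();
        return finished;
    }

    // Blunt cleanup: kills all Word processes on the machine.
    public static void KillWord()
    {
        foreach (Process p in Process.GetProcessesByName("WINWORD"))
        {
            try { p.Kill(); }
            catch (Exception) { /* process already exited or access denied */ }
            finally { p.Dispose(); }
        }
    }
}
```

A call might look like `Watchdog.RunWithTimeout(() => Convert(path), TimeSpan.FromSeconds(30), Watchdog.KillWord)`, where `Convert` stands for whatever conversion routine you already have. Killing Word makes the blocked interop call fail, so the conversion code should expect a COM exception in that case.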

岛歌少女 2024-07-28 06:58:10

Depending on the nature of your application, if it's a server-side application without UI interaction, using Office automation may have issues (see http://support.microsoft.com/kb/257757).

If the files are in the Office 2007+ format, the best way is to use Open XML. For older files, some 3rd-party tools may be used, for example the Aspose API.
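To give an idea of why the Office 2007+ format is easier to check without Word: a .docx file is a ZIP package containing XML parts. The sketch below does not use the Open XML SDK; it stands in for it with only the .NET base class library (System.IO.Compression and System.Xml.Linq), so treat it as a simplified illustration. It assumes the main part lives at word/document.xml, which is the usual location but is strictly determined by the package relationships. The class and method names are invented for this example, and none of this applies to binary .doc files.

```csharp
using System;
using System.IO;
using System.IO.Compression; // on .NET Framework, reference System.IO.Compression.FileSystem
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class DocxText
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the document text, one line per paragraph,
    // or null if the package cannot be read.
    public static string TryExtract(string path)
    {
        try
        {
            using (ZipArchive zip = ZipFile.OpenRead(path))
            {
                // Assumption: main document part is at the conventional location.
                ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
                if (entry == null)
                    return null;

                using (Stream s = entry.Open())
                {
                    XDocument xml = XDocument.Load(s);
                    var paragraphs = xml.Descendants(W + "p")
                        .Select(p => string.Concat(
                            p.Descendants(W + "t").Select(t => t.Value)));
                    return string.Join(Environment.NewLine, paragraphs);
                }
            }
        }
        catch (InvalidDataException) { return null; } // not a valid ZIP archive
        catch (XmlException) { return null; }         // malformed XML part
        catch (IOException) { return null; }          // missing or unreadable file
    }
}
```

A null result means the file is not a readable package, with no dialog involved. A non-null result only shows the package is structurally readable; Word may still object to it for other reasons.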
