Reading a .doc file without launching MSWord
I'm trying to open a .doc file and read its contents, but I can't find any way to do this without launching MSWord.
This is the code I have now:
Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
object nullObject = System.Reflection.Missing.Value;
object file = @"C:\doc.doc";
Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(ref file, ref nullObject, ref nullObject,
ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject,
ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject,
ref nullObject);
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();
IDataObject data = Clipboard.GetDataObject();
string text = data.GetData(DataFormats.Text).ToString();
doc.Close(ref nullObject, ref nullObject, ref nullObject);
app.Quit(ref nullObject, ref nullObject, ref nullObject);
But this launches MSWord. Is there any way to do it without launching Word?
Comments (3)
Two possibilities: either use Microsoft's spec to write your own parser for the .doc format, or use an existing library for the purpose (e.g., from Aspose). Unless you have a couple of spare years to spend on the task, the latter is clearly the correct choice.
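To give a sense of where a hand-written parser would start: a legacy .doc file is stored inside an OLE compound file container, while a .docx file is a ZIP archive. The sketch below only checks the leading signature bytes; it does not parse any text. The class and method names (`WordFileSniffer`, `DetectContainer`) are hypothetical, introduced here for illustration.

```csharp
using System;
using System.IO;

static class WordFileSniffer
{
    // Signature of an OLE compound file, the container used by legacy .doc files.
    static readonly byte[] CompoundFileMagic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Signature of a ZIP archive, the container used by .docx files.
    static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "compound-file", "zip", or "unknown" based on the first bytes of the file.
    public static string DetectContainer(string path)
    {
        byte[] header = new byte[8];
        int read;
        using (FileStream stream = File.OpenRead(path))
        {
            read = stream.Read(header, 0, header.Length);
        }

        if (StartsWith(header, read, CompoundFileMagic)) return "compound-file";
        if (StartsWith(header, read, ZipMagic)) return "zip";
        return "unknown";
    }

    // True when the first `length` valid bytes of `data` begin with `prefix`.
    static bool StartsWith(byte[] data, int length, byte[] prefix)
    {
        if (length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A "compound-file" result is only a hint, since other legacy Office formats such as .xls use the same container. Everything after this check (locating the document stream and decoding the text) is the part that takes the real effort, which is why an existing library is the practical choice.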
Last time I did this (via COM from C++), I recall a 'Visible' property in the Application interface (true=visible).
However, it seems to me that the default was false, so you had to set it to true to make Word appear.
Regardless of whether or not the user can see Word, you will still see winword.exe (or whatever it's called today) in your task manager. I don't think there's a way to access Word through this interface without it launching Word (behind the scenes or not).
If you don't want Word to launch at all, you may have to find another solution.
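If keeping Word hidden is acceptable, the approach described above might look roughly like the following in C#. This is a sketch, not a tested solution: it assumes Word is installed, that the project references Microsoft.Office.Interop.Word, and that the compiler is C# 4 or later (so the `ref` arguments can be omitted). The `HiddenWordReader` class name is hypothetical. The winword.exe process is still started in the background.

```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

static class HiddenWordReader
{
    // Reads the plain text of a document through a hidden Word instance.
    public static string ReadText(string path)
    {
        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Keep the Word window off screen; the process still runs.
            app.Visible = false;

            doc = app.Documents.Open(path, ReadOnly: true);

            // Read the text directly instead of going through the clipboard.
            return doc.Content.Text;
        }
        finally
        {
            // The casts avoid the method/event name ambiguity on Close and Quit.
            if (doc != null)
            {
                ((Word._Document)doc).Close(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
            }
            ((Word._Application)app).Quit();
            Marshal.ReleaseComObject(app);
        }
    }
}
```

Reading `doc.Content.Text` also avoids overwriting whatever the user currently has on the clipboard, which the select-and-copy approach in the question does.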
Add a reference to the library: Add Reference --> Browse --> Code7248.word_reader.dll
Download the DLL from:
sourceforge.net/p/word-reader/wiki/Home
(A simple .NET library for C#, compatible with .NET 2.0, 3.0, 3.5 and 4.0. It can currently extract only the raw text from a .doc or .docx file.)
The sample code is a simple C# console application.
It works fine.
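The sample code for this answer is not included in this copy of the thread. A minimal console sketch of how the library might be called is shown below. The namespace is taken from the DLL name, but the `TextExtractor` class and its `ExtractText` method are assumptions and should be checked against the wiki page linked above before use. The file path is the one from the question.

```csharp
using System;
using Code7248.word_reader; // namespace assumed from the DLL name

class Program
{
    static void Main(string[] args)
    {
        // TextExtractor and ExtractText are assumed names; verify them on the project wiki.
        TextExtractor extractor = new TextExtractor(@"C:\doc.doc");

        // Per the answer, only the raw text is extracted (no formatting).
        string text = extractor.ExtractText();

        Console.WriteLine(text);
    }
}
```

If the names match, this reads the file directly and does not start winword.exe.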