Reading a .doc file without launching MS Word

Posted on 2024-09-24 20:13:55

I'm trying to open a .doc file and read its contents, but I can't find any way to do this without launching MS Word.

This is the code I have now:

Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
object nullObject = System.Reflection.Missing.Value;
object file = @"C:\doc.doc";
Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(ref file, ref nullObject, ref nullObject,
         ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject,
         ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject,
         ref nullObject);
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();
IDataObject data = Clipboard.GetDataObject();
string text = data.GetData(DataFormats.Text).ToString();
doc.Close(ref nullObject, ref nullObject, ref nullObject);
app.Quit(ref nullObject, ref nullObject, ref nullObject);

But this launches MS Word. Is there a way to do it without launching Word?

Comments (3)

腻橙味 2024-10-01 20:13:55

Two possibilities: either use Microsoft's spec to write your own parser for the .doc format, or use an existing library for the purpose (e.g., from Aspose). Unless you have a couple of spare years to spend on the task, the latter is clearly the correct choice.

甜味拾荒者 2024-10-01 20:13:55

Last time I did this (via COM from C++), I recall a 'Visible' property in the Application interface (true=visible).

However, it seems to me that the default was false, so you had to set it to true to make Word appear.

Regardless of whether or not the user can see Word, you will still see winword.exe (or whatever it's called today) in your task manager. I don't think there's a way to access Word through this interface without it launching Word (behind the scenes or not).

If you don't want Word to launch at all, you may have to find another solution.
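If a hidden Word process is acceptable, the idea above can be sketched roughly as follows. This is only a sketch under some assumptions: Word is installed, the project references Microsoft.Office.Interop.Word, and the compiler is C# 4 or later (so COM calls can omit `ref` and use named arguments). The names WordTextReader and ReadText are made up for illustration. It reads the text from doc.Content.Text instead of going through the clipboard. winword.exe will still start in the background.

```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

static class WordTextReader   // hypothetical helper, not part of any library
{
    public static string ReadText(string path)
    {
        Word.Application app = null;
        Word.Document doc = null;
        try
        {
            app = new Word.Application();
            app.Visible = false;   // keep the Word window hidden

            // Open read-only and without showing a document window.
            doc = app.Documents.Open(FileName: path, ReadOnly: true, Visible: false);

            // Content is a Range covering the main document story.
            return doc.Content.Text;
        }
        finally
        {
            // Close without saving and shut the hidden process down,
            // otherwise winword.exe keeps running after the call.
            if (doc != null)
            {
                doc.Close(SaveChanges: false);
                Marshal.ReleaseComObject(doc);
            }
            if (app != null)
            {
                app.Quit(SaveChanges: false);
                Marshal.ReleaseComObject(app);
            }
        }
    }
}
```

Usage would be something like string text = WordTextReader.ReadText(@"C:\doc.doc"); with the path taken from the question.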

若言繁花未落 2024-10-01 20:13:55

Add a reference to the library via Add Reference --> Browse --> Code7248.word_reader.dll

Download the DLL from the following URL:

sourceforge.net/p/word-reader/wiki/Home

(A simple .NET library for C#, compatible with .NET 2.0, 3.0, 3.5 and 4.0. It can currently extract only the raw text from a .doc or .docx file.)

Sample code for a simple C# console application:

using System;
using System.Collections.Generic;
using System.Text;
//add extra namespaces
using Code7248.word_reader;


namespace testWordRead
{
    class Program
    {
        private void readFileContent(string path)
        {
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }
        static void Main(string[] args)
        {
            Program cs = new Program();
            // Verbatim string: without the @, "\T" does not compile and "\t" would be a tab.
            string path = @"D:\Test\testdoc1.docx";
            cs.readFileContent(path);
            Console.ReadLine();
        }
    }
}

It is working fine.
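For the .docx case only, a third-party DLL is not strictly required, because a .docx file is a ZIP archive whose main text is stored in word/document.xml. Below is a minimal sketch using only the .NET base class library. Assumptions: .NET 4.5 or later (on .NET Framework, with references to System.IO.Compression and System.IO.Compression.FileSystem), and the class name DocxText is made up for illustration. It does not read the binary .doc format, and it ignores headers, footers, footnotes, tabs and manual line breaks; text boxes get no special handling.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

static class DocxText   // hypothetical helper, not part of any library
{
    // WordprocessingML main namespace used inside word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string Extract(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                StringBuilder sb = new StringBuilder();

                // Walk the elements in document order: start a new line at
                // each paragraph (w:p) and append each text run (w:t).
                foreach (XElement e in xml.Descendants())
                {
                    if (e.Name == W + "p")
                    {
                        if (sb.Length > 0) sb.AppendLine();
                    }
                    else if (e.Name == W + "t")
                    {
                        sb.Append(e.Value);
                    }
                }
                return sb.ToString();
            }
        }
    }
}
```

Calling Console.WriteLine(DocxText.Extract(@"D:\Test\testdoc1.docx")); would print the paragraphs one per line. For real .doc files a library such as the one above is still needed.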
