The easiest way to process text in MS Word files

Posted on 2024-08-14 04:04:04

I need to extract text from an old MS Word .doc file in C#.
What is the easiest (or else the best) way to get that job done?

若无相欠,怎会相见 2024-08-21 04:04:04

First, you need to add a reference to the MS Word object library. Go to Project => Add Reference, select the COM tab, then find and select "Microsoft Word 10.0 Object Library" (the version number may be different on your computer) and click OK.

Once you have done that, you can use the following code. It opens an MS Word document and displays each paragraph in a message box:

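using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;
using Word = Microsoft.Office.Interop.Word;

// Read an MS Word Doc (a method of a Windows Forms class)
private void ReadWordDoc()
{
    Word.Application wordApp = null;
    Word.Document Doc = null;

    try
    {
        // Use Word.Application, not Word.ApplicationClass: the class type
        // cannot be used when "Embed Interop Types" is enabled on the reference
        wordApp = new Word.Application();
        wordApp.Visible = false;

        // Define file path
        string fn = @"c:\test.doc";

        // Open the document read-only; C# 4+ lets you omit the optional
        // COM arguments instead of passing a long list of "ref oNull" values
        Doc = wordApp.Documents.Open(fn, ReadOnly: true);

        // Read each paragraph and show it
        foreach (Word.Paragraph oPara in Doc.Paragraphs)
            MessageBox.Show(oPara.Range.Text);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    finally
    {
        // Always close the document and quit Word, even after an error,
        // otherwise a hidden WINWORD.EXE process can be left running
        if (Doc != null)
        {
            ((Word._Document)Doc).Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            Marshal.ReleaseComObject(Doc);
        }
        if (wordApp != null)
        {
            ((Word._Application)wordApp).Quit(Word.WdSaveOptions.wdDoNotSaveChanges);
            Marshal.ReleaseComObject(wordApp);
        }
    }
}
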
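One detail the paragraph loop glosses over: each `oPara.Range.Text` string still contains Word's control characters, most visibly the trailing paragraph mark (`\r`), plus `\r\a` at the end of a table cell, `\v` for a manual line break and `\f` for a page or section break. If you want one plain-text string instead of a message box per paragraph, here is a minimal sketch of a helper that strips those characters and joins the paragraphs. `WordTextCleaner` and its methods are hypothetical names made up for this example, not part of the Word object model, and the helper is plain C# that never touches COM.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of the Word interop API) that turns raw
// Word Range.Text values into plain text.
public static class WordTextCleaner
{
    // Assumed Word control characters:
    //   '\r' (13) paragraph mark, '\a' (7) end-of-cell marker,
    //   '\v' (11) manual line break, '\f' (12) page/section break.
    public static string CleanParagraph(string raw)
    {
        if (raw == null) return string.Empty;

        var sb = new StringBuilder(raw.Length);
        foreach (char c in raw)
        {
            switch (c)
            {
                case '\r':   // paragraph mark
                case '\a':   // end-of-cell marker
                case '\f':   // page or section break
                    break;   // drop these entirely
                case '\v':
                    sb.Append('\n'); // manual line break becomes a newline
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    // Cleans each paragraph and joins them with '\n'.
    public static string JoinParagraphs(IEnumerable<string> rawParagraphs)
    {
        if (rawParagraphs == null) throw new ArgumentNullException(nameof(rawParagraphs));
        return string.Join("\n", rawParagraphs.Select(CleanParagraph));
    }
}
```

Inside `ReadWordDoc` you could then replace the `foreach` loop with `string text = WordTextCleaner.JoinParagraphs(Doc.Paragraphs.Cast<Word.Paragraph>().Select(p => p.Range.Text));` (this needs `using System.Linq;`). Keeping the cleanup separate from the COM calls makes it easy to test without Word installed.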
原谅过去的我 2024-08-21 04:04:04

Depending on your needs and budget, you might want to look at the Aspose.Words library. It's not cheap, but it may cut down the effort needed to extract that text. The bonus is that you don't need MS Office installed on the deployment machine, which IMHO is a must if you are running this on a server: Microsoft does not recommend or support automating Office from unattended server-side code.