How to get text from a line number in MS Word

Posted 2025-01-03 12:42:09

Is it possible to get text (a line or a sentence) from a given line number in MS Word using Office automation? I mean, it's OK if I can get either the text on the given line or the sentence(s) that are part of that line.

I am not providing any code because I have absolutely no clue how an MS Word document is read using Office automation. I can open the file like this:

var wordApp = new Word.Application();   // Word = Microsoft.Office.Interop.Word
wordApp.Visible = false;
object file = path;
object misValue = Type.Missing;
Word.Document doc = wordApp.Documents.Open(ref file, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue,
                                           ref misValue, ref misValue, ref misValue);

// ...and the rest of the code, given that I have line number = 3?

Edit: To address @Richard Marskell - Drackir's concern: although the text in an MS Word document is one long string, Office automation still exposes line numbers. In fact, I get the line number itself from another piece of code, like this:

Word.Revision rev = doc.Revisions[1];   // some revision, e.g. the first one in the document
object lineNo = rev.Range.get_Information(Word.WdInformation.wdFirstCharacterLineNumber);

For instance say the Word file looks like this:

fix grammatical or spelling errors

clarify meaning without changing it correct minor mistakes add related resources or links
always respect the original author

Here there are 4 lines.


Answers (3)

难忘№最初的完美 2025-01-10 12:42:09

Fortunately, after some epic searching, I found a solution.

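    // Requires: using System.IO; using System.Reflection; using System.Windows.Forms;
    //           using Word = Microsoft.Office.Interop.Word;
    object file = Path.Combine(Path.GetDirectoryName(Application.ExecutablePath), "Answer.doc");

    // Word.Application rather than Word.ApplicationClass: the class type cannot be
    // used when the interop assembly is embedded ("Embed Interop Types").
    Word.Application wordObject = new Word.Application();
    wordObject.Visible = false;

    object nullobject = Missing.Value;
    Word.Document docs = wordObject.Documents.Open
        (ref file, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject);

    String strLine;
    bool bolEOF = false;

    // Start at the first character of the document.
    docs.Characters[1].Select();

    int index = 0;
    do
    {
        // Extend the selection by one line, as laid out by Word.
        object unit = Word.WdUnits.wdLine;
        object count = 1;
        wordObject.Selection.MoveEnd(ref unit, ref count);

        strLine = wordObject.Selection.Text;
        richTextBox1.Text += ++index + " - " + strLine + "\r\n"; // for our understanding
        // To fetch only one line, stop here once index equals the wanted line number.

        // Collapse the selection to its end so the next pass starts on the next line.
        object direction = Word.WdCollapseDirection.wdCollapseEnd;
        wordObject.Selection.Collapse(ref direction);

        // The predefined \EndOfDoc bookmark signals that the last line was read.
        if (wordObject.Selection.Bookmarks.Exists(@"\EndOfDoc"))
            bolEOF = true;
    } while (!bolEOF);

    // Close the document and quit Word.
    docs.Close(ref nullobject, ref nullobject, ref nullobject);
    wordObject.Quit(ref nullobject, ref nullobject, ref nullobject);
    docs = null;
    wordObject = null;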

Here's the genius behind the code. Follow the link for more explanation of how it works.

薄荷→糖丶微凉 2025-01-10 12:42:09

Use this if you want to read standard text (.txt) files; it reads the whole file with one call:

List<string> strmsWord =
    new List<string>(File.ReadAllLines(Path.Combine(yourFilePath, yourWordDocName)));

If you want to loop through and see the items that were returned, use something like this:

foreach (string strLines in strmsWord)
{
    Console.WriteLine(strLines);
}
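Since the question asks for one specific line number, the same plain-text approach can fetch just that line instead of loading the whole list. A minimal sketch (the class and method names `TextFileLines.GetLine` are hypothetical); like the code above, it only works for plain-text files, not binary .doc files:

```csharp
using System;
using System.IO;
using System.Linq;

static class TextFileLines
{
    // Returns the 1-based lineNumber-th line of a plain-text file,
    // or null if the file has fewer lines than that.
    // File.ReadLines enumerates lazily, so enumeration stops after the
    // requested line instead of building an array of the whole file.
    public static string GetLine(string path, int lineNumber)
    {
        if (lineNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(lineNumber), "Line numbers start at 1.");

        return File.ReadLines(path).Skip(lineNumber - 1).FirstOrDefault();
    }
}
```

For example, `TextFileLines.GetLine(path, 3)` returns the third line, matching the "line number = 3" case from the question.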

Or:

I totally forgot that Word documents are probably in a binary format, so look at this and read the contents into a RichTextBox; from there you can either get at the line number you want or load the lines into a list afterwards. This link will show you how:
Reading from a Word Doc
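Once the document text is in a string (for example, assuming it comes from `doc.Content.Text` via automation, or from the RichTextBox route), it helps to know what separates the "lines" in that text. Word ends each paragraph with `\r` and stores a manual line break (Shift+Enter) as `\v` (character 11), but the soft wraps Word computes from page width and fonts are not in the text at all. A minimal sketch with hypothetical helper names `SplitHardLines` and `GetHardLine`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordTextLines
{
    // Splits text taken from Word into "hard" lines: paragraphs ('\r') and
    // manual line breaks ('\v'). Soft line wraps are not represented, so these
    // indices can differ from the numbers returned by wdFirstCharacterLineNumber.
    public static List<string> SplitHardLines(string wordText)
    {
        // Text of a whole document ends with the final paragraph mark; drop it
        // so the split does not produce a spurious empty last entry.
        string text = wordText.EndsWith("\r", StringComparison.Ordinal)
            ? wordText.Substring(0, wordText.Length - 1)
            : wordText;

        return text.Split(new[] { '\r', '\v' }, StringSplitOptions.None).ToList();
    }

    // Returns the 1-based hard line, or null if there are not that many lines.
    public static string GetHardLine(string wordText, int lineNumber)
    {
        List<string> lines = SplitHardLines(wordText);
        return lineNumber >= 1 && lineNumber <= lines.Count ? lines[lineNumber - 1] : null;
    }
}
```

This is only reliable when the document has no wrapped paragraphs; for layout line numbers, the Selection-based loop in the first answer is the closer match.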
If you want to read the XML format of a Word document, here is a good link to check out as well:
ReadXML Format of a Word Document
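A .docx file is a ZIP package whose main text conventionally lives in `word/document.xml` (strictly, `_rels/.rels` names the main part). A minimal sketch using only System.IO.Compression and LINQ to XML that lists paragraph text; the class name `DocxParagraphs` is hypothetical, and it handles .docx only, not binary .doc. The file stores paragraphs, not the wrapped lines Word displays, so this cannot reproduce Word's layout line numbers:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class DocxParagraphs
{
    // WordprocessingML namespace used by word/document.xml.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every paragraph (<w:p>) in the document body.
    // Only <w:t> text runs are concatenated; tabs, breaks and fields are ignored.
    public static List<string> Read(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumes the conventional location of the main document part.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                return doc.Descendants(W + "body")
                          .Descendants(W + "p")
                          .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                          .ToList();
            }
        }
    }
}
```

For example, with a `FileStream` opened on a .docx, `DocxParagraphs.Read(stream)[2]` gives the third paragraph.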

This one is an even easier example; it reads the contents into the Clipboard:
Load Word into ClipBoard

滿滿的愛 2025-01-10 12:42:09

var word = new Word.Application();          // Word = Microsoft.Office.Interop.Word
object miss = Missing.Value;                // System.Reflection.Missing
object path = @"D:\viewstate.docx";
object readOnly = true;
var docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss,
                               ref miss, ref miss, ref miss, ref miss, ref miss,
                               ref miss, ref miss, ref miss, ref miss, ref miss,
                               ref miss, ref miss);
string totaltext = "";

// After opening, the insertion point is normally at the start of the document,
// so extending the selection by one wdLine unit selects the first line.
object unit = Word.WdUnits.wdLine;
object count = 1;
word.Selection.MoveEnd(ref unit, ref count);
totaltext = word.Selection.Text;

TextBox1.Text = totaltext;

// Close the document and quit Word.
docs.Close(ref miss, ref miss, ref miss);
word.Quit(ref miss, ref miss, ref miss);
docs = null;
word = null;