How do I read a Word document with StreamReader?
I have an ASP.NET 2.0 app. I am trying to upload a file, read its lines, and display them in a textbox. This works fine for a .txt file, but if I upload a Word document I get all kinds of gibberish (it looks like XML-based formatting) surrounding the text. Here is my code:
    Dim s As New StringBuilder
    Dim rdr As StreamReader
    If FileUpload1.HasFile Then
        rdr = New StreamReader(FileUpload1.FileContent)
        Do Until rdr.EndOfStream
            s.Append(rdr.ReadLine() & ControlChars.NewLine)
        Loop
        TextBox1.Text = s.ToString()
    End If
Answers (3)
StreamReader doesn't support Word-formatted files. It just reads streams of characters. You need to use a library that specifically understands the Word format. This isn't an easy problem at all: it's not always clear how you would convert any given portion of a Word document into plain text.
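Since StreamReader will happily decode any bytes it is given, one option is to detect a Word upload before reading it as text. The following is only a minimal sketch: `WordFileSniffer`, `LooksLikeWordFile`, and `StartsWith` are hypothetical names, not part of the original code. It relies on the fact that .docx files are ZIP packages (they start with the bytes `PK`) and legacy .doc files are OLE compound files. Other ZIP or OLE files match too, so a positive result means "not plain text", not "definitely Word".

```vb
Imports System.IO

Module WordFileSniffer
    ' Hypothetical helper: True when the stream starts with the ZIP
    ' signature (.docx packages) or the OLE compound-file signature
    ' (legacy .doc files). Assumes the stream is positioned at its start.
    Function LooksLikeWordFile(ByVal source As Stream) As Boolean
        Dim zipMagic() As Byte = {&H50, &H4B, &H3, &H4}
        Dim oleMagic() As Byte = {&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1}

        Dim header(7) As Byte   ' 8 bytes, indexes 0..7
        Dim count As Integer = source.Read(header, 0, header.Length)

        ' Rewind so a later StreamReader still sees the whole file.
        If source.CanSeek Then source.Position = 0

        Return StartsWith(header, count, zipMagic) OrElse _
               StartsWith(header, count, oleMagic)
    End Function

    ' True when the first bytes of data equal magic.
    Private Function StartsWith(ByVal data() As Byte, ByVal count As Integer, _
                                ByVal magic() As Byte) As Boolean
        If count < magic.Length Then Return False
        For i As Integer = 0 To magic.Length - 1
            If data(i) <> magic(i) Then Return False
        Next
        Return True
    End Function
End Module
```

In the question's handler, this could be called as `LooksLikeWordFile(FileUpload1.FileContent)` before creating the StreamReader, showing an error message or switching to a Word-aware library when it returns True.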
That's because the Word document file contains that XML-based formatting. You will see the same thing if you use a dumb text reader (e.g. `Notepad.exe`, or `type` from the command line) to see what's in the file.

To extract the text from the surrounding formatting, you'll need to use software (e.g. Word itself, `winword.exe`) to save or get the document in plain-text format.
You can use the `Word.ApplicationClass` class. However, you should read Considerations for server-side Automation of Office.

Liberated from another donor:

As mentioned in my comment below, this may work for you as well:
http://npoi.codeplex.com/
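For illustration, here is a rough sketch of the Word automation route. It is not the code this answer originally quoted. Instead of the early-bound `Word.ApplicationClass` interop type, it uses late binding through `CreateObject`, so it compiles without an Office interop reference. `WordTextExtractor` and `ReadWordText` are hypothetical names. It assumes Word is installed on the machine and that the upload has already been saved to disk. The server-side automation caveats mentioned above apply in full.

```vb
Option Strict Off   ' late binding: members are resolved at run time

Imports System.Runtime.InteropServices

Module WordTextExtractor
    ' Hypothetical helper: opens the document at fullPath in Word and
    ' returns its body text. Paragraphs come back separated by
    ' carriage returns.
    Function ReadWordText(ByVal fullPath As String) As String
        Dim word As Object = CreateObject("Word.Application")
        Dim doc As Object = Nothing
        Try
            word.Visible = False
            doc = word.Documents.Open(fullPath)
            Return CStr(doc.Content.Text)
        Finally
            If doc IsNot Nothing Then
                doc.Close(False)                 ' close without saving
                Marshal.ReleaseComObject(doc)
            End If
            word.Quit()                          ' do not leave WINWORD.EXE running
            Marshal.ReleaseComObject(word)
        End Try
    End Function
End Module
```

A caller would first persist the upload, for example with `FileUpload1.SaveAs(path)`, and then assign `ReadWordText(path)` to `TextBox1.Text`.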