The easiest way is probably to use the Open XML SDK 2.0.
Get the Code Snippets for Visual Studio 2008 for some examples.
I would also highly recommend downloading the Open XML SDK Productivity Tool, which helps you understand how Open XML files are structured and can even generate SDK source code based on the structure of your documents. You can download the tool from the same page as the SDK. It's 100 MB, but it's worth the download.
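A minimal sketch of reading the body text with the Open XML SDK, assuming the current DocumentFormat.OpenXml NuGet package rather than the original 2.0 installer; the names OpenXmlTextReader and ReadBodyText are hypothetical. It joins paragraphs with line breaks instead of returning Body.InnerText directly, because InnerText runs the text of adjacent paragraphs together.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, not part of the SDK.
public static class OpenXmlTextReader
{
    // Returns the text of the main document body, one line per paragraph.
    public static string ReadBodyText(string path)
    {
        // isEditable: false opens the file read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart?.Document?.Body;
            if (body == null) return string.Empty;

            // Paragraph.InnerText concatenates the text of all runs in that paragraph.
            return string.Join(Environment.NewLine,
                body.Descendants<Paragraph>().Select(p => p.InnerText));
        }
    }
}
```

The Productivity Tool is still useful alongside this: it shows which elements (tables, text boxes, headers) hold the text you care about, so you can narrow the Descendants query accordingly.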
You can simply use the DocX library; it is very good and easy to use.
For usage samples, examples and videos, check their GitHub page. You can download it from here.
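A minimal sketch with DocX, assuming a recent release of the DocX NuGet package in which the entry class is Xceed.Words.NET.DocX (older releases used the Novacode namespace, so adjust the using directive to your version); DocXTextReader and ReadText are hypothetical names.

```csharp
using System;
using Xceed.Words.NET;

// Hypothetical helper class, not part of DocX.
public static class DocXTextReader
{
    // Loads an existing .docx file and returns its plain text.
    public static string ReadText(string path)
    {
        // Disposing the document releases the file handle.
        using (var document = DocX.Load(path))
        {
            return document.Text;
        }
    }
}
```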
Yes, I know this is a very old post, but this information might help others who are searching the forums.
Use this library from SourceForge.
Add a reference to that library, and then:
var extractor = new Code7248.word_reader.TextExtractor(filePath); // filePath: path to the Word file
string contents = extractor.ExtractText();
You can read Microsoft Office files through Interop (which requires Office to be installed on the machine), or read Office 2007 and later files (.docx) through Open XML.
DOCX files are in fact ZIP archives.
You can unzip them into their component XML files, read through the relevant one (word\document.xml inside file.docx), and pull out the email addresses.
This library will help you unzip the archive: .Net Zip Library
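A sketch of the unzip approach that swaps in the built-in System.IO.Compression (available since .NET Framework 4.5) for the .Net Zip Library. DocxEmailExtractor, ExtractEmails and the e-mail pattern are illustrative assumptions; the pattern is deliberately simple and does not match every valid address. Text is joined per paragraph before matching because Word often splits a single address across several w:t runs.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper class.
public static class DocxEmailExtractor
{
    // WordprocessingML namespace used by the elements in word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Simplified e-mail pattern (an assumption, not RFC 5322 complete).
    static readonly Regex EmailPattern =
        new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}");

    public static List<string> ExtractEmails(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            // Zip entry names use forward slashes.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidOperationException("word/document.xml not found");

            XDocument xml;
            using (var stream = entry.Open())
            {
                xml = XDocument.Load(stream);
            }

            // Concatenate the <w:t> text of each <w:p> paragraph.
            var paragraphs = xml.Descendants(W + "p")
                .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));

            return paragraphs
                .SelectMany(text => EmailPattern.Matches(text).Cast<Match>())
                .Select(m => m.Value)
                .Distinct()
                .ToList();
        }
    }
}
```

Only the main body is scanned here; addresses in headers, footers or footnotes live in other parts of the archive (for example word/header1.xml) and would need the same treatment.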
Office 2007 and later use the Open XML format. You need the Packaging API (System.IO.Packaging) to open and read the document parts:
http://msdn.microsoft.com/en-us/library/system.io.packaging.aspx
http://openxmldeveloper.org
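A sketch of the Packaging API route, assuming System.IO.Packaging is available (WindowsBase.dll on .NET Framework, the System.IO.Packaging NuGet package on .NET Core and later); PackagingTextReader and ReadMainText are hypothetical names. Rather than hard-coding /word/document.xml, it follows the package's officeDocument relationship to find the main part.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class.
public static class PackagingTextReader
{
    const string OfficeDocumentRel =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the main document text, one line per paragraph.
    public static string ReadMainText(string path)
    {
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            // Package-level relationships are resolved against the root "/".
            PackageRelationship rel = package.GetRelationshipsByType(OfficeDocumentRel).First();
            Uri partUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), rel.TargetUri);
            PackagePart part = package.GetPart(partUri);

            using (Stream stream = part.GetStream(FileMode.Open, FileAccess.Read))
            {
                XDocument xml = XDocument.Load(stream);
                return string.Join(Environment.NewLine,
                    xml.Descendants(W + "p")
                       .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value))));
            }
        }
    }
}
```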
There is a free way to read .doc and .docx files that could help you:
http://freeword.codeplex.com/