Need to search thousands of documents (.doc, .docx, .pdf) for Social Security numbers in C#
What is the best way to access the documents (opening them and reading only the text) so that searching is faster? I have already tried the Microsoft Office Word object model: creating a Word application and opening each file to get its text. Threading does not help either: if I create only one Word application, the threads gain nothing, and if I create a Word application in each thread, the system cannot handle it. How would you suggest I proceed?
Thanks in advance
Comments (1)
Ah... go back and read the documentation for your operating system. For quite some time (many, many years, in fact) it has included an indexing and search system that a lot of things can hook into, provided you install the proper filters (downloadable from Microsoft, Adobe, etc.).
This creates a full-text index that you can then search through an API. That is a LOT more efficient for repeatedly searching a large number of documents.
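To make that concrete, here is a rough sketch of querying the Windows Search index from C# through its OLE DB provider. It rests on several assumptions: the documents sit in a folder that Windows Search already indexes, filters for .doc, .docx and .pdf are installed, and you are looking for a known literal phrase (the index matches words and phrases, not patterns). The class name `IndexSearch`, its methods, and the folder used in the usage note are hypothetical names for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Data.OleDb; // Windows only; on modern .NET this comes from the System.Data.OleDb package

public static class IndexSearch
{
    // Connection string for the Windows Search OLE DB provider.
    private const string ConnectionString =
        "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';";

    // Builds a Windows Search SQL query for files under `folder`
    // whose indexed text contains `phrase` as an exact phrase.
    public static string BuildQuery(string folder, string phrase)
    {
        // The query is assembled as text here, so single quotes are doubled
        // and double quotes are dropped from the phrase.
        string safeFolder = folder.Replace("'", "''").Replace('\\', '/');
        string safePhrase = phrase.Replace("'", "''").Replace("\"", "");

        return "SELECT System.ItemPathDisplay FROM SystemIndex " +
               "WHERE SCOPE='file:" + safeFolder + "' " +
               "AND CONTAINS('\"" + safePhrase + "\"')";
    }

    // Runs the query and returns the paths of the matching files.
    public static List<string> Find(string folder, string phrase)
    {
        var paths = new List<string>();

        using (var connection = new OleDbConnection(ConnectionString))
        using (var command = new OleDbCommand(BuildQuery(folder, phrase), connection))
        {
            connection.Open();
            using (OleDbDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Skip rows where the path column is null.
                    if (reader[0] is string path)
                    {
                        paths.Add(path);
                    }
                }
            }
        }

        return paths;
    }
}
```

A call such as `IndexSearch.Find(@"C:\docs", "123-45-6789")` would then return the indexed files under that (hypothetical) folder that contain the phrase. No Word application is created at all, because the text was already extracted by the filters when the files were indexed.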