Reading character by character from MS Word
In my program I need to read a PDF file character by character and store every word in a database. I was not sure whether that is possible directly, so I decided to convert the PDF file to an MS Word file with a converter and read from that file instead.
I still don't know how to read character by character from an MS Word file.
I'm using C++/MFC in my program.
Sample code would be a great help, and I would be very grateful.
Comments (2)
Check out IFilter:
http://msdn.microsoft.com/en-us/library/ms691105%28v=vs.85%29.aspx
It's a COM interface for extracting text from files. Each file extension has its own filter DLL, and COM returns the one that matches the file you need to read.
An example in C#: http://www.codeproject.com/KB/cs/IFilter.aspx, or http://www.codeproject.com/KB/string/pdf2text.aspx (I've used it in native C++, but I don't have a code example).
Note that for PDF you might need to download the PDF IFilter: http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611
Good luck!
If you can convert the source file and you only need the characters, then make it a plain text file and read it using std::ifstream. To get more sophisticated information from an MS Word file, you should use Office Automation. There are good links in the answers to the following question:
Creating, opening and printing a word file from C++
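For the plain text route, here is a minimal sketch of reading a stream one character at a time and collecting the words. It assumes the PDF or Word file has already been converted to a single-byte (ASCII-like) text file. The function names `extract_words`, `extract_words_from_file` and `self_check` are made up for illustration, and the rule "a word is a run of letters or digits" is an assumption, not something the question specifies. The returned vector stands in for whatever database insert you actually use.

```cpp
#include <cassert>
#include <cctype>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the stream one character at a time and splits it into words.
// Assumption: a "word" is a run of ASCII letters or digits; everything
// else (spaces, punctuation, line breaks) is treated as a separator.
std::vector<std::string> extract_words(std::istream& in) {
    std::vector<std::string> words;
    std::string current;
    char ch;
    while (in.get(ch)) {  // one character per iteration
        if (std::isalnum(static_cast<unsigned char>(ch))) {
            current.push_back(ch);
        } else if (!current.empty()) {
            words.push_back(current);  // a separator ends the current word
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);  // last word when the text ends without a separator
    }
    return words;
}

// Hypothetical helper: opens a plain text file and extracts its words.
std::vector<std::string> extract_words_from_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    return extract_words(file);
}

// Small self-check that uses an in-memory stream instead of a real file.
inline void self_check() {
    std::istringstream sample("Hello, MS Word! 42 words");
    const std::vector<std::string> words = extract_words(sample);
    assert(words.size() == 5);
    assert(words[0] == "Hello");
    assert(words[4] == "words");
}
```

In an MFC program you would call `extract_words_from_file` with the path of the converted text file and then loop over the result to insert each word into the database. Text in a multi-byte encoding such as UTF-8 or UTF-16 would need a different word rule than `std::isalnum`.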