Searching docx files in Java
I am writing an application that searches the content of documents.
I have already written the code for searching plain-text documents (anything editable in Notepad).
I would also like to do the same for docx files. After some research I have found these two options:
http://www.infoq.com/articles/cracking-office-2007-with-java
This method requires me to unzip the docx file and then search the XML files inside it. However, the extraction step adds overhead and, frankly, I don't know how to process an XML file (discarding attribute content, etc.).
http://www.javadocx.com/download
This method lets me import a JAR library into my project, and supposedly I can create docx files with it. What I don't understand is how to open existing docx files with it.
Can anyone recommend an alternative method, or help with either of the two methods mentioned above?
Comments (1)
Try http://tika.apache.org/ or docx4j or POI.
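If adding one of these libraries is not an option, the unzip-and-parse approach from the question can be done without extracting anything to disk. The sketch below uses only the JDK. It assumes the usual docx layout, where the body lives in the ZIP entry `word/document.xml` and visible text sits inside `w:t` elements. The class name `DocxTextSearch` and its methods are hypothetical helpers written for this illustration, not part of any library. Attributes and formatting elements are skipped simply because only character data inside `w:t` is collected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of POI, Tika, or docx4j.
class DocxTextSearch {

    // Assumption: the main body is stored in this ZIP entry (the usual layout).
    private static final String MAIN_PART = "word/document.xml";

    // WordprocessingML namespace used by Word's default (transitional) format.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of the main document part, one line per paragraph. */
    static String extractText(InputStream docx) throws IOException, XMLStreamException {
        // Read the archive as a stream, so nothing is extracted to disk.
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (MAIN_PART.equals(entry.getName())) {
                    return readTextRuns(zip);
                }
            }
        }
        return ""; // no main document part found
    }

    /** Case-insensitive check for a search term in the document body. */
    static boolean contains(InputStream docx, String term)
            throws IOException, XMLStreamException {
        return extractText(docx).toLowerCase(Locale.ROOT)
                .contains(term.toLowerCase(Locale.ROOT));
    }

    private static String readTextRuns(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs needed
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        StringBuilder text = new StringBuilder();
        boolean inText = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                // Only <w:t> carries visible text; everything else is markup.
                inText = isWordElement(reader, "t");
            } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                text.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (isWordElement(reader, "p")) {
                    text.append('\n'); // end of paragraph
                }
                inText = false;
            }
        }
        return text.toString();
    }

    private static boolean isWordElement(XMLStreamReader reader, String localName) {
        return W_NS.equals(reader.getNamespaceURI())
                && localName.equals(reader.getLocalName());
    }
}
```

A call such as `DocxTextSearch.contains(Files.newInputStream(Path.of("report.docx")), "budget")` would then test a single file (the file name here is made up). Text from adjacent runs is concatenated before matching, so a phrase that Word has split across several `w:t` elements is still found. Headers, footers, and footnotes live in other parts of the archive and are not covered by this sketch; the libraries named above handle those cases.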