JackRabbit: searching in PDF files
I am using Jackrabbit for some basic file operations such as add, delete, search, and versioning. It worked well until I got stuck on searching inside PDF files. The code below works for every other format I tried (Word, Excel, plain text) but not for PDF. It throws no exception; it simply returns no results when the text is in a PDF file. Is it because my PDF file is not indexed? Please help.
Query query = queryManager.createQuery(
        "select * from [nt:resource] AS resource where contains(resource.*, '%sampletext%')",
        Query.JCR_SQL2);
QueryResult result = query.execute();
RowIterator ri = result.getRows();
while (ri.hasNext()) {
    Row row = ri.nextRow();
    System.out.println("Row: " + row.toString());
}
Thanks in advance
Comments (1)
I can think of 3 possible root causes:
Possibly the PDF file had not yet been indexed when the query ran (full-text indexing is done in a background thread, AFAIK).
The PDF library (PDFBox) is not on the classpath.
The PDF could not be indexed for some reason, in which case you would see a warning in the log file.
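The first two causes can be checked with a small amount of plain Java. The sketch below is only an illustration and does not use any Jackrabbit API: `PdfIndexDiagnostics`, `isOnClasspath`, and `retry` are hypothetical names, and the class name `org.apache.pdfbox.pdmodel.PDDocument` assumes an Apache PDFBox 1.x or later jar (older releases used a different package). It also checks only the class loader that loaded the diagnostic class, which may differ from the one the repository uses inside an application server.

```java
import java.util.function.BooleanSupplier;

public class PdfIndexDiagnostics {

    // Returns true if the named class can be found by the class loader
    // that loaded this diagnostic class. The class is not initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PdfIndexDiagnostics.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Runs the check up to `attempts` times, sleeping `delayMillis` between
    // tries, and returns true as soon as the check succeeds. Intended for
    // re-running a search while background indexing may still be in progress.
    public static boolean retry(BooleanSupplier check, int attempts, long delayMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (i < attempts - 1) {
                Thread.sleep(delayMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed class name for Apache PDFBox 1.x and later.
        String pdfBoxClass = "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println("PDFBox on classpath: " + isOnClasspath(pdfBoxClass));
    }
}
```

To test the timing hypothesis, the query from the question could be wrapped in a `BooleanSupplier` that returns whether `result.getRows().hasNext()` is true, and passed to `retry` with a few attempts a second or so apart. If the PDF shows up only on a later attempt, background indexing is the likely explanation; if it never shows up, the log file is the next place to look.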