Is Lucene's BlockJoinQuery suitable for searching documents made of fragments (parent-child relationship)?
I'm using Lucene to index documents that consist of fragments.
The document as a whole has fields describing it (i.e. author, title, publish date).
Fragments contain text and tags (keywords). I would like to be able to:
- search for all fragments by a given author that have the tag Foo.
- search for all documents by title.
- search for all documents that contain some words (in any fragment).
I read about BlockJoinQuery in Lucene, but I am not sure whether it's suitable for my problem. For instance, given the following document:
document: title="Hello World" author="Sam Brown"
fragment 1: tags="sunny" text="...."
fragment 2: tags="cloudy" text="moody and sleepy"
would I be able to find this document with the query tags:sunny AND text:sleepy?
Such a query will not match any single child document (fragment), but perhaps it would match the parent. The Lucene documentation does not say either way, though.
Case 1 should work well with BlockJoinQuery.
Case 2 works well without BlockJoinQuery.
Case 3 can be made to work, though it's a little tricky because you have to AND at the parent document level. I.e., make a BooleanQuery with two MUST clauses: the first clause is BlockJoinQuery(TermQuery(Term("tags", "sunny"))) and the second clause is BlockJoinQuery(TermQuery(Term("text", "sleepy"))). That ought to work, I think. You just cannot do the ANDing at the sub-document (fragment) level, since no single fragment has both terms.
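
To make the difference between the two kinds of AND concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: `Fragment`, `Doc`, `joinToParent` and the other names are hypothetical stand-ins that only model the matching semantics (a block-join clause matches a parent when any of its children matches the child query), and the whitespace tokenization is a simplification. In real Lucene code each `joinToParent` call would correspond to one block-join clause (the class is named ToParentBlockJoinQuery in later Lucene versions) inside a BooleanQuery.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class BlockJoinSketch {
    // Hypothetical stand-in for a child document (fragment) in a block.
    static final class Fragment {
        final String tags;
        final String text;
        Fragment(String tags, String text) { this.tags = tags; this.text = text; }
    }

    // Hypothetical stand-in for a parent document together with its fragments.
    static final class Doc {
        final String title;
        final String author;
        final List<Fragment> fragments;
        Doc(String title, String author, List<Fragment> fragments) {
            this.title = title; this.author = author; this.fragments = fragments;
        }
    }

    // Naive whitespace tokenization, good enough for this sketch only.
    static boolean hasTerm(String field, String term) {
        return Arrays.asList(field.split("\\s+")).contains(term);
    }

    // Models one block-join clause: the parent matches if ANY child matches.
    static boolean joinToParent(Doc doc, Predicate<Fragment> childQuery) {
        return doc.fragments.stream().anyMatch(childQuery);
    }

    // AND at the fragment level: a single fragment must contain both terms.
    static boolean childLevelAnd(Doc doc, String tag, String word) {
        return joinToParent(doc, f -> hasTerm(f.tags, tag) && hasTerm(f.text, word));
    }

    // AND at the parent level: two MUST clauses, each joined to the parent on its own.
    static boolean parentLevelAnd(Doc doc, String tag, String word) {
        return joinToParent(doc, f -> hasTerm(f.tags, tag))
            && joinToParent(doc, f -> hasTerm(f.text, word));
    }

    // The example document from the question.
    static Doc sample() {
        return new Doc("Hello World", "Sam Brown", Arrays.asList(
            new Fragment("sunny", "...."),
            new Fragment("cloudy", "moody and sleepy")));
    }

    public static void main(String[] args) {
        Doc doc = sample();
        System.out.println("child-level AND:  " + childLevelAnd(doc, "sunny", "sleepy"));  // false
        System.out.println("parent-level AND: " + parentLevelAnd(doc, "sunny", "sleepy")); // true
    }
}
```

In this model the child-level AND fails because "sunny" and "sleepy" live in different fragments, while the parent-level AND succeeds because each clause is satisfied by some fragment of the same parent.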