BooleanQuery$TooManyClauses exception when using a wildcard query

Posted 2024-08-07 15:24:51 · 609 words · 5 views · 0 comments

I'm using Hibernate Search / Lucene to maintain a really simple index to find objects by name - no fancy stuff.

My model classes all extend a class NamedModel which looks basically as follows:

@MappedSuperclass
public abstract class NamedModel {
    @Column(unique = true)
    @Field(store = Store.YES, index = Index.UN_TOKENIZED)
    protected String name;
}

My problem is that I get a BooleanQuery$TooManyClauses exception when querying the index for objects with names starting with a specific letter, e.g. "name:l*".
A query like "name:lin*" will work without problems, in fact any query using more than one letter before the wildcard will work.

While searching the net for similar problems, I only found people using pretty complex queries and that always seemed to cause the exception. I don't want to increase maxClauseCount because I don't think it's a good practice to change limits just because you reach them.

What's the problem here?

Comments (1)

乖乖 2024-08-14 15:24:51

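Lucene tries to rewrite your query from the simple name:l* into a query containing every indexed term that starts with l (something like name:lou OR name:la OR name: ...) - I believe this is meant to be faster. With one clause per matching term, a short prefix can exceed BooleanQuery's maxClauseCount (1024 by default), and that is when TooManyClauses is thrown. A longer prefix such as lin matches far fewer terms, so it stays under the limit.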
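To make the expansion concrete, here is a rough simulation in plain Java. It is not Lucene code: the class PrefixExpansionSketch, its expand method, and the sample names are made up for illustration, and only the default limit of 1024 is taken from Lucene. Because the name field is unique and UN_TOKENIZED, every name is its own term, so a one-letter prefix can easily match more than 1024 terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified simulation of prefix expansion; this is NOT Lucene code.
class PrefixExpansionSketch {
    // Mirrors Lucene's default BooleanQuery clause limit of 1024.
    static final int MAX_CLAUSE_COUNT = 1024;

    /** Collects every indexed term starting with the prefix, one clause per term. */
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; matching terms are contiguous.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() >= maxClauses) {
                // Stand-in for BooleanQuery.TooManyClauses.
                throw new IllegalStateException("too many clauses for prefix " + prefix);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add("l" + i); // 2000 hypothetical unique names starting with "l"
        }
        terms.add("linda");
        terms.add("linus");

        // "lin" matches only two terms, so the expansion stays small.
        System.out.println(expand(terms, "lin", MAX_CLAUSE_COUNT));
        try {
            // "l" matches 2002 terms and hits the limit.
            expand(terms, "l", MAX_CLAUSE_COUNT);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that the clause count depends on how many distinct terms share the prefix, not on how complex the query looks.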

As a workaround, you may use a constant-score prefix query (a ConstantScoreQuery wrapping a PrefixFilter) instead of a PrefixQuery:

// instead of new PrefixQuery(prefix)
new ConstantScoreQuery(new PrefixFilter(prefix));

However, this changes the scoring of documents (and hence their order, if you rely on score for sorting). As we faced the challenge of needing score (and boost), we decided to go for a solution where we use PrefixQuery if possible and fall back to the ConstantScoreQuery variant where needed:

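// Assumes prefix is a final Term and log is a logger in the enclosing scope;
// TooManyClauses is the nested class BooleanQuery.TooManyClauses.
new PrefixQuery(prefix) {
  @Override
  public Query rewrite(final IndexReader reader) throws IOException {
    try {
      // Normal path: expand the prefix into a scored BooleanQuery.
      return super.rewrite(reader);
    } catch (final TooManyClauses e) {
      log.debug("falling back to ConstantScoreQuery for prefix " + prefix + " (" + e + ")");
      // Fallback: same score for every match, but keep the configured boost.
      final Query q = new ConstantScoreQuery(new PrefixFilter(prefix));
      q.setBoost(getBoost());
      return q;
    }
  }
};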

(As an enhancement, one could use some kind of LRUMap to cache prefixes that failed before, to avoid going through a costly rewrite again.)
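A minimal sketch of such a cache, using only the JDK's LinkedHashMap rather than a dedicated LRUMap class. The name FailedPrefixCache and its methods are hypothetical; the idea would be to check hasFailed before calling super.rewrite and to call markFailed in the catch block. Entries can go stale as the index changes, so a small capacity is probably sensible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: remembers prefixes whose rewrite failed.
class FailedPrefixCache {
    private final Map<String, Boolean> failed;

    FailedPrefixCache(final int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.failed = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Evict the least recently used prefix once capacity is exceeded.
                return size() > capacity;
            }
        };
    }

    /** Records that rewriting this prefix exceeded the clause limit. */
    synchronized void markFailed(String prefix) {
        failed.put(prefix, Boolean.TRUE);
    }

    /** Returns true if the prefix failed before; a hit also refreshes its recency. */
    synchronized boolean hasFailed(String prefix) {
        return failed.get(prefix) != null;
    }
}
```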

I can't help you with integrating this into Hibernate Search though. You might ask after you've switched to Compass ;)
