Lucene payload scoring

Posted 2024-11-17 11:45:37


I want to figure out how payload scoring works in Lucene. Since I don't understand where PayloadFunction fits in, I don't think I really understand how it works. I tried googling for it, but couldn't find much apart from advice to go through the source. Well, it would be nice if someone could explain it here; otherwise, source code it is :)

1 Answer

策马西风 2024-11-24 11:45:38

There are three parts to it. First of all, you should generate payloads during analysis. This can be done using a PayloadAttribute: in your token filter you add this attribute and set a payload on the terms you want.

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.index.Payload;

// Lucene 3.x payload API
class MyFilter extends TokenFilter {

  private final PayloadAttribute attr;

  public MyFilter(TokenStream input) {
    super(input);
    attr = addAttribute(PayloadAttribute.class);
  }

  @Override
  public final boolean incrementToken() throws IOException {
    if (input.incrementToken()) {
      // Attach a float payload (the constant 42 here) to the current term.
      attr.setPayload(new Payload(PayloadHelper.encodeFloat(42)));
      return true;
    }
    attr.setPayload(null);
    return false;
  }
}
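To actually use the filter at index time it has to be plugged into an analyzer. Below is a minimal sketch of how that wiring might look, assuming the Lucene 3.x API used in the snippet above; MyAnalyzer, the whitespace tokenizer, and the Version constant are just illustrative choices, not part of the original answer:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical analyzer that attaches payloads via MyFilter (Lucene 3.x API).
class MyAnalyzer extends Analyzer {

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // Tokenize on whitespace, then let MyFilter attach a payload to every term.
    return new MyFilter(new WhitespaceTokenizer(Version.LUCENE_36, reader));
  }
}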

Then, during searching, you should use the special query class PayloadTermQuery. This class behaves like SpanTermQuery but also reads the payloads stored in the index. Using a custom Similarity implementation, you can score each payload occurrence in a document.

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.DefaultSimilarity;

public class MySimilarity extends DefaultSimilarity {

  // Called once per payload occurrence encountered by a payload-aware query.
  @Override
  public float scorePayload(int docID, String fieldName,
                            int start, int end, byte[] payload,
                            int offset, int length) {
    if (payload != null) {
      // Decode the float stored during analysis and use it as the occurrence score.
      return PayloadHelper.decodeFloat(payload, offset);
    } else {
      return 1.0f;
    }
  }
}

Finally, a PayloadFunction aggregates the payload scores over a document (for example, taking their minimum, maximum, or average) to produce the payload contribution to the final document score.
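Putting the search side together might look like the following sketch, again assuming the Lucene 3.x payload API; the field name "body", the term "lucene", and the already-opened IndexReader are placeholders. AveragePayloadFunction is one of the built-in aggregations (MinPayloadFunction and MaxPayloadFunction are the others):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;

public class PayloadSearchExample {

  public static TopDocs search(IndexReader reader) throws IOException {
    IndexSearcher searcher = new IndexSearcher(reader);
    // Plug in the payload-aware Similarity so scorePayload() is consulted.
    searcher.setSimilarity(new MySimilarity());

    // AveragePayloadFunction averages the scorePayload() values of all payload
    // occurrences of the term within a document; min/max variants also exist,
    // and you can subclass PayloadFunction for a custom aggregation.
    PayloadTermQuery query =
        new PayloadTermQuery(new Term("body", "lucene"), new AveragePayloadFunction());

    return searcher.search(query, 10);
  }
}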
