What is the formula for calculating hit scores in Sunspot on Rails?

Posted on 2024-12-02 18:11:29

Say I have this code in my model:

class Facility < ActiveRecord::Base
...
searchable do
  text :name
  text :facility_type do
end
...

And this in the search controller:

@search = Facility.search do
  keywords(query) do
    boost_fields :name => 1.9,
                 :facility_type => 1.98
  end
  ...

I have two Facility objects. The first has the type "cafe" but does not have the word "cafe" in its name. The second is called "cafe sun", for example, but is actually of type "bar".

I run the search with query="cafe" and get both facilities in the response, but the score is 5.003391 for "cafe sun" and 1.250491 for the real "cafe".

For the second try I set

boost_fields :name =>  1.9, :facility_type => 3

The score for "cafe sun" doesn't change, but the score for the real "cafe" rises to 1.8946824.

So, since results are sorted by score, I'd like to know how the score is calculated.

Or am I choosing the wrong tokenizers or something? Here is what I have in schema.xml:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="3"
            maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Comments (1)

醉生梦死 2024-12-09 18:11:29

Scoring results is the domain of the Lucene library, and the crux of its algorithm is described in detail in the documentation for Lucene's Similarity class.
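
For orientation, here is the general shape of Lucene's classic TF-IDF practical scoring function, as given in the Similarity class documentation for older Lucene versions. Treat it as a reference sketch rather than an exact account of this setup: newer Lucene releases default to BM25, and the dismax parser additionally combines per-field scores by taking the best-scoring field plus a tie-breaker share of the others.

```latex
\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
\sum_{t \in q} \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot
\mathrm{boost}(t)\cdot \mathrm{norm}(t,d)
\]
```

Here boost(t) is the query-time boost (what boost_fields sets per field), and norm(t,d) carries the index-time field length normalization, which is active because the schema sets omitNorms="false".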

To inspect the raw scoring data, run a query against your Solr instance directly and append the debugQuery=on parameter.

http://localhost:8983/solr/select?q=test&defType=dismax&qf=name_text+facility_type_text&debugQuery=on
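
A minimal Ruby sketch for building that URL from a query string and a list of fields. The helper name debug_query_url is hypothetical, and the base URL and *_text field names are assumptions copied from the example URL; adjust them to your own Solr instance and schema.

```ruby
require "uri"

# Builds a Solr select URL that requests scoring details via debugQuery=on.
# The default base URL is an assumption taken from the example above.
def debug_query_url(query, fields, base: "http://localhost:8983/solr/select")
  params = {
    q: query,
    defType: "dismax",
    qf: fields.join(" "), # dismax expects a space-separated field list
    debugQuery: "on"
  }
  # encode_www_form escapes the values and encodes spaces as "+"
  "#{base}?#{URI.encode_www_form(params)}"
end

puts debug_query_url("cafe", %w[name_text facility_type_text])
```

Opening the printed URL in a browser, or fetching it with an HTTP client, should return the normal results plus a debug section with a per-document explanation of the score.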

For general relevancy optimizations in Solr, you can consult the SolrRelevancyFAQ. It also has one question specifically demonstrating the output of debugQuery.
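
The explanation that debugQuery returns breaks each match down into factors such as tf, idf, and fieldNorm. The sketch below reproduces those factors using the classic TF-IDF (DefaultSimilarity) definitions. The method names and sample numbers are illustrative and are not taken from the question's index; real fieldNorm values are stored with reduced precision, so figures reported by Solr will differ slightly.

```ruby
# Term-frequency factor: square root of how often the term occurs in the field.
def tf(freq)
  Math.sqrt(freq)
end

# Inverse document frequency: terms found in fewer documents score higher.
def idf(doc_freq, num_docs)
  1.0 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# Length normalization: a match in a shorter field counts for more.
def length_norm(num_terms)
  1.0 / Math.sqrt(num_terms)
end

# Per-field weight as broken down in the explanation: tf * idf * fieldNorm.
def field_weight(freq:, doc_freq:, num_docs:, num_terms:)
  tf(freq) * idf(doc_freq, num_docs) * length_norm(num_terms)
end

# Illustrative numbers only: one occurrence, in a 4-term field,
# of a term that appears in 1 of 2 documents.
puts field_weight(freq: 1, doc_freq: 1, num_docs: 2, num_terms: 4)
```

Comparing these factors between the two facilities in the debug output is one way to see which of them accounts for the gap in their scores.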

All in all: you ask a very good question with a very deep answer. I may edit my response down the road to expand on the subject.
