What is the formula for calculating hit scores in Sunspot on Rails?

Posted 2024-12-02 18:11:29


Say, I have this code in my model:

class Facility < ActiveRecord::Base
  # ...
  searchable do
    text :name
    text :facility_type do
      # ...
    end
  end
  # ...
end

And this in search controller:

@search = Facility.search do
  keywords(query) do
    boost_fields :name => 1.9,
                 :facility_type => 1.98
  end
  # ...
end

And I have two Facility objects: the first has the type "cafe" but does not have the word "cafe" in its name; the second is called, for example, "cafe sun", but is actually of the "bar" type.

I run the search with query="cafe" and get both facilities in the response, but the score is 5.003391 for "cafe sun" and 1.250491 for the real "cafe".

For the second try I set

boost_fields :name => 1.9, :facility_type => 3

The score for "cafe sun" doesn't change, but the score for the real "cafe" grows somewhat, to 1.8946824.

So, since the results are sorted by score, I would like to know how the score is calculated.

Or am I choosing the wrong tokenizers or something? Here is what I have in schema.xml:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="3"
            maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
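To see what this `text` fieldType actually puts into the index, here is a rough Ruby approximation of its two analyzer chains. It is only a sketch under stated assumptions: splitting on non-word characters stands in for StandardTokenizer (which handles many more cases), StandardFilter is skipped, and the module name `TextAnalysis` is hypothetical.

```ruby
# Rough approximation of the question's "text" fieldType.
# Assumption: a split on non-word characters stands in for StandardTokenizer.
module TextAnalysis
  module_function

  MIN_GRAM = 3  # minGramSize from schema.xml
  MAX_GRAM = 30 # maxGramSize from schema.xml

  # Query-time chain: tokenize + lowercase (no n-grams).
  def query_terms(text)
    text.split(/\W+/).reject(&:empty?).map(&:downcase)
  end

  # Index-time chain: tokenize + lowercase + front edge n-grams of 3..30 chars.
  # Tokens shorter than MIN_GRAM yield an empty range and are dropped.
  def index_terms(text)
    query_terms(text).flat_map do |token|
      (MIN_GRAM..[MAX_GRAM, token.length].min).map { |n| token[0, n] }
    end
  end
end

p TextAnalysis.index_terms("Cafe Sun") # ["caf", "cafe", "sun"]
p TextAnalysis.query_terms("cafe")     # ["cafe"]
```

Under these assumptions, every indexed word contributes one term per gram length, which makes the field longer (and its length norm smaller), and a partial query such as "caf" would also match "cafe". Whether that is what drives the ranking here is something Solr's own debug output would have to confirm.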


醉生梦死 2024-12-09 18:11:29


Scoring results is the domain of the Lucene library, and the crux of its algorithm is described in detail here:
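For orientation, here is a minimal Ruby sketch of Lucene's classic TF-IDF scoring (DefaultSimilarity), the default in Lucene/Solr before 6.0 switched to BM25. For each matching term it multiplies tf (square root of the term frequency), idf squared, the query-time boost, and the field's length norm. A dismax-style parser (the answer's URL below uses `defType=dismax`) then takes the best field score plus `tie` times the others. The module and method names are hypothetical. queryNorm, coord, and Lucene's lossy one-byte norm encoding are deliberately left out, so this will not reproduce Solr's exact numbers.

```ruby
# Simplified sketch of classic Lucene TF-IDF scoring (DefaultSimilarity).
# Omitted on purpose: queryNorm, coord, and one-byte norm encoding.
module ClassicScoring
  module_function

  # tf(t in d) = sqrt(number of occurrences of the term in the field)
  def tf(freq)
    Math.sqrt(freq)
  end

  # idf(t) = 1 + ln(numDocs / (docFreq + 1)); rarer terms weigh more
  def idf(doc_freq, num_docs)
    1.0 + Math.log(num_docs.to_f / (doc_freq + 1))
  end

  # lengthNorm = 1 / sqrt(number of terms in the field); shorter fields score higher
  def length_norm(num_terms)
    1.0 / Math.sqrt(num_terms)
  end

  # Contribution of one term in one field: tf * idf^2 * boost * norm
  def term_score(freq:, doc_freq:, num_docs:, boost:, num_terms:)
    i = idf(doc_freq, num_docs)
    tf(freq) * i * i * boost * length_norm(num_terms)
  end

  # Dismax: best field score plus tie * the sum of the remaining field scores
  def dismax(field_scores, tie: 0.0)
    best = field_scores.max
    best + tie * (field_scores.sum - best)
  end
end

# Hypothetical index statistics, for illustration only
name_hit = ClassicScoring.term_score(freq: 1, doc_freq: 1, num_docs: 10, boost: 1.9, num_terms: 3)
type_hit = ClassicScoring.term_score(freq: 1, doc_freq: 5, num_docs: 10, boost: 1.98, num_terms: 2)
puts ClassicScoring.dismax([name_hit]), ClassicScoring.dismax([type_hit])
```

Because each term's contribution is linear in its boost, raising the facility_type boost from 1.98 to 3 should scale a facility_type-only match by 3 / 1.98. The question's numbers are consistent with this (1.250491 × 3 / 1.98 ≈ 1.89468, against the reported 1.8946824), while "cafe sun", presumably matched through the name field alone, keeps its score.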

To inspect the raw scoring data, run a query against your Solr instance directly and append the debugQuery=on parameter to see scoring data.

http://localhost:8983/solr/select?q=test&defType=dismax&qf=name_text+facility_type_text&debugQuery=on
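If you would rather build that URL from Ruby, for example with the boosts from the question, here is a small sketch that uses only the standard library's `URI.encode_www_form`. The helper name `debug_query_url` and its arguments are hypothetical. The field names `name_text` and `facility_type_text` are taken from the URL above.

```ruby
require "uri"

# Hypothetical helper: builds a dismax query URL with debugQuery=on.
# `fields` maps Solr field names to an optional boost (nil means no boost).
def debug_query_url(query, fields, base: "http://localhost:8983/solr/select")
  qf = fields.map { |name, boost| boost ? "#{name}^#{boost}" : name }.join(" ")
  params = { "q" => query, "defType" => "dismax", "qf" => qf, "debugQuery" => "on" }
  "#{base}?#{URI.encode_www_form(params)}"
end

# Same boosts as the question's boost_fields call
puts debug_query_url("cafe", { "name_text" => 1.9, "facility_type_text" => 1.98 })
```

Against a running Solr, fetching the URL (for instance with `Net::HTTP.get(URI(url))` after `require "net/http"`) returns a `debug` section whose `explain` entry breaks each document's score down into its tf, idf, boost, and norm factors.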

For general relevancy optimizations in Solr, you can consult the SolrRelevancyFAQ. One of its questions specifically demonstrates the output of debugQuery.

All in all: you ask a very good question with a very deep answer. I may edit my response down the road to expand on the subject.
