What is the formula for calculating the score of a hit in Sunspot on Rails?
Say, I have this code in my model:
class Facility < ActiveRecord::Base
  ...
  searchable do
    text :name
    text :facility_type do
    end
    ...
And this in the search controller:
@search = Facility.search do
  keywords(query) do
    boost_fields :name => 1.9,
                 :facility_type => 1.98
  end
  ...
I have two Facility objects. The first has the type "cafe" but does not have the word "cafe" in its name. The second is called "cafe sun", for example, but is actually of type "bar".
I run the search with query="cafe" and get both facilities in the response, but the score is 5.003391 for "cafe sun" and 1.250491 for the real cafe.
For the second try I set
boost_fields :name => 1.9, :facility_type => 3
The score for "cafe sun" does not change, but the score for the real cafe grows somewhat, to 1.8946824.
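A quick arithmetic check on the numbers above, as a sketch only: if the facility_type match contributes a score that is proportional to its boost, the ratio of the two scores for the real cafe should match the ratio of the two boosts. The variable names are made up for illustration; the values are the ones quoted in this question.

```ruby
# Values taken from the two search runs described above.
old_boost = 1.98
new_boost = 3.0
old_score = 1.250491
new_score = 1.8946824

# If the score is linear in the field boost, these two ratios should agree.
boost_ratio = new_boost / old_boost
score_ratio = new_score / old_score

puts format("boost ratio: %.6f", boost_ratio)
puts format("score ratio: %.6f", score_ratio)
puts "linear in boost? #{(boost_ratio - score_ratio).abs < 1e-5}"
```

The two ratios agree closely, which suggests the facility_type boost acts as a plain multiplier on that field's score and does not touch a document that matched only on name.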
So, since results are sorted by score, I would like to know how it is calculated.
Or am I choosing the wrong tokenizers or something? Here is what I have in schema.xml:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="3"
            maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Comments (1)
Scoring results is the domain of the Lucene library, and the crux of its algorithm is described in detail here:
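For orientation, and assuming the classic TF-IDF similarity (Lucene's DefaultSimilarity, which was the default in Solr releases of that era), the practical scoring function has roughly this shape. The symbol names follow Lucene's own terminology; treat this as a summary sketch, not a substitute for the Lucene documentation.

```latex
\mathrm{score}(q,d) =
  \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(
    \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)
  \Big)

% Default component definitions in the classic similarity:
\mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}
```

Here norm(t,d) carries the field length normalization (active because the schema has omitNorms="false") and any index-time boosts, while boost(t) is the query-time boost. Sunspot's keywords search is, as far as I know, sent as a dismax query, where each field is scored separately and the best-scoring field dominates. That would be consistent with raising the facility_type boost changing nothing for a document that matched only on name.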
To inspect the raw scoring data, run a query against your Solr instance directly and append the debugQuery=on parameter. For general relevancy optimizations in Solr, you can consult the SolrRelevancyFAQ. It also has one question specifically demonstrating the output of debugQuery.
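A minimal sketch of building such a request URL from Ruby, using only the standard library. Everything specific here is an assumption to adjust for your setup: the helper name debug_query_url is hypothetical, the base URL is the port Sunspot's bundled development Solr commonly uses (check config/sunspot.yml), and the name_text / facility_type_text field names assume Sunspot's usual _text suffix for text fields.

```ruby
require "uri"

# Hypothetical helper: builds a Solr select URL with debugQuery=on appended.
# base_url is an assumed default; replace it with your own Solr endpoint.
def debug_query_url(keywords, base_url: "http://localhost:8982/solr")
  params = {
    "q"          => keywords,
    "defType"    => "dismax",
    # Assumed indexed field names and the boosts from the question.
    "qf"         => "name_text^1.9 facility_type_text^1.98",
    "fl"         => "*,score",
    "debugQuery" => "on"
  }
  "#{base_url}/select?#{URI.encode_www_form(params)}"
end

puts debug_query_url("cafe")
```

Opening the printed URL in a browser (or fetching it with curl) should return the normal results plus a debug section whose explain entries break each document's score into its tf, idf, boost, and norm factors.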
All in all: you ask a very good question with a very deep answer. I may edit my response down the road to expand on the subject.