Combining Solr boolean queries with index-time boosts
I have a site using Solr 1.4.1 for relevancy/recommendations, and I use boolean-style queries in some places, such as +(+type:aoh_company +aoh_dictionary_tids:623). This returns the expected results, but the order of the results appears to be arbitrary.
I am trying to control the ranking of the documents by setting index-time boosts, but the boosts seem to be ignored for these queries.
An example
- The query URL is http://localhost:4930/solr/prod/select?rows=5&start=0&q.alt=(type%3Aaoh_company)+(aoh_dictionary_tids%3A623)&q=
- The results are returned in this order (with the index-time boost value in parentheses):
- 17132 (1.22)
- 17179 (1.02)
- 17131 (1.10)
- 17133 (1.10)
- 17184 (1.10)
- Obviously, result #2 should not come before #3-5 based on the boost alone.
- Given this is a boolean query, there should not be much difference in ranking.
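The example query URL can also be assembled programmatically. This is a minimal Python sketch that assumes the host, port and core path from the question (http://localhost:4930/solr/prod/select). The helper name build_select_url is hypothetical. The debug flag only appends debugQuery=true. Because q is left empty, the DisMax parser falls back to q.alt, which the altquerystring entry in the debug output confirms.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Host, port and core path are taken from the question's URL.
SELECT_URL = "http://localhost:4930/solr/prod/select"

def build_select_url(q_alt, rows=5, start=0, debug=False):
    """Hypothetical helper: build a select URL whose empty q falls back to q.alt."""
    params = {"rows": rows, "start": start, "q.alt": q_alt, "q": ""}
    if debug:
        params["debugQuery"] = "true"   # ask Solr for the <lst name="debug"> section
    # urlencode percent-escapes ':' and '(' / ')' and turns spaces into '+'
    return SELECT_URL + "?" + urlencode(params)

url = build_select_url("(type:aoh_company) (aoh_dictionary_tids:623)", debug=True)
print(url)
print(parse_qs(urlsplit(url).query, keep_blank_values=True)["q.alt"])
```

The escaped parentheses differ cosmetically from the hand-written URL, but Solr decodes both to the same q.alt value.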
Debugging output
I tried debugging the query above by appending debugQuery=true to the query, so it becomes http://localhost:4930/solr/prod/select?rows=5&start=0&q.alt=(type%3Aaoh_company)+(aoh_dictionary_tids%3A623)&q=&debugQuery=true
It's very verbose, but here it is:
<lst name="debug">
<null name="rawquerystring"/>
<null name="querystring"/>
<str name="parsedquery">+(+type:aoh_company +aoh_dictionary_tids:623)</str>
<str name="parsedquery_toString">+(+type:aoh_company +aoh_dictionary_tids:623)</str>
<lst name="explain">
<str name="50hves/node/17132">
1.7819747 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1805), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1805), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1805)
0.88053435 = (MATCH) weight(aoh_dictionary_tids:623 in 1805), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.9483481 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1805), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15625 = fieldNorm(field=aoh_dictionary_tids, doc=1805)
</str>
<str name="50hves/node/17179">
1.7819747 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1896), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1896), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1896)
0.88053435 = (MATCH) weight(aoh_dictionary_tids:623 in 1896), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.9483481 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1896), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15625 = fieldNorm(field=aoh_dictionary_tids, doc=1896)
</str>
<str name="50hves/node/17131">
1.7819747 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1905), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1905), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1905)
0.88053435 = (MATCH) weight(aoh_dictionary_tids:623 in 1905), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.9483481 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1905), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15625 = fieldNorm(field=aoh_dictionary_tids, doc=1905)
</str>
<str name="50hves/node/17133">
1.7819747 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1906), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1906), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1906)
0.88053435 = (MATCH) weight(aoh_dictionary_tids:623 in 1906), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.9483481 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1906), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15625 = fieldNorm(field=aoh_dictionary_tids, doc=1906)
</str>
<str name="50hves/node/17184">
1.6058679 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1892), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1892), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1892)
0.7044275 = (MATCH) weight(aoh_dictionary_tids:623 in 1892), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.7586785 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1892), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.125 = fieldNorm(field=aoh_dictionary_tids, doc=1892)
</str>
</lst>
<str name="QParser">DisMaxQParser</str>
<str name="altquerystring">org.apache.lucene.search.BooleanQuery:+type:aoh_company +aoh_dictionary_tids:623</str>
<null name="boostfuncs"/>
<lst name="timing">
<double name="time">7.0</double>
<lst name="prepare">
<double name="time">1.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>
<lst name="process">
<double name="time">6.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">6.0</double>
</lst>
</lst>
</lst>
</lst>
As I read it, the first four results are scored 1.7819747 and the fifth is scored 1.6058679. I can't see the boost values anywhere in there, so they do not seem to be a factor in the ranking equation.
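To see where these two scores come from, here is a Python sketch that recomputes them from the factors in the explain output. It assumes Lucene 2.9's DefaultSimilarity formulas:

- idf = 1 + ln(maxDocs / (docFreq + 1))
- tf = sqrt(termFreq)
- queryNorm = 1 / sqrt(sum of squared idf values)

The only per-document input is the fieldNorm of aoh_dictionary_tids, which is where an index-time boost would end up. Sorting by descending score, then by ascending internal document id, reproduces the order listed in the question. The tie-breaking rule is Lucene's default as far as I know, so treat it as an assumption.

```python
import math

# Statistics copied from the debugQuery explain output above.
MAX_DOCS = 1909
DOC_FREQ = {"type:aoh_company": 457, "aoh_dictionary_tids:623": 11}

def idf(doc_freq, max_docs=MAX_DOCS):
    # DefaultSimilarity: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

IDF = {term: idf(df) for term, df in DOC_FREQ.items()}

# queryNorm = 1 / sqrt(sum of squared query weights); no query-time boosts here.
QUERY_NORM = 1.0 / math.sqrt(sum(v * v for v in IDF.values()))

def clause_weight(term, field_norm, term_freq=1):
    query_weight = IDF[term] * QUERY_NORM                          # queryWeight
    field_weight = math.sqrt(term_freq) * IDF[term] * field_norm   # fieldWeight
    return query_weight * field_weight

def score(tids_field_norm):
    # type omits norms (fieldNorm 1.0); both clauses match, so coord is 1.
    return (clause_weight("type:aoh_company", 1.0)
            + clause_weight("aoh_dictionary_tids:623", tids_field_norm))

# (Solr id, internal Lucene doc id, fieldNorm of aoh_dictionary_tids) from the explain
DOCS = [
    ("17132", 1805, 0.15625),
    ("17179", 1896, 0.15625),
    ("17131", 1905, 0.15625),
    ("17133", 1906, 0.15625),
    ("17184", 1892, 0.125),
]

# Descending score first; equal scores fall back to ascending internal doc id.
ranking = sorted(DOCS, key=lambda d: (-score(d[2]), d[1]))
for solr_id, doc_id, norm in ranking:
    print(solr_id, doc_id, round(score(norm), 7))
```

The first four documents share the same fieldNorm, so their scores tie exactly, and their relative order is decided by the internal doc id rather than by the boosts.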
So what am I doing wrong? Is there something I need to do to make Solr take the boosts into account?
Is there a way to check the boost value stored in Solr? It looks right in the documents I send to it, but I can't find a way to see the stored value.
Additionally, here are the relevant parts of my schema.xml:
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>
</types>
<fields>
<field name="type" type="string" indexed="true" stored="true"/>
<field name="aoh_dictionary_tids" type="integer" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
</fields>
In his answer below, fyr mentioned that norms need to be enabled on the field for the boost value to apply. So I'd like to amend my question a bit:
- Is it enough to have norms enabled on one of the queried fields for the boost to apply?
- Does my omitNorms="false" on the field override the omitNorms="true" on the fieldType?
Any help would be greatly appreciated.
Answer
You will not see the boost as a separate factor in the explain output. An index-time boost is applied to the norm of a particular field in a particular document, acting as a multiplier.
If norms are enabled, your boost value is used at indexing time. With DefaultSimilarity and norms enabled, the norm is always part of the similarity function; it appears in the explain output as fieldNorm.
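Here is a sketch of how a boost ends up in a norm, assuming Lucene 2.9 (the version bundled with Solr 1.4.1). DefaultSimilarity multiplies the document and field boosts by lengthNorm = 1/sqrt(numTerms), and the result is stored in a single byte by SmallFloat.floatToByte315.

- The functions below are a Python port of that encoding.
- field_norm and the term counts used are hypothetical.
- The byte keeps only four values per power of two, so boosts as close as 1.02, 1.10 and 1.22 can collapse into the same fieldNorm.
- Only the byte is stored, so the original boost cannot be read back exactly.

```python
import math
import struct

_ZERO = (63 - 15) << 3  # zero exponent 15, 3 mantissa bits (incl. implicit bit)

def _float_bits(f):
    """Signed 32-bit IEEE-754 bit pattern of f (rounded to single precision)."""
    return struct.unpack(">i", struct.pack(">f", f))[0]

def encode_norm(f):
    """Port of SmallFloat.floatToByte315: squeeze a float into one byte (0..255)."""
    bits = _float_bits(f)
    small = bits >> (24 - 3)           # keep sign, exponent and top mantissa bits
    if small <= _ZERO:
        return 0 if bits <= 0 else 1   # zero/negative -> 0, underflow -> smallest value
    if small >= _ZERO + 0x100:
        return 255                     # overflow -> largest value
    return small - _ZERO

def decode_norm(b):
    """Port of SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def field_norm(doc_boost, num_terms, field_boost=1.0, omit_norms=False):
    """Hypothetical helper: the fieldNorm an explain would report for one field."""
    if omit_norms:
        return 1.0                     # no norm stored: a constant 1 is used
    raw = doc_boost * field_boost / math.sqrt(num_terms)  # boosts x lengthNorm
    return decode_norm(encode_norm(raw))

for boost in (1.02, 1.10, 1.22):
    print(boost, field_norm(boost, 1), field_norm(boost, 40))
```

With num_terms=1, all three boosts from the question decode to 1.0. With a hypothetical num_terms=40, 1.02 and 1.10 both decode to 0.15625, and only 1.22 reaches 0.1875.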
Edit for the follow-up questions:
It is enough to have norms enabled for the boost to apply, because norms give each field of each document a stored weight in the index. Index-time boosts are multiplied into the norm value, and the result is saved as that field's norm.
omitNorms on the field declaration overrides the value in the type definition. You can also see this in your explain output: the fieldNorm of aoh_dictionary_tids is not 1 (it is 0.15625 or 0.125). If norms are disabled, a default of 1 is applied, as it is for the type field.
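The override rule can be checked mechanically. This Python sketch uses only xml.etree.ElementTree to resolve the effective omitNorms for each field in the question's schema snippet: an attribute on the field element wins, otherwise the fieldType's value is used. The function name is hypothetical. Treating a missing attribute as false is a simplifying assumption, since Solr's real default can depend on the field type and schema version.

```python
import xml.etree.ElementTree as ET

# Schema fragment from the question, wrapped in a <schema> root element.
SCHEMA_XML = """
<schema>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
  </types>
  <fields>
    <field name="type" type="string" indexed="true" stored="true"/>
    <field name="aoh_dictionary_tids" type="integer" indexed="true" stored="true"
           multiValued="true" omitNorms="false"/>
  </fields>
</schema>
"""

def _as_bool(value):
    return value is not None and value.strip().lower() == "true"

def effective_omit_norms(schema_xml):
    """Hypothetical helper: map each field name to its effective omitNorms flag."""
    root = ET.fromstring(schema_xml.strip())
    # omitNorms declared on each fieldType (missing attribute treated as false)
    type_flags = {t.get("name"): _as_bool(t.get("omitNorms"))
                  for t in root.iter("fieldType")}
    flags = {}
    for field in root.iter("field"):
        own = field.get("omitNorms")
        if own is not None:
            flags[field.get("name")] = _as_bool(own)   # field attribute wins
        else:
            flags[field.get("name")] = type_flags.get(field.get("type"), False)
    return flags

print(effective_omit_norms(SCHEMA_XML))
```

For the question's schema this yields {'type': True, 'aoh_dictionary_tids': False}. That matches the explain output: fieldNorm is 1.0 for type and 0.15625 or 0.125 for aoh_dictionary_tids.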