Solr proximity search: ordered vs. unordered
In Solr you can perform an ordered proximity search using the syntax
"word1 word2"~10
By ordered, I mean word1 will always come before word2 in the document. I would like to know if there is an easy way to perform an unordered proximity search, i.e. word1 and word2 occur within 10 words of each other and it doesn't matter which comes first.
One way to do this would be:
"word1 word2"~10 OR "word2 word1"~10
The above will work but I'm looking for something simpler, if possible.
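The two-phrase workaround can at least be generated mechanically. A minimal sketch in Python: `either_order_query` is a hypothetical helper name, not part of Solr or any client library. It only builds the query string and assumes single-word terms that need no escaping.

```python
def either_order_query(word1: str, word2: str, slop: int) -> str:
    """Build the OR of both phrase orders, each with the same slop."""
    forward = f'"{word1} {word2}"~{slop}'
    backward = f'"{word2} {word1}"~{slop}'
    return f"{forward} OR {backward}"


# Prints the query string for the two example words with a slop of 10.
print(either_order_query("word1", "word2", 10))
```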
Comments (3)
Slop means how many position moves can occur. So "a b" is going to be different than "b a", because reversing two adjacent words already uses up two moves.
a foo b
has positions (a,1), (foo,2), (b,3). To match (a,1), (b,2) will require one change: (b,2) => (b,3).
In general, if
"a b"~n
matches something, then
"b a"~(n+2)
will match it too.
EDIT: I guess I never gave an answer. I see two options:
I think #2 is probably better, unless your slop is very large to begin with.
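The n+2 relationship can be checked with a small model. The sketch below is a simplification, not Lucene's actual matching code: it assumes every query term occurs exactly once in the document, and measures how far the terms must shift, relative to their order in the query, to line up as an exact phrase. `required_slop` is a hypothetical name.

```python
def required_slop(query_terms, doc_positions):
    """Smallest slop at which the phrase matches, in this simplified model.

    Each term's document position is shifted back by its index in the
    query; an exact phrase gives identical shifted values, so the spread
    between the largest and smallest value is the number of moves needed.
    """
    shifted = [doc_positions[term] - index
               for index, term in enumerate(query_terms)]
    return max(shifted) - min(shifted)


# Positions from the "a foo b" example.
doc = {"a": 1, "foo": 2, "b": 3}

print(required_slop(["a", "b"], doc))  # 1: move b from position 2 to 3
print(required_slop(["b", "a"], doc))  # 3: the same distance plus 2
```

In this model the reversed phrase costs exactly two more moves than the in-order one, which is the n versus n+2 gap described above.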
Are you sure it doesn't already work like that? There is nothing in the documentation saying that it's 'ordered':
A proximity search can be done with a sloppy phrase query. The closer together the two terms appear in the document, the higher the score will be. A sloppy phrase query specifies a maximum "slop", or the number of positions tokens need to be moved to get a match.
This example for the standard request handler will find all documents where "batman" occurs within 100 words of "movie":
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_search_for_one_term_near_another_term_.28say.2C_.22batman.22_and_.22movie.22.29
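For illustration, here is a sketch of how such a sloppy phrase query might be encoded as request parameters. The field name `text` and the helper name `proximity_params` are assumptions made for this example; the code only builds and decodes the query string and does not contact a Solr server.

```python
from urllib.parse import urlencode, parse_qs


def proximity_params(term1, term2, slop, field="text"):
    """URL-encode a sloppy phrase query on the given (assumed) field."""
    return urlencode({"q": f'{field}:"{term1} {term2}"~{slop}'})


params = proximity_params("batman", "movie", 100)
print(params)                    # the encoded form, ready to append to a URL
print(parse_qs(params)["q"][0])  # the decoded query as Solr would receive it
```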
Since Solr 4 this is possible with the SurroundQueryParser.
E.g. to do an ordered search (where "phrase two" follows "phrase one" by no more than 3 words):
To do an unordered search (where "phrase two" is within 5 words of "phrase one"):
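As a rough sketch of what such queries look like with single terms: the surround parser writes the distance followed by `w` for ordered or `n` for unordered, and is selected in Solr with the `{!surround}` local-params prefix. The helper name `surround_query` and the terms `word1` and `word2` are assumptions for this example; multi-word phrases and field selection are left out.

```python
def surround_query(first, second, distance, ordered):
    """Build a surround-parser proximity query in prefix notation.

    'w' keeps the terms in the given order; 'n' accepts either order.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    operator = "w" if ordered else "n"
    return f"{{!surround}}{distance}{operator}({first}, {second})"


print(surround_query("word1", "word2", 3, ordered=True))   # {!surround}3w(word1, word2)
print(surround_query("word1", "word2", 5, ordered=False))  # {!surround}5n(word1, word2)
```

Note that the surround distance counts positions between the two terms rather than slop moves, so the numbers are not directly interchangeable with the `~n` syntax.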