Solr proximity ordered vs unordered

Posted 2024-09-30 11:11:09

In Solr you can perform an ordered proximity search using the syntax

"word1 word2"~10

By ordered, I mean that word1 always comes before word2 in the document. I would like to know whether there is an easy way to perform an unordered proximity search, i.e. one where word1 and word2 occur within 10 words of each other and it doesn't matter which comes first.

One way to do this would be:

"word1 word2"~10 OR "word2 word1"~10

The above will work but I'm looking for something simpler, if possible.
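The OR workaround above can be generated from two terms and a slop. A minimal Python sketch; `unordered_proximity` is a hypothetical helper, and it assumes single-word terms that need no escaping in the standard query parser.

```python
def unordered_proximity(term1: str, term2: str, slop: int) -> str:
    """Build a clause matching term1 and term2 within `slop`, in either order.

    Hypothetical helper: assumes both terms are plain words needing no escaping.
    """
    if slop < 0:
        raise ValueError("slop must be non-negative")
    forward = f'"{term1} {term2}"~{slop}'   # term1 before term2
    backward = f'"{term2} {term1}"~{slop}'  # term2 before term1
    return f"{forward} OR {backward}"


print(unordered_proximity("word1", "word2", 10))
# "word1 word2"~10 OR "word2 word1"~10
```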

Comments (3)

情深如许 2024-10-07 11:11:09

Slop is the number of positions the terms are allowed to move to produce a match. So "a b" is not equivalent to "b a": the same document needs a different number of moves depending on the order of the query terms.

  • The document a foo b has positions (a,1), (foo,2), (b,3). Matching the query "a b", i.e. (a,1), (b,2), requires one move: (b,2) => (b,3)
  • However, matching "b a", i.e. (b,1), (a,2), requires (a,2) => (a,1) and (b,1) => (b,3), a total of three position moves

In general, for a two-term phrase, if "a b"~n matches a document, then "b a"~(n+2) will match it too.
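To make the position counting concrete, here is a simplified model of sloppy phrase matching in Python. It is a sketch, not Lucene's actual code: it assumes distinct phrase terms and a whitespace-tokenized document with no analysis, and the name `required_slop` is hypothetical. Each term's positions are shifted back by that term's offset in the phrase, and the slop needed is the spread of the shifted positions.

```python
from itertools import product
from typing import List, Optional


def required_slop(doc_tokens: List[str], phrase: List[str]) -> Optional[int]:
    """Smallest slop at which `phrase` matches `doc_tokens`, or None if a term is absent.

    Simplified model, not Lucene's code: every occurrence position is shifted
    back by the term's offset in the phrase, and the slop needed is the spread
    (max - min) of the shifted positions. Assumes distinct phrase terms.
    """
    shifted = []
    for offset, term in enumerate(phrase):
        positions = [pos - offset for pos, tok in enumerate(doc_tokens) if tok == term]
        if not positions:
            return None
        shifted.append(positions)
    # Try every combination of occurrences and keep the tightest one.
    return min(max(combo) - min(combo) for combo in product(*shifted))


doc = "a foo b".split()
print(required_slop(doc, ["a", "b"]))  # 1
print(required_slop(doc, ["b", "a"]))  # 3
```

For the document a foo b the model needs 1 move for "a b" and 3 for "b a", the same counts as in the bullets, and the reversed order costs exactly 2 more.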

EDIT: I guess I never gave an answer. I see two options:

  1. If you want a slop of n, increase it to n+2
  2. Manually split your search into an OR of both orderings, as you suggested

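I think #2 is probably better, unless your slop is very large to begin with: option 1 also accepts in-order matches up to 2 positions further apart, a difference that only becomes negligible when n is already large.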
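To see where the two options differ, the sketch below compares them for a two-term query using a simplified position-shift model of sloppy phrase matching (an assumption, not Lucene's code; all helper names are hypothetical). With term a fixed at position 10 and n = 2, it lists the positions of b at which the options disagree.

```python
def slop_needed(pos_a: int, pos_b: int, a_first: bool) -> int:
    """Slop a two-term phrase needs to match single occurrences of a and b.

    Simplified model of sloppy phrase matching (not Lucene's code): the second
    query term's position is shifted back by 1 before measuring the distance.
    """
    if a_first:                       # query "a b"
        return abs((pos_b - 1) - pos_a)
    return abs((pos_a - 1) - pos_b)   # query "b a"


def option1_matches(pos_a: int, pos_b: int, n: int) -> bool:
    # Option 1: "a b"~(n+2)
    return slop_needed(pos_a, pos_b, True) <= n + 2


def option2_matches(pos_a: int, pos_b: int, n: int) -> bool:
    # Option 2: "a b"~n OR "b a"~n
    return slop_needed(pos_a, pos_b, True) <= n or slop_needed(pos_a, pos_b, False) <= n


# Term a fixed at position 10, slop n = 2; find positions of b where the options disagree.
disagreements = [b for b in range(21)
                 if b != 10 and option1_matches(10, b, 2) != option2_matches(10, b, 2)]
print(disagreements)  # [14, 15]
```

Both disagreements are in-order matches (b after a, 4 and 5 positions apart) that only option 1 accepts; when b comes first, the two options behave identically.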

清欢 2024-10-07 11:11:09

Are you sure it doesn't already work like that? Nothing in the documentation says that it's 'ordered':

A proximity search can be done with a sloppy phrase query. The closer together the two terms appear in the document, the higher the score will be. A sloppy phrase query specifies a maximum "slop", or the number of positions tokens need to be moved to get a match.

This example for the standard request handler will find all documents where "batman" occurs within 100 words of "movie":

http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_search_for_one_term_near_another_term_.28say.2C_.22batman.22_and_.22movie.22.29
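As a hedged sketch of what such a request looks like, the Python below builds a select URL for "batman" within 100 words of "movie". The host and port, the core name `mycore`, and the field name `text` are assumptions, and `proximity_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumptions, not from the FAQ: Solr on localhost at the default port 8983,
# a core named "mycore", and a searchable field named "text".
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def proximity_url(field: str, term1: str, term2: str, slop: int) -> str:
    """Build a select URL for a sloppy phrase query like text:"batman movie"~100."""
    q = f'{field}:"{term1} {term2}"~{slop}'
    return SOLR_SELECT + "?" + urlencode({"q": q})


url = proximity_url("text", "batman", "movie", 100)
print(url)
# Decoding the query string gives back the raw Solr query.
print(parse_qs(urlsplit(url).query)["q"][0])  # text:"batman movie"~100
```

Because slop counts position moves, a query like this also matches documents where "movie" comes first; the reversed order just uses up 2 extra positions of the slop.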

泅渡 2024-10-07 11:11:09

Since Solr 4 this is possible with the SurroundQueryParser.

E.g. to do an ordered search (match documents where "phrase two" follows "phrase one" at most 3 words later):

3W(phrase W one, phrase W two)

To do an unordered search (match "phrase two" within 5 words of "phrase one", in either order):

5N(phrase W one, phrase W two)
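A minimal sketch of generating these Surround expressions in Python. The helper name `surround_proximity` is hypothetical. The `{!surround}` local-params prefix selects the Surround parser in a request's `q` parameter; the field to search is not handled here, and this assumes it is configured elsewhere, for example via the `df` parameter.

```python
from typing import Sequence


def surround_proximity(phrase1: Sequence[str], phrase2: Sequence[str],
                       distance: int, ordered: bool) -> str:
    """Build a Surround query: nW(...) for ordered, nN(...) for unordered proximity.

    Hypothetical helper. Each multi-word phrase is joined with the infix W
    operator (adjacent, in order), as in the examples above.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    op = "W" if ordered else "N"
    left = " W ".join(phrase1)
    right = " W ".join(phrase2)
    return f"{{!surround}}{distance}{op}({left}, {right})"


print(surround_proximity(["phrase", "one"], ["phrase", "two"], 5, ordered=False))
# {!surround}5N(phrase W one, phrase W two)
```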