Why do I have to add OR between every number in a long query string?

Posted 2025-01-05 04:30:39

Typically, when you query a string, Solr tokenizes everything and finds all matching words in a document without any trouble. However, I ran into an interesting issue that took me a couple of hours to figure out.

Say, for example, I have a document with a field called "ids" (fieldType: text_ws) that contains the following string.

23 128 150 250 384 582 583 586 587 589 641 713 745 761 1004 1040 1080 1512 1551 1626 1882 1891 1911 1912 1913 1947 2035 2120 2140 2141 2143 2176 2219 2430 3023 3041 4087 4221 4243 4737 4776 5126 5130 5194 5224 5225 5226 5555 5564 5565 5568 5611 6310 9984 12048 12143 12878 12929 12930 12931 12933 12935 14001 14048 14049 14051 14079 14080 14082 14083

Now, if I query that field with the following, it only matches the first number. However, if I put OR between each one, it matches almost all of them, as it should.

23 128 150 250 384 582 583 586 587 589 641 713 745 761 1004 1040 1512 1551 1626 1703 1760 1882 1891 1911 1913 1947 2035 2120 2140 2141 2143 2176 2219 2430 3023 3041 4087 4221 4243 4737 4776 5126 5130 5194 5224 5225 5226 5555 5564 5565 5568 5611 6310 9984 12048 12143 12878 12929 12930 12931 12933 12935 14001 14048 14049 14051 14079 14080 14082 14083
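To make the workaround concrete, here is a minimal Python sketch of what "putting OR between each one" could look like when the query is built in code. The helper name build_or_query is hypothetical, and the sketch assumes the standard Lucene query syntax, where field:(a OR b) applies the field to every term in the group. It uses a shortened list of numbers rather than the full string above.

```python
def build_or_query(field, raw_terms):
    """Join whitespace-separated terms with OR and scope them all to one field.

    Hypothetical helper for illustration; assumes the standard query parser.
    """
    terms = raw_terms.split()  # same whitespace splitting that text_ws applies
    return f"{field}:(" + " OR ".join(terms) + ")"


# Shortened sample of the numbers from the query string
query = build_or_query("ids", "23 128 150 250")
print(query)
```

The grouping parentheses are included so that every number is explicitly searched in "ids" rather than only the first one after the field name.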

What's the deal with this?

Additionally, how can I prevent Solr from boosting scores? What if I just want to know what percentage of the items from the query matched?
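To clarify what "percentage of the items from the query matched" means here, the following is a rough client-side sketch in Python. It is not a Solr feature and does not touch Solr's scoring; the function name match_percentage and the sample values are assumptions for illustration only. It splits on whitespace, which mirrors what the text_ws field type does.

```python
def match_percentage(query_text, field_text):
    """Return the percentage of distinct query terms present in the field value.

    Hypothetical helper; both inputs are whitespace-separated strings.
    """
    query_terms = set(query_text.split())
    field_terms = set(field_text.split())
    if not query_terms:
        return 0.0  # avoid dividing by zero on an empty query
    matched = query_terms & field_terms
    return 100.0 * len(matched) / len(query_terms)


# Shortened sample: three of the four query numbers occur in the field
pct = match_percentage("23 128 150 1703", "23 128 150 250 1080")
print(pct)
```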

text_ws definition

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
