Sorting a field with special characters in Solr

Posted on 2024-12-06 07:02:44

I am new to Solr indexing.
I want to sort on a location field that holds a variety of values. Some of them start with special characters, such as 'sAmerica, #'Japan, and %India.

When I sort on this field, I do not want special characters such as 's, '#, !, and ~ to be taken into account.
I want a sort that ignores these characters and returns results like this:
America in 1st position, %India in 2nd, and #'Japan in 3rd.

How can I make this possible? I am using PatternReplaceFilterFactory, but I am not sure how to apply it here.

  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="1"  />
    <filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />
  </analyzer>
</fieldType>

荒岛晴空 2024-12-13 07:02:44

If you want to ignore the special characters, try the following field type.
It lowercases the value and concatenates the word parts, dropping all special characters.

    <fieldType name="string_sort" class="solr.TextField" positionIncrementGap="1">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" />
        </analyzer>
    </fieldType>

However, this will not work for 'sAmerica, because s is not a special character.

<filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />

If this is a fixed pattern, remove it with the filter above, placed before the word delimiter filter.

Edit: are you using this config?

<fieldType name="string_sort" class="solr.TextField" positionIncrementGap="1">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />
        <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" />
    </analyzer>
</fieldType>

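I tested the following through analysis, and it produces these tokens after each stage (KT = KeywordTokenizer, LCF = LowerCaseFilter, PRF = PatternReplaceFilter, WDF = WordDelimiterFilter):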

KT - 'sAlgarve
LCF - 'salgarve
PRF - algarve
WDF - algarve

Can you check it through the analysis as well?
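
One thing to keep in mind when using a field type like this for sorting: the sort works on the terms produced at index time, so the filter chain has to be in the index analyzer (or an untyped analyzer), the documents need to be reindexed after the change, and each document should end up with a single token in the sort field. The fragment below is a minimal sketch of one way to wire that up, not a tested configuration. It swaps WordDelimiterFilterFactory for a second PatternReplaceFilterFactory, in the style of the alphaOnlySort type from Solr's example schema, so the keyword token is never split. The names location_sort_type and location_sort are hypothetical, and the source field is assumed to be called location.

```xml
<!-- Sketch only: the type and field names below are hypothetical. -->
<fieldType name="location_sort_type" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <!-- Untyped analyzer: applies at both index time and query time. -->
  <analyzer>
    <!-- Keep the whole value as one token so each document has one sort key. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- Remove every literal 's sequence first, while the apostrophe is still present. -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all"/>
    <!-- Then drop everything that is not a lowercase ASCII letter. -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z]" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<!-- Dedicated sort field, filled from the assumed source field "location". -->
<field name="location_sort" type="location_sort_type" indexed="true" stored="false"/>
<copyField source="location" dest="location_sort"/>
```

With something like this in place, a request using sort=location_sort asc would compare 'sAmerica, %India, and #'Japan as america, india, and japan, which gives the order asked for in the question. Because [^a-z] also removes digits, spaces, and non-ASCII letters, the pattern would need widening if those should influence the ordering.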
