Solr prefix queries with ISOLatin1Accent characters

Posted on 2024-12-11 20:07:48

I'm trying to index a field so that I can find a document using the prefix query 'æb*' as well as 'aeb*'. In practice, 'aeb*' finds the document but 'æb*' does not. The same problem occurs with å, î, and so on.

This is my schema:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

As you can see, I'm using the same analyzer for indexing and querying. So, if I understand correctly, the query 'æb*' should be normalized to 'aeb*'. Is the '*' symbol somehow interfering? How can I set up my schema to get the desired results?

I'm using Solr 1.4.1.
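
One possible explanation, offered as an assumption rather than something confirmed above: the query parser may pass prefix and wildcard terms through without running them through the field's analyzer, in which case the charFilter never sees 'æb*'. Under that assumption, a client-side workaround is to fold the prefix before sending it. The sketch below is in Python; `FOLD_MAP` and `fold_prefix` are hypothetical names, and the map is only a small hand-picked subset in the spirit of mapping-ISOLatin1Accent.txt, not the real file.

```python
# Hypothetical, hand-picked subset of accent mappings (assumption: the real
# mapping-ISOLatin1Accent.txt covers many more characters).
FOLD_MAP = {
    "æ": "ae", "Æ": "AE",
    "å": "a", "Å": "A",
    "î": "i", "Î": "I",
}


def fold_prefix(term: str) -> str:
    """Fold accents and lowercase a prefix term, keeping a trailing '*'."""
    has_star = term.endswith("*")
    stem = term[:-1] if has_star else term
    # Replace each mapped character, then lowercase to mirror LowerCaseFilter.
    folded = "".join(FOLD_MAP.get(ch, ch) for ch in stem).lower()
    return folded + "*" if has_star else folded


if __name__ == "__main__":
    for query in ("æb*", "aeb*", "Åî"):
        print(query, "->", fold_prefix(query))
```

With this in place, both 'æb*' and 'aeb*' would reach Solr as the same prefix, so they should match the same indexed terms, provided the index-time analysis really does produce the folded, lowercased form.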
