如何在 SOLR 中创建字符串字段的不区分大小写的副本？

发布于 2024-08-17 19:39:58 字数 1318 浏览 3 评论 0原文

如何以不区分大小写的形式创建字符串字段的副本？我想使用典型的“字符串”类型和不区分大小写的类型。类型的定义如下：

    <fieldType name="string" class="solr.StrField"
        sortMissingLast="true" omitNorms="true" />

    <!-- A Case insensitive version of string type  -->
    <fieldType name="string_ci" class="solr.StrField"
        sortMissingLast="true" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

字段的示例如下：

<field name="destANYStr" type="string" indexed="true" stored="true"
    multiValued="true" />
<!-- Case insensitive version -->
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false" 
    multiValued="true" />

我尝试像这样使用 CopyField：

<copyField source="destANYStr" dest="destANYStrCI" />

但是，显然在调用任何分析器之前，在源和目标上调用 CopyField，所以即使我已指定目标是情况- 通过分析器不敏感，保留从源字段复制的值的大小写。

我希望避免在记录创建时重新传输客户端字段中的值。

原文

How can I create a copy of a string field in case insensitive form? I want to use the typical "string" type and a case insensitive type. The types are defined like so:

    <fieldType name="string" class="solr.StrField"
        sortMissingLast="true" omitNorms="true" />

    <!-- A Case insensitive version of string type  -->
    <fieldType name="string_ci" class="solr.StrField"
        sortMissingLast="true" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

And an example of the field like so:

<field name="destANYStr" type="string" indexed="true" stored="true"
    multiValued="true" />
<!-- Case insensitive version -->
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false" 
    multiValued="true" />

I tried using CopyField like so:

<copyField source="destANYStr" dest="destANYStrCI" />

But, apparently CopyField is called on source and dest before any analyzers are invoked, so even though I've specified that dest is case-insensitive through anaylyzers the case of the values copied from source field are preserved.

I'm hoping to avoid re-transmitting the value in the field from the client, at record creation time.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

绻影浮沉 2024-08-24 19:39:58

由于没有得到 SO 的答复，我跟进了 SOLR 用户列表。在考虑 copyField 的影响之前，我发现我的 string_ci 字段没有按预期工作。 Ahmet Arslan 解释了为什么“string_ci”字段应该使用 solr.TextField 而不是 solr.StrField：

来自 apache-solr-1.4.0\example\solr\conf\schema.xml ：
“不会分析 StrField 类型，而是逐字索引/存储。”
“solr.TextField 允许将自定义文本分析器指定为分词器和分词过滤器列表。”

通过他提供的示例和我自己的轻微调整，以下字段定义似乎可以解决问题，现在 CopyField 也可以按预期工作。

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

destANYStrCI 字段将存储一个保留大小写的值，但将提供一个不区分大小写的字段进行搜索。注意：无法执行不区分大小写的通配符搜索，因为通配符短语会绕过查询分析器，并且在与索引匹配之前不会被小写。这意味着通配符短语中的字符必须是小写才能匹配。

With no answers from SO, I followed up on the SOLR users list. I found that my string_ci field was not working as expected before even considering the effects of copyField. Ahmet Arslan explains why the "string_ci" field should be using solr.TextField and not solr.StrField:

From apache-solr-1.4.0\example\solr\conf\schema.xml :
"The StrField type is not analyzed, but indexed/stored verbatim."
"solr.TextField allows the specification of custom text analyzers specified as a tokenizer and a list of token filters."

With an example he provdied and a slight tweak by myself, the following field definition seems to do the trick, and now the CopyField works as expected as well.

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

The destANYStrCI field will have a case preserved value stored but will provide a case insensitive field to search on. CAVEAT: case insensitive wildcard searching cannot be done since wild card phrases bypass the query analyzer and will not be lowercased before matching against the index. This means that the characters in wildcard phrases must be lowercase in order to match.

回复收藏 0 原文

往日 2024-08-24 19:39:58

是的，确实如此。 LowerCaseFilterFactory 不适用于 String 数据类型。我们只能在文本字段上应用 LowerCaseFilterFactory。

如果你尝试这样做

<!-- Assigning customised data type -->
<field name="language" type="text_lower" indexed="true" stored="true"  multiValued="false" default="en"/>  

<!-- Defining customised data type for lower casing. -->
<fieldType name="text_lower" class="solr.String" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

是行不通的，我们必须使用TextField。

试试这个方法，应该有效。只需将 fieldType 从 String 更改为 TextField

<!-- Assigning customised data type -->
<field name="language" type="text_lower" indexed="true" stored="true"  multiValued="false" default="en"/>  

<!-- Defining customised data type for lower casing. -->
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Yes true. LowerCaseFilterFactory does not applies to String data type. We could only apply LowerCaseFilterFactory on Text fields.

If you try to do this way

<!-- Assigning customised data type -->
<field name="language" type="text_lower" indexed="true" stored="true"  multiValued="false" default="en"/>  

<!-- Defining customised data type for lower casing. -->
<fieldType name="text_lower" class="solr.String" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

It would not work, We have to use TextField.

Try this way, it should work. Just change the fieldType from String to TextField

<!-- Assigning customised data type -->
<field name="language" type="text_lower" indexed="true" stored="true"  multiValued="false" default="en"/>  

<!-- Defining customised data type for lower casing. -->
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

回复收藏 0 原文

~没有更多了~