How to create a case-insensitive copy of a string field in SOLR?
How can I create a copy of a string field in case insensitive form? I want to use the typical "string" type and a case insensitive type. The types are defined like so:
<fieldType name="string" class="solr.StrField"
sortMissingLast="true" omitNorms="true" />
<!-- A Case insensitive version of string type -->
<fieldType name="string_ci" class="solr.StrField"
sortMissingLast="true" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
And an example of the field like so:
<field name="destANYStr" type="string" indexed="true" stored="true"
multiValued="true" />
<!-- Case insensitive version -->
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false"
multiValued="true" />
I tried using CopyField like so:
<copyField source="destANYStr" dest="destANYStrCI" />
But apparently CopyField copies the raw value from source to dest before any analyzers are invoked, so even though I've specified through analyzers that dest is case-insensitive, the case of the values copied from the source field is preserved.
I'm hoping to avoid re-transmitting the value in the field from the client, at record creation time.
Comments (2)
With no answers from SO, I followed up on the SOLR users list. I found that my string_ci field was not working as expected before even considering the effects of copyField. Ahmet Arslan explains why the "string_ci" field should be using solr.TextField and not solr.StrField:
With an example he provided and a slight tweak of my own, the following field definition seems to do the trick, and now the CopyField works as expected as well.
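The definition itself is missing from this copy of the post. A minimal sketch of what it would look like, assuming the only changes to the question's schema are switching the class to solr.TextField and collapsing the two identical analyzers into one (the field and copyField lines are carried over from the question unchanged):

```xml
<!-- Case-insensitive string type: TextField runs the analyzer chain,
     whereas StrField indexes the value verbatim and ignores analyzers -->
<fieldType name="string_ci" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
  <!-- A single analyzer with no type attribute applies at both index and query time -->
  <analyzer>
    <!-- KeywordTokenizer keeps the whole value as one token, like a string -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase that token so matching ignores case -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Field and copyField as in the question (assumed unchanged) -->
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false"
    multiValued="true"/>
<copyField source="destANYStr" dest="destANYStrCI"/>
```

The copyField still hands over the original, case-preserved value; the lowercasing happens when the destination field's own analyzer runs at index time.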
The destANYStrCI field will have a case-preserved value stored but will provide a case-insensitive field to search on. CAVEAT: case-insensitive wildcard searching cannot be done, since wildcard phrases bypass the query analyzer and will not be lowercased before matching against the index. This means that the characters in wildcard phrases must be lowercase in order to match.
Yes, true. LowerCaseFilterFactory does not apply to the String data type. We can only apply LowerCaseFilterFactory to Text fields.
If you try to apply it to a String field (solr.StrField), it will not work. We have to use TextField.
Try it this way, it should work. Just change the fieldType class from String (solr.StrField) to TextField (solr.TextField).