Exact word search in Solr
I have a question which closely relates to this question.
In my schema I have a field
<field name="text" type="textgen" indexed="true" stored="true" required="true"/>
This gives an exact match, i.e. stemming is disabled:
eat = eat
Is it possible, while the field is configured as textgen, to search for other variants of the word?
e.g. eat = eat, eats, eating
eat~0 will give similar-sounding words such as meat, beat, etc., but this is not what I want.
I'm starting to think that the only way to achieve this is to add another field with a type other than textgen, but if there is a simpler way I would be very interested to hear it.
Comments (2)
Using copyField statements is the normal approach in Solr. Since stemming is the answer to exactly what you're asking, this is what I recommend. You can set stored=false on the copied field if you are worried about index size.
You might also use lemmatisation-style expansion, which works in the opposite direction from stemming: instead of reducing words to a root, you add all inflected forms of a word. This is typically performed on the search query, expanding e.g. eat to eat, eats, eating, etc.
A third alternative might be to use wildcard search, although I wouldn't encourage it, not least because it bypasses all schema-configured filters for the target field.
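As a rough illustration of the copyField approach, the schema fragment below keeps the exact-match field from the question and copies its content into a second, stemmed field. The field name text_stemmed is hypothetical, and the sketch assumes the schema already defines a field type named text whose analyzer includes a stemmer; adjust both to the actual schema.

```xml
<!-- Exact-match field, as given in the question -->
<field name="text" type="textgen" indexed="true" stored="true" required="true"/>

<!-- Hypothetical stemmed copy. "text_stemmed" is an assumed name, and the
     "text" type is assumed to be defined elsewhere with a stemming filter.
     stored="false" keeps the index smaller, since the original value is
     already stored in the "text" field. -->
<field name="text_stemmed" type="text" indexed="true" stored="false"/>

<!-- Copy the raw input of "text" into "text_stemmed" at index time -->
<copyField source="text" dest="text_stemmed"/>
```

With a setup along these lines, a query such as text:eat would stay exact, while text_stemmed:eat would also be expected to match regular inflections such as eats and eating. Documents need to be reindexed before the new field is populated.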
If you use text as the field type, then eat, eats, eaten and eating will all be stored as eat, and a search for FieldName:eat will find all of them. If you change the field type to textgen, then the search for FieldName:eat will only find "eat", not eats, eaten or eating.
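To make the difference concrete, here is a minimal sketch of what a stemming field type could look like. It is an assumption for illustration, not the definition shipped with any particular Solr version: the tokenizer and filter choices are placeholders, and the real text type in a given schema may use a different chain.

```xml
<!-- Illustrative stemming field type (assumed definition; check the
     actual schema.xml for the real analyzer chain). -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace, then lowercase each token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Reduce tokens to a stem at both index and query time,
         e.g. "eats" and "eating" become "eat" -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

A type like textgen simply omits the stemming filter, which is why it only matches the exact form. Which inflections collapse to the same stem depends on the stemmer chosen; irregular forms in particular may not be reduced the way regular ones are.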