How can I configure SOLR so that users get prefix searches by default?
I am using SOLR 3.2. My application issues search queries against a SOLR instance, on a field of the text field type. How can I make SOLR return results like "book", "bookshelf", "bookasd", and so on when a user issues a query like "book"? Should I append "*" to the query string manually, or is there a setting in SOLR that makes it do prefix searches on the field by default?
This is the schema.xml section for the text field type:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
</fieldType>
Comments (4)
There are several ways to do this, but performance-wise you might want to use EdgeNGramFilterFactory.
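For illustration only, here is a minimal sketch of what an edge n-gram setup could look like in schema.xml. It is not taken from the answer above: the type and field name `text_prefix`, the source field name `text`, and the gram sizes are all assumptions you would adapt to your own schema. Stemming is left out on purpose, since a stemmer would change tokens before they are split into grams.

```xml
<!-- Hypothetical field type: names and gram sizes are illustrative assumptions -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index the leading substrings of each token: "bookshelf" yields "b", "bo", "boo", "book", and so on -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter here: the typed text is matched as-is against the indexed grams -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field and copyField; "text" stands in for your existing source field -->
<field name="text_prefix" type="text_prefix" indexed="true" stored="false" multiValued="true"/>
<copyField source="text" dest="text_prefix"/>
```

With something like this in place, a plain query such as `text_prefix:book` should match documents containing "book", "bookshelf", or "bookasd" without any wildcard, because the gram "book" was indexed for each of them. The trade-off is a larger index, and the documents need to be reindexed after the schema change.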
I had the same requirement on a project, where I had to implement suggestions. What I did was define a dedicated suggester fieldType.
I used ShingleFilterFactory because I needed suggestions composed of one or more words.
Then I used faceting queries to get the suggestions.
I know this relies on facet results, but maybe it solves your problem.
If neither my answer nor Jayendra Patil's gives you a solution, you can also take a look at EdgeNGramFilterFactory.
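The fieldType definition from this answer did not survive on this page, so the following is only a guess at what such a shingle-based setup might look like, not the answerer's actual configuration. The names `suggester`, `suggest`, and `text`, as well as the `maxShingleSize` value, are assumptions made for the sketch.

```xml
<!-- Hypothetical suggester type: a sketch, not the original answer's definition -->
<fieldType name="suggester" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit single words plus word sequences of up to three words as indexed terms -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field and copyField; "text" stands in for your existing source field -->
<field name="suggest" type="suggester" indexed="true" stored="false" multiValued="true"/>
<copyField source="text" dest="suggest"/>
```

A faceting request against such a field could then use parameters along the lines of `q=*:*&rows=0&facet=true&facet.field=suggest&facet.prefix=book&facet.mincount=1&facet.limit=10`. Note that `facet.prefix` is compared against the indexed terms directly, so with the lowercase filter above the client would need to lowercase the typed text before sending it.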
One option is to do the handling on the client side by appending a wildcard character to the end of each search term.
The impact:
The other option is to implement a custom query parser with the handling you need.
I'm sure you have figured this out by now, but just so there's an answer here:
I handled this by taking the last term and OR-ing it with the same term plus a wildcard, e.g. "my favorite book" becomes "my+favorite+(book OR book*)", which would return "my favorite bookshelf". You probably want to do some processing on the input anyway (escaping, etc.).
If you are specifically looking for the typed text to match the beginning of the result, then edge n-grams are the way to go, but from reading your question it didn't seem that you were really asking for that.