How to use NGramTokenizerFactory or NGramFilterFactory?
I have recently been studying how to store and index data with Solr, and I want to do a facet.prefix search. With the whitespace tokenizer, "Where are you" is split into three words, and each is indexed separately. If I search with facet.prefix="where are", no results are returned.
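The following pure-Python sketch (not Solr code) shows why the prefix finds nothing. facet.prefix filters the indexed terms, and after whitespace tokenization none of those terms contains a space. The lowercasing step and the helper names `whitespace_terms` and `facet_prefix_matches` are hypothetical and exist only for illustration. A bare whitespace tokenizer would keep "Where" capitalized.

```python
def whitespace_terms(text: str) -> list[str]:
    """Approximate a whitespace tokenizer followed by lowercasing.

    Assumption: a lowercase step is applied; a bare whitespace
    tokenizer would keep "Where" capitalized.
    """
    return [token.lower() for token in text.split()]


def facet_prefix_matches(terms: list[str], prefix: str) -> list[str]:
    """Return the distinct indexed terms that start with prefix,
    mimicking how facet.prefix narrows the facet values."""
    return sorted({term for term in terms if term.startswith(prefix)})


terms = whitespace_terms("Where are you")
print(terms)                                      # ['where', 'are', 'you']
print(facet_prefix_matches(terms, "where are"))   # [] : no single term holds two words
print(facet_prefix_matches(terms, "where"))       # ['where']
```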
I googled and found that NGramFilterFactory could help. But when I applied this filter factory, the output was "w, h, e, ..., wh, ...". It splits the text by character, not by word token.
I set minGramSize to 1 and maxGramSize to 3. Is NGramFilterFactory working correctly? Should I add other parameters? Are there other filter factories that could help me?
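The output in the question is what an n-gram filter is supposed to produce. It breaks each token into character fragments of length minGramSize..maxGramSize (here 1..3), and it does so inside each token, never across tokens. The pure-Python sketch below is an approximation with hypothetical helper names, not Solr's implementation. It groups grams by length, and the exact emission order in Solr may differ between versions. Either way, no gram can ever equal "where are".

```python
def char_ngrams(token: str, min_gram: int = 1, max_gram: int = 3) -> list[str]:
    """Emit every character n-gram of length min_gram..max_gram,
    grouped by gram length (shorter grams first)."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        # Slide a window of `size` characters across the token.
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def ngram_filter(tokens: list[str], min_gram: int = 1, max_gram: int = 3) -> list[str]:
    """Apply char_ngrams to each token separately, like a filter running
    after a whitespace tokenizer: grams never cross a word boundary."""
    return [gram for token in tokens for gram in char_ngrams(token, min_gram, max_gram)]


grams = ngram_filter(["where", "are", "you"])
print(grams[:8])            # ['w', 'h', 'e', 'r', 'e', 'wh', 'he', 'er']
print("where are" in grams) # False
```

Raising maxGramSize only adds longer fragments of single words. It does not make multi-word prefixes match.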
Thanks!
Comments (1)
Facets should only be applied to non-tokenized fields, such as string fields. If you want results to be returned for "where are", don't use a tokenizer at all for that field (or copy the value into such a field with a copyField directive). I guess you want to use facet.prefix for autocompletion. You can do this; look here.
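The following pure-Python sketch (not Solr code) shows why a non-tokenized field works. Each stored value stays one whole term, so a multi-word prefix such as "where are" can match it, and the facet counts group identical values. The sample values and the helper name `facet_counts` are hypothetical.

```python
from collections import Counter


def facet_counts(values: list[str], prefix: str) -> list[tuple[str, int]]:
    """Count facet values that start with prefix.

    Each value is treated as a single untokenized term, as with a
    string field, so multi-word prefixes can match.
    """
    counts = Counter(value for value in values if value.startswith(prefix))
    return counts.most_common()


# Hypothetical field values, e.g. earlier queries copied into a string field.
values = ["where are you", "where are we", "where are you", "what is this"]
print(facet_counts(values, "where are"))
# [('where are you', 2), ('where are we', 1)]
```

Because the whole value is matched as-is, the comparison is case-sensitive. "Where are you" would not match the prefix "where are" unless the values are normalized first.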
For the NGramTokenizer, check this out.