Solr synonyms not being resolved
I'm building an auto-suggest feature from past searches in Solr. synonyms.txt contains a list of common typos and misspellings. The synonym filter is set up to run at index time, and the analysis tool in the admin UI shows it working correctly. However, it doesn't seem to be applied to live data.
Field definition:
<field name="suggest_ngrams" type="text_ngram" indexed="true" stored="false" multiValued="true" />
Field type definition:
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrement="true"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrement="true"/>
</analyzer>
And an example line from synonyms.txt:
watch, watches, watche, watchs => watch
So at index time I would expect "watche" to be replaced with "watch". This doesn't seem to be the case, even though the analysis tool says that's what it's doing.
To be clear, if I query Solr (?q=watc), the phrase "watche" appears in the results.
Any ideas or insight would be appreciated, as I think everything is set up correctly.
Thanks
Comments (2)
If I understood the issue correctly:
Synonyms are applied only to the indexed terms at index time and do not affect the stored values.
So what you see in the analysis tool is the index-time output, which seems to work fine.
When you query Solr and a document matches, the result returns "watche", because that is the original value that was stored.
Stored values are never modified; they are stored as is and returned in the response.
Please clarify if I got it wrong.
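To make the indexed-versus-stored distinction concrete, here is a toy Python model of the behaviour described above. It is only a sketch, not Solr's implementation: `index_terms` and `ToyIndex` are hypothetical names, the synonym table is hard-coded from the example line in the question, and the stop filter is left out for brevity.

```python
# Toy model only: mimics the index-time chain from the question
# (keyword tokenizer -> lowercase -> synonym mapping -> front edge n-grams).
SYNONYMS = {"watches": "watch", "watche": "watch", "watchs": "watch"}


def index_terms(value, min_gram=2, max_gram=15):
    """Return the terms that would be indexed for one field value."""
    token = value.lower()               # whole value is a single token
    token = SYNONYMS.get(token, token)  # typo is mapped to the canonical form
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


class ToyIndex:
    """Keeps the original value next to the terms used for matching."""

    def __init__(self):
        self.docs = []  # list of (stored_value, indexed_terms)

    def add(self, value):
        self.docs.append((value, set(index_terms(value))))

    def search(self, query):
        q = query.lower()  # query analyzer: lowercase only, no n-grams
        return [stored for stored, terms in self.docs if q in terms]


index = ToyIndex()
index.add("watche")
print(index_terms("watche"))  # ['wa', 'wat', 'watc', 'watch']
print(index.search("watc"))   # ['watche']
```

The match happens on the mapped terms, but what comes back is the untouched original value, which is the behaviour reported in the question.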
As @Jayendra described, Solr doesn't change the stored value, so you need another way of handling this.
In my case I came up with a solution using facets. If you facet on that field, you receive the indexed (mapped) values.
Another solution is to apply the filters to the data in a separate process before loading it into Solr.
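A facet request along these lines could be built as in the sketch below. The base URL and the field name `suggest_terms` are assumptions, not taken from the question: faceting on the n-gram field itself would return the individual grams, so the sketch assumes a separate field that applies the same synonym mapping without the EdgeNGram filter. `build_facet_url` is a hypothetical helper; the `facet.*` parameters are standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, field, prefix, limit=10):
    """Build a Solr select URL that returns indexed terms starting with prefix."""
    params = {
        "q": "*:*",
        "rows": 0,                        # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix.lower(),   # indexed terms are lowercased
        "facet.mincount": 1,
        "facet.limit": limit,
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, handler path and field name.
url = build_facet_url("http://localhost:8983/solr/select", "suggest_terms", "watc")
print(url)
```

Because facet values come from the index rather than from stored values, the suggestions returned this way are the mapped terms.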
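The preprocessing approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `parse_synonyms` and `normalize` are hypothetical helpers, only explicit `=>` rules with a single replacement are handled, and values are matched as a whole after lowercasing, mirroring the keyword tokenizer and lowercase filter in the question.

```python
def parse_synonyms(lines):
    """Parse explicit 'a, b, c => target' rules into a lookup table."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blanks, comments and rules without an explicit mapping.
        if not line or line.startswith("#") or "=>" not in line:
            continue
        left, right = line.split("=>", 1)
        target = right.strip().lower()
        for variant in left.split(","):
            variant = variant.strip().lower()
            if variant:
                mapping[variant] = target
    return mapping


def normalize(value, mapping):
    """Return the canonical form of a value before it is sent to Solr."""
    key = value.strip().lower()
    return mapping.get(key, key)


rules = ["watch, watches, watche, watchs => watch"]
mapping = parse_synonyms(rules)
cleaned = [normalize(v, mapping) for v in ["Watche", "watchs", "clock"]]
print(cleaned)  # ['watch', 'watch', 'clock']
```

With the values cleaned before indexing, the stored value and the indexed terms agree, so "watche" no longer shows up in the results.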