Solr synonyms not being parsed

Published 2024-12-07 07:48:24

I'm making an auto-suggest feature using past searches in Solr. synonyms.txt contains a list of common typos and misspellings. It's set up to run at index time, and using the analysis tool in the admin UI I can see it's working correctly; however, it doesn't seem to be applied to live data.

Field:
<field name="suggest_ngrams" type="text_ngram" indexed="true" stored="false" multiValued="true" />

Field type (schema.xml):
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
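To see which terms each side of this field type produces, the two analyzer chains can be simulated in a few lines of Python. This is a minimal sketch, not Solr's implementation: `STOPWORDS` is a hypothetical stand-in for stopwords_en.txt (its contents aren't shown), `SYNONYMS` holds only the single mapping from the question, and all function names are made up for illustration.

```python
# Minimal, hypothetical simulation of the "text_ngram" analyzers (not Solr code).

# Hypothetical stand-in for stopwords_en.txt; the real file's contents are unknown.
STOPWORDS = {"a", "an", "and", "the"}

# Explicit mapping from the question: "watch, watches, watche, watchs => watch".
SYNONYMS = {"watch": "watch", "watches": "watch", "watche": "watch", "watchs": "watch"}


def keyword_tokenize(text: str) -> list[str]:
    # KeywordTokenizerFactory: the entire input is emitted as one token.
    return [text]


def lowercase_and_stop(tokens: list[str]) -> list[str]:
    # LowerCaseFilterFactory, then StopFilterFactory with ignoreCase="true".
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOPWORDS]


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    # EdgeNGramFilterFactory, side="front": prefixes of length min_gram..max_gram.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_analyze(text: str) -> list[str]:
    tokens = lowercase_and_stop(keyword_tokenize(text))
    # SynonymFilterFactory: a token equal to a left-hand entry is replaced by the
    # right-hand side. With a keyword tokenizer, only whole-phrase matches apply.
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [gram for t in tokens for gram in edge_ngrams(t)]


def query_analyze(text: str) -> list[str]:
    # The query chain has no synonym filter and no n-gram filter.
    return lowercase_and_stop(keyword_tokenize(text))


if __name__ == "__main__":
    print(index_analyze("Watche"))            # ['wa', 'wat', 'watc', 'watch']
    print(query_analyze("watc"))              # ['watc']
    print(index_analyze("rolex watche")[-1])  # rolex watche (no synonym match)
```

One side effect the sketch makes visible: because KeywordTokenizerFactory keeps a whole past search as one token, a misspelling inside a longer phrase such as "rolex watche" is not rewritten by this synonym rule.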

And an example line from synonyms.txt:
watch, watches, watche, watchs => watch
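The line above uses the explicit-mapping (`=>`) form: every term on the left is replaced by the term(s) on the right, and Solr ignores `expand` for such lines. For comma-only lines, `expand="false"` reduces every term to the first one in the list. As a rough sketch of how these lines are read, here is a hypothetical `parse_synonyms` helper; it is simplified (no backslash escapes, no multi-word handling) and lowercases terms to mirror `ignoreCase="true"`.

```python
def parse_synonyms(text: str, expand: bool = False) -> dict[str, list[str]]:
    """Map each input term to its replacement term(s); simplified Solr format."""
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if "=>" in line:
            # Explicit mapping: left-hand terms are replaced by right-hand terms;
            # the expand setting does not apply here.
            left, right = line.split("=>", 1)
            sources = [t.strip().lower() for t in left.split(",") if t.strip()]
            targets = [t.strip().lower() for t in right.split(",") if t.strip()]
        else:
            # Equivalent terms: expand=False reduces all of them to the first one.
            sources = [t.strip().lower() for t in line.split(",") if t.strip()]
            targets = sources if expand else sources[:1]
        for source in sources:
            existing = mapping.setdefault(source, [])
            for target in targets:
                if target not in existing:
                    existing.append(target)
    return mapping


print(parse_synonyms("watch, watches, watche, watchs => watch")["watche"])  # ['watch']
```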

So at index time I would expect "watche" to be replaced with "watch", but this doesn't seem to be the case (even though the analysis tool says that's what it's doing).

To be clear: if I query Solr with ?q=watc, the phrase "watche" appears in the results.
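One thing worth checking is which field the results actually display. suggest_ngrams is stored="false", so the text shown is presumably coming from some other stored field, and Solr returns stored values exactly as they were sent; analyzers only change the indexed terms. The toy inverted index below illustrates that distinction. It assumes a hypothetical stored field named `phrase` holding the raw past search, and takes the indexed terms from what the analysis screen reports for "watche".

```python
# Toy model of one suggestion document (hypothetical field name "phrase").
STORED = {1: {"phrase": "watche"}}  # stored fields are kept exactly as sent

# Indexed terms for suggest_ngrams, as reported by the analysis screen: the
# synonym rewrote "watche" to "watch" before edge n-gramming.
INVERTED_INDEX = {term: {1} for term in ("wa", "wat", "watc", "watch")}


def search(q: str) -> list[str]:
    # Query analyzer: keyword tokenizer + lowercase (stop words ignored here).
    term = q.lower()
    doc_ids = INVERTED_INDEX.get(term, set())
    # Matching uses the analyzed terms, but the response carries stored values.
    return [STORED[doc_id]["phrase"] for doc_id in sorted(doc_ids)]


print(search("watc"))    # ['watche']: matched via "watc", displayed as stored
print(search("watche"))  # []: no synonym or n-gram filter at query time
```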

Any ideas or insight would be appreciated, as I think everything is set up correctly.

Thanks
