Solr spellchecker

Posted on 2024-09-30 06:19:13

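I have implemented the Solr spellchecker based on the fieldType given here: http://wiki.apache.org/solr/SpellCheckingAnalysis. It spellchecks vendor names and should return suggestions related to the search term entered. I used copyField to copy the vendorName field into a field of the above type, i.e. textSpell. Some of my queries get weird collation results. For example:

1) One misspelling of Macy's gives me no results, while another misspelling gives me the desired result, i.e. Macy's. In the admin tool I compared the text analysis of maccys and the other spelling: with both the text and the textSpell field types, both end up as macy. So why does the spellchecker return no result?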

2) khols gives me "shoes" as the collation result, while the correct result, "kohls", is only the third suggestion (after shoes and stores).

The onlyMorePopular flag is false, and accuracy is left at the default of 0.5.

Thanks in advance for your help. I'm a bit lost on how to debug this further.
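For reference, the setup described in the question might look roughly like the following sketch. Only vendorName, the textSpell type, the copyField approach and the 0.5 accuracy come from the question; the destination field name vendorNameSpell, the text type for vendorName, the dictionary name and the index directory are hypothetical. First the schema.xml side:

```xml
<!-- schema.xml (sketch): vendorNameSpell is a hypothetical field name -->
<field name="vendorName" type="text" indexed="true" stored="true"/>
<field name="vendorNameSpell" type="textSpell" indexed="true" stored="false"/>
<!-- copyField copies the raw vendorName value; textSpell analysis then runs on the copy -->
<copyField source="vendorName" dest="vendorNameSpell"/>
```

and the spellchecker definition in solrconfig.xml (IndexBasedSpellChecker is assumed here, since the question does not name the implementation):

```xml
<!-- solrconfig.xml (sketch) -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- analyzer applied to the spellcheck.q input -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- hypothetical field filled by the copyField above -->
    <str name="field">vendorNameSpell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- minimum similarity for a suggestion; 0.5 is the Lucene default -->
    <str name="accuracy">0.5</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

onlyMorePopular is not part of the dictionary definition; it is sent per request as spellcheck.onlyMorePopular, and false is already its default.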


反目相谮 2024-10-07 06:19:13


We have faced the same problem of the spellchecker producing weird results, although we had a lot of data available. I can't help you debug it better, but I can tell you what we did:

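We index the text field as it is, with no whitespace or standard tokenizer (the whole value becomes a single token via KeywordTokenizerFactory). If you have less data to index, you can also split on words and add a shingle filter, so that word combinations such as "hello rabbit" are indexed alongside the single words, but this will blow up the spellcheck index even more.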

    <fieldType name="txtspell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.TrimFilterFactory"/>
            <filter class="solr.PatternReplaceFilterFactory"
                    pattern="[\-\.\/\(\),]" replacement="" replace="all"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="spellstopwords.txt"/>
            <!-- we don't want duplicates for one doc -->
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>
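The shingle option mentioned above could look roughly like this hypothetical variant (the type name txtspell_shingle and the shingle size of 2 are assumptions). Note that ShingleFilterFactory emits word n-grams in their original order, so "hello rabbit" yields hello, hello rabbit and rabbit; it does not add reordered pairs by itself.

```xml
<!-- hypothetical variant: word tokens plus shingles instead of one keyword token -->
<fieldType name="txtspell_shingle" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- single words plus 2-word shingles, in document order -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Each document now contributes several terms instead of one, which is why the spellcheck index grows.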

If you really need collation (and if you don't use a shingle filter, you'll need it), you can use Solr from trunk, where you can specify spellcheck.maxCollationTries=1 to make sure the returned correction produces some hits.
We use spellcheck.accuracy=0.7 (and onlyMorePopular=false).
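Expressed as request handler defaults in solrconfig.xml, the settings from this answer might look like the sketch below. The handler name /spell and the dictionary name default are assumptions, and spellcheck.maxCollationTries only works on Solr versions that support it (trunk at the time of this answer). spellcheck.extendedResults is an optional extra, not part of the answer; it adds details such as each suggestion's frequency, which can help when debugging odd suggestions.

```xml
<!-- solrconfig.xml (sketch): handler name /spell is hypothetical -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.accuracy">0.7</str>
    <str name="spellcheck.collate">true</str>
    <!-- run each collation as a query and only return it if it gets hits -->
    <str name="spellcheck.maxCollationTries">1</str>
    <!-- optional: more detail per suggestion while debugging -->
    <str name="spellcheck.extendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```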
