
Solr stopwords showing up in facet search results

Posted 2024-11-08 15:54:22

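I'm currently testing faceted search on a text field in my Solr schema, and I've noticed that a significant number of the facet values are words listed in my stopwords.txt file.

My schema uses the default configuration for the text field type, and I was under the impression that stopwords are not indexed when the "solr.StopFilterFactory" filter is used.

I'm hoping someone can shed some light on this and either a) help me understand why stopwords don't apply to facets and how to live with it, or b) point me in the right direction so that my facet queries don't return words from the stopword list.

Thanks!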


过去的过去 2024-11-15 15:54:22

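Stopwords do apply to facets. In other words: if you request a facet on a field that was indexed with a stopword filter, you should not see any stopwords in that facet. Facet counts are computed from the terms actually stored in the index, so words that the analyzer removes at index time never reach the facet.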
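A rough way to see why: facet values are just the indexed terms of a field, counted per document. The sketch below is a toy Python model, not Solr's implementation; it approximates a WhitespaceTokenizerFactory followed by a StopFilterFactory with ignoreCase="true", and the small Spanish stopword set is a hypothetical stand-in for stopwords_spanish.txt.

```python
from collections import Counter

# Hypothetical stopword list standing in for stopwords_spanish.txt.
STOPWORDS = {"de", "la", "el", "y", "en"}


def analyze(text: str, stopwords: set[str], ignore_case: bool = True) -> list[str]:
    """Toy model of WhitespaceTokenizer + StopFilter (not Solr's actual code).

    Stopwords are matched case-insensitively when ignore_case is True, but the
    surviving tokens keep their original case (there is no lowercase filter).
    """
    tokens = text.split()  # whitespace tokenization
    if ignore_case:
        stop = {w.lower() for w in stopwords}
        return [t for t in tokens if t.lower() not in stop]
    return [t for t in tokens if t not in stopwords]


def facet_counts(docs: list[str], stopwords: set[str]) -> Counter:
    """For each indexed term, count how many documents contain it (like a facet)."""
    counts: Counter = Counter()
    for doc in docs:
        # A facet counts documents, so each term is counted once per document.
        counts.update(set(analyze(doc, stopwords)))
    return counts
```

Calling `facet_counts(["la casa de papel", "El perro y la casa"], STOPWORDS)` counts `casa` twice and `papel` and `perro` once each. With an empty stopword set, `la`, `El`, `y` and `de` appear as facet values, which is roughly what an index built without the stop filter would return.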

My guess is that you are not indexing the way you think: either your schema.xml is wrong, or you are indexing into a different field than you think.

I use facets on this field and it works well:

<fieldType name="text_ws_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_spanish.txt"
            enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

...

<field name="phrases" type="text_ws_stop" indexed="true" stored="true" required="false"/>
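To check which field the facet is actually built from, it can help to request facets for exactly one field and nothing else. The sketch below only builds such a request URL using Solr's standard faceting parameters (`facet`, `facet.field`, `facet.limit`); the base URL `http://localhost:8983/solr` and the core name `mycore` are hypothetical placeholders for your own setup.

```python
from urllib.parse import urlencode


def facet_query_url(base_url: str, core: str, field: str, limit: int = 20) -> str:
    """Build a Solr /select URL that returns only facet counts for one field.

    base_url and core are assumptions about your deployment, not fixed values.
    """
    params = {
        "q": "*:*",            # match all documents
        "rows": 0,             # return no documents, only facet counts
        "facet": "true",       # enable faceting
        "facet.field": field,  # facet on exactly this field
        "facet.limit": limit,  # top-N terms by count
        "wt": "json",          # JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


url = facet_query_url("http://localhost:8983/solr", "mycore", "phrases")
```

If stopwords still appear for a field of type `text_ws_stop`, keep in mind that documents indexed before the stop filter was added keep their old terms until they are reindexed.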

