How can I configure SOLR so that users get prefix searches by default?

Posted on 2024-12-05 19:35:20

I am using SOLR 3.2. My application issues search queries against a SOLR instance, on a field of the text field type. When a user issues a query like "book", how can I make SOLR return results such as "book", "bookshelf", "bookasd" and so on? Should I append "*" characters to the query string manually, or is there a setting in SOLR that makes it do prefix searches on the field by default?

This is the schema.xml section for the text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>


Comments (4)

古镇旧梦 2024-12-12 19:35:20

There are several ways to do this, but performance-wise you might want to use EdgeNGramFilterFactory.
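To see why this helps with prefix matching, here is a rough sketch of what an edge n-gram filter emits at index time. The function name `edge_ngrams` and the gram sizes are illustrative assumptions, not Solr code; in Solr the sizes would come from the filter's `minGramSize` and `maxGramSize` attributes.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading-edge n-grams of a token, shortest first.

    Hypothetical helper that mimics the idea behind an edge n-gram
    filter: every prefix of the token becomes its own indexed term.
    """
    upper = min(max_gram, len(token))
    # Tokens shorter than min_gram produce an empty list.
    return [token[:n] for n in range(min_gram, upper + 1)]


indexed_terms = edge_ngrams("bookshelf", min_gram=2, max_gram=10)
# A plain, non-wildcard query term "book" now matches one of the
# terms stored for "bookshelf".
print("book" in indexed_terms)  # True
```

Because the prefixes are stored in the index, the query side stays an ordinary term lookup. The cost is a larger index, which is the usual trade-off against wildcard queries.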

忘你却要生生世世 2024-12-12 19:35:20

I had the same requirement on a project, where I had to implement suggestions. What I did was define this suggester fieldType:

<fieldType class="solr.TextField" name="suggester">
    <analyzer  type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    </analyzer>
    <analyzer  type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I used ShingleFilterFactory because I needed suggestions composed of one or more words.

Then I used faceting queries to get suggestions.

Facet.Limit=10

Facet.Prefix="book"

Facet.Field="Suggester" //this is the field with fieldType="suggester" in which I saved the data
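Sent as raw HTTP parameters, the settings above would probably look like the sketch below. It assumes the standard lowercase Solr parameter names (`facet`, `facet.field`, `facet.prefix`, `facet.limit`), the field name `Suggester` from this answer, and a match-all query with `rows=0` so that only facet counts come back. The helper `suggest_params` is hypothetical.

```python
from urllib.parse import urlencode


def suggest_params(prefix, field="Suggester", limit=10):
    """Build Solr request parameters for prefix suggestions via faceting."""
    return {
        "q": "*:*",      # match everything; only the facet counts matter
        "rows": 0,       # assumption: no documents needed in the response
        "facet": "true",
        "facet.field": field,
        # facet.prefix is compared with indexed terms as-is, so lowercase
        # it to agree with the LowerCaseFilterFactory in the index analyzer.
        "facet.prefix": prefix.lower(),
        "facet.limit": limit,
    }


# Query string to append to the select handler URL of your Solr core.
print(urlencode(suggest_params("Book")))
```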

I know it uses facet results but maybe it solves your problem.

If neither my answer nor Jayendra Patil's gives you a solution, you can also take a look at EdgeNGramFilterFactory.

素食主义者 2024-12-12 19:35:20

You can either handle this on the client side by appending a wildcard character to the end of each search term.

The impact:

  1. Wildcard queries have a performance impact.
  2. Wildcard queries do not undergo analysis, so query-time analysis won't be applied to your search terms.

The other option is to implement custom query parser with the handling you need.
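A minimal sketch of the client-side option, assuming it is acceptable to keep only word characters from the user's input. The function name `to_prefix_query` is made up for illustration. Since wildcard terms skip analysis, the code lowercases them itself.

```python
import re


def to_prefix_query(user_input):
    """Turn free text into a query where every term is a prefix match."""
    # Wildcard terms bypass query-time analysis, so lowercase here to
    # match what LowerCaseFilterFactory stored in the index.
    # Keeping only \w+ runs also drops query-syntax characters.
    terms = re.findall(r"\w+", user_input.lower())
    return " ".join(term + "*" for term in terms)


print(to_prefix_query("Book"))  # book*
```

Lowercasing does not cover stemming: on a stemmed field, "books" may be indexed as "book", in which case `books*` could find nothing.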

淡紫姑娘! 2024-12-12 19:35:20

I'm sure you figured this out by now, but just so there's an answer here:

I handled this by taking the last term and putting an OR with the last term plus a wildcard, e.g. "my favorite book" becomes "my+favorite+(book OR book*)", and would return "my favorite bookshelf". You probably want to do some processing on the input anyway (escaping, etc).
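The rewrite described above might be sketched like this. The helper names `escape` and `expand_last_term` are hypothetical, and the escaped character set is an assumption based on the Lucene query syntax, so check it against your Solr version.

```python
import re

# Characters and operators that carry meaning in the Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape(term):
    """Backslash-escape query-syntax characters in a single term."""
    return _SPECIAL.sub(r"\\\1", term)


def expand_last_term(user_input):
    """Rewrite 'a b c' as 'a b (c OR c*)'."""
    # Lowercasing keeps the wildcard term consistent with a lowercased
    # index and stops user-typed AND/OR/NOT from acting as operators.
    terms = [escape(t) for t in user_input.lower().split()]
    if not terms:
        return ""
    last = terms[-1]
    terms[-1] = "({0} OR {0}*)".format(last)
    return " ".join(terms)


print(expand_last_term("my favorite book"))  # my favorite (book OR book*)
```

Keeping the exact term next to the wildcard means the last word is still analyzed normally, so a stemmed match is not lost when the wildcard branch finds nothing.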

If you are specifically looking for the text typed to match the beginning of the result, then edge n-grams are the way to go, but from reading your question it didn't seem you were really asking for that.
