Solr spellcheck problem
I have a strange problem with Solr's spellcheck suggestions.
I search for a term like this (a product number, for example): 08p17a6
With this term, I find documents in my index.
I have enabled spellcheck=true, so besides the documents, Solr also returns a spellcheck suggestion in the XML response:
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="p17a6">
      <int name="numFound">1</int>
      <int name="startOffset">2</int>
      <int name="endOffset">7</int>
      <arr name="suggestion">
        <str>08p17a6</str>
      </arr>
    </lst>
  </lst>
</lst>
Solr strips the first two digits from my search term and gives me a suggestion based on "p17a6". I don't understand why it cuts off the first two digits before looking for a suggestion.
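To make the offsets concrete, here is a small Python check (Python is used only for illustration; it is not part of my Solr setup). It slices the query at the startOffset and endOffset from the response, assuming they are zero-based character positions with an exclusive end:

```python
# Query term and offsets copied from the spellcheck response.
query = "08p17a6"
start_offset, end_offset = 2, 7

# Assumption: offsets are zero-based and the end offset is exclusive.
checked_token = query[start_offset:end_offset]
skipped_prefix = query[:start_offset]

print(checked_token)   # p17a6
print(skipped_prefix)  # 08
```

Under that assumption, the part Solr actually spellchecked is "p17a6", and the leading "08" was never passed to the spellchecker.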
Things get even stranger if I enable spellcheck.collate:
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="p17a6">
      <int name="numFound">1</int>
      <int name="startOffset">2</int>
      <int name="endOffset">7</int>
      <arr name="suggestion">
        <str>08p17a6</str>
      </arr>
    </lst>
    <str name="collation">0808p17a6</str>
  </lst>
</lst>
I need spellcheck.collate to get suggestions for queries with multiple search terms. But as you can see, the XML response suggests that I search for "0808p17a6".
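The collation is at least consistent with the suggestion being substituted into the original query at the reported offsets. The following Python sketch reproduces the value; the function name `collate` is hypothetical, and the splice is only my assumption about the mechanism, not Solr's actual code:

```python
def collate(query, start_offset, end_offset, suggestion):
    """Replace query[start_offset:end_offset] with the suggestion.

    Assumption: the collation is built by splicing the suggestion into
    the original query at the offsets reported for the checked token.
    """
    return query[:start_offset] + suggestion + query[end_offset:]


# Values copied from the spellcheck response.
result = collate("08p17a6", 2, 7, "08p17a6")
print(result)  # 0808p17a6
```

If that is what happens, the "08" that was skipped stays in place and the full suggestion "08p17a6" is inserted right after it, which doubles the prefix.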
Does anyone know how this happens?
Edit:
Here is my schema configuration regarding the spellcheck:
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true" />
<copyField source="title" dest="spell" />
<copyField source="subTitle" dest="spell" />
<copyField source="content" dest="spell" />
The source fields of the copyField directives are configured like this:
<field name="title" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true" />
<field name="subTitle" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true" />
<field name="content" type="text" indexed="true" stored="true" termVectors="true" />
This is the configuration for the analyzers:
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"
    />
    <!-- best practice (currently) for synonyms is to add them by
         expansions during index time
    -->
    <filter class="solr.SynonymFilterFactory" synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="german/protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"
    />
    <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="german/protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<!-- Setup simple analysis for spell checking -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true"/>
    <filter class="solr.StandardFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true"/>
    <filter class="solr.StandardFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>