Indexing different fields with different field types in Sunspot (Solr)
I'd like to set up my indexing so that phonetic-matched results are given less weight than regular matches.
To do this, I've created two different fieldType sets in my schema.xml for text:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
<fieldType name="text_phonetic" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>
and created a dynamicField that uses the phonetic factory:
<dynamicField name="*_phonetic" stored="false" type="text_phonetic" multiValued="true" indexed="true"/>
Now in my model I can do something like:
text :name, :as => :name_phonetic
and it works fine.
My question is: what is the best way to set up a bunch of fields to use both the regular text field indexing and the phonetic one, with a higher boost on the first? I can just duplicate all of the indexing lines in my model, but is there a way to do this directly in the schema and have it available in the Sunspot fulltext query?
Comments (1)
As you note, you can duplicate the lines in your searchable block for the fields you'd like indexed in multiple different ways. I'd actually recommend this, since you keep finer granularity across your fields and you get some nice Sunspot helpers, like the inline :boost option.
That said, you can also make use of Solr's copyField directive in the schema, which takes a source and a dest attribute. The source field name may be a pattern; however, your destination must be a single field. Furthermore, I believe the destination must be defined as its own field rather than be a name matched to a dynamicField.
Those constraints considered, you could define one dedicated phonetic field in your schema and copy a pattern of source fields into it.
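A minimal sketch of what that schema setup might look like. The field name phonetic_all is hypothetical, and the *_text source pattern assumes the naming that Sunspot's stock schema uses for text fields; adjust both to match your own schema.

```xml
<!-- Hypothetical catch-all destination, declared as an explicit field
     (not via a dynamicField) and analyzed with the phonetic type. -->
<field name="phonetic_all" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>

<!-- Copy the raw input of every matching text field into the phonetic field.
     Assumes Sunspot's text fields are indexed under names ending in _text. -->
<copyField source="*_text" dest="phonetic_all"/>
```

The destination is multiValued because several source fields feed into it. Note that every source ends up in the same phonetic field, so you can no longer boost or search the phonetic variants individually.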
To maintain granularity with your fields, you could set up a copyField directive for each incoming field. But then you'll have arguably more duplication than would be involved in creating separate lines in your searchable block.
So it's a tossup. But those are your options.
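For comparison, a sketch of the per-field variant, with one destination and one copyField per source. The description field is hypothetical, and the _text source names again assume Sunspot's stock naming for text fields.

```xml
<!-- One explicit phonetic destination per source field keeps the fields
     separate, at the cost of two schema lines for every field. -->
<field name="name_phonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>
<field name="description_phonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>

<copyField source="name_text" dest="name_phonetic"/>
<copyField source="description_text" dest="description_phonetic"/>
```

Each added field costs two lines here against one extra text line in the model, which is the duplication trade-off described above.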