Indexing different fields with different field types in Sunspot (Solr)

Posted 2024-12-01 02:52:10


I'd like to set up my indexing so that phonetic-matched results are given less weight than regular matches.

To do this, I've created two different fieldType sets in my schema.xml for text:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
<fieldType name="text_phonetic" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>

and created a dynamicField that uses the phonetic field type:

<dynamicField name="*_phonetic" stored="false" type="text_phonetic" multiValued="true" indexed="true"/>

Now in my model I can do something like:

text :name, :as => :name_phonetic

and it works fine.

My question is: what is the best way to set up a bunch of fields so that each is indexed with both the regular text field type and the phonetic one, with a higher boost on the former? I can just duplicate all of the indexing lines in my model, but is there a way to do this directly in the schema with the construct and have that available in the Sunspot fulltext query?


祁梦 2024-12-08 02:52:10


As you note, you can duplicate the lines in your searchable block for the fields you'd like indexed in multiple ways. I'd actually recommend this, since you keep finer granularity in your fields (as will be shown below) and you get some nice Sunspot helpers like the inline :boost option.

That said, you can also make use of Solr's copyField directive in the schema. It looks something like this:

<copyField source="source_field" dest="dest_field" maxChars="N" />

The source field name may be a pattern, but the destination must be a single field. Furthermore, I believe the destination must be defined as its own field rather than be a name matched by a dynamicField.

Those constraints considered, you could set up something like this in your schema:

<fields>
  ...
  <field name="all_text_phonetic" stored="false" type="text_phonetic" multiValued="true" indexed="true"/>
  ...
</fields>

<copyField source="*_text" dest="all_text_phonetic" />
<copyField source="*_texts" dest="all_text_phonetic" />

To maintain granularity with your fields, you could set up a copyField directive for each incoming field. But then you'd arguably have more duplication than you would from adding separate lines to your searchable block.
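
A minimal sketch of that per-field variant, assuming Sunspot indexes `text :name` into a Solr field called name_text (consistent with the *_text pattern above). The description_text source and both *_phonetic destination names are hypothetical, chosen only for illustration:

```xml
<fields>
  <!-- other field definitions -->
  <!-- One explicit phonetic destination per source field (names are illustrative assumptions) -->
  <field name="name_phonetic" stored="false" type="text_phonetic" multiValued="true" indexed="true"/>
  <field name="description_phonetic" stored="false" type="text_phonetic" multiValued="true" indexed="true"/>
</fields>

<!-- Each source is copied into its own phonetic field, so the fields stay separate -->
<copyField source="name_text" dest="name_phonetic" />
<copyField source="description_text" dest="description_phonetic" />
```

Because copyField copies the raw input value rather than the analyzed tokens, each destination is analyzed by its own text_phonetic type. Every additional field needs one more field definition and one more copyField line, which is the duplication mentioned above.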

So it's a tossup. But those are your options.
