Solr highlighting uses unexpected prefix and suffix

Posted on 2025-01-04 02:21:06

I need to customize the Solr highlighting prefix and suffix so that matches are wrapped like this:

<span class="highlight">text</span>

instead of the default

<em>text</em>

To do that, I configured the HighlightComponent in solrconfig.xml as follows:

<searchComponent class="solr.HighlightComponent" name="highlight">
    <highlighting>
        <fragmentsBuilder name="simple" default="true" class="solr.highlight.SimpleFragmentsBuilder">
            <lst name="defaults">
                <str name="hl.tag.pre"><![CDATA[<span class="highlight">]]></str>
                <str name="hl.tag.post"><![CDATA[</span>]]></str>
            </lst>
        </fragmentsBuilder>
    </highlighting>
</searchComponent>

The following are the default parameters for my standard request handler:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
        <str name="hl">true</str>
        <str name="hl.fl">body,title</str>
        <str name="hl.useFastVectorHighlighter">true</str>
    </lst>
</requestHandler>

When I search for the word "text", it is highlighted, but not always with the prefix and suffix I configured:

<lst name="highlighting">
    <lst name="document_1">
        <arr name="body">
            <str>my <em>text</em> highlighted</str>
        </arr>
        <arr name="title">
            <str>my <span class="highlight">text</span> highlighted</str>
        </arr>
    </lst>
</lst>

Does anybody know why?

Comments (2)

┈┾☆殇 2025-01-11 02:21:06

I am guessing you are seeing this behavior because you only have the prefix and suffix defined for the SimpleFragmentsBuilder, and the other highlights are coming from another fragment builder.

I use a custom prefix and suffix for my highlighting. I set them in the formatter section inside the highlighting section of solrconfig.xml and have not had any issues, as it will apply to all fragment builders.

So maybe try the following:

 <highlighting>
   <fragmentsBuilder name="simple" default="true"
          class="solr.highlight.SimpleFragmentsBuilder"/>
   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
        default="true">
     <lst name="defaults">
       <str name="hl.simple.pre"><![CDATA[<span class="highlight">]]></str>
       <str name="hl.simple.post"><![CDATA[</span>]]></str>
     </lst>
  </formatter>
 </highlighting>

囍孤女 2025-01-11 02:21:06

I finally found out why! I'm using FastVectorHighlighter (hl.useFastVectorHighlighter=true) to make highlighting faster.
At the beginning I was highlighting only the title field and everything worked fine.
When I added the body field to highlighting, I forgot to enable termVectors="true" on it.
Now that my body field looks like this

<field name="body" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

after a full reindex, highlighting works perfectly:

<lst name="highlighting">
    <lst name="document_1">
        <arr name="body">
            <str>my <span class="highlight">text</span> highlighted</str>
        </arr>
        <arr name="title">
            <str>my <span class="highlight">text</span> highlighted</str>
        </arr>
    </lst>
</lst>

Previously, highlighting on the body field did work, but not through FastVectorHighlighter, because the field didn't have termVectors="true". Solr fell back to the standard highlighter for that field, which is why body was highlighted with the default prefix and suffix. Since FastVectorHighlighter is a completely different highlighting method, it is configured separately: hl.tag.pre and hl.tag.post on the fragmentsBuilder, versus hl.simple.pre and hl.simple.post on the formatter.
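To make the fallback concrete, here is a rough sketch of the two schema.xml states for the body field. The "before" definition is an assumption reconstructed from the description above (the original definition was not posted); the "after" definition is the one already shown in this answer. Only one of the two would exist in a real schema.

```xml
<!-- Before (assumed): no term vectors. FastVectorHighlighter cannot be used
     for this field, so Solr falls back to the standard highlighter, which
     wraps matches with hl.simple.pre / hl.simple.post (default: <em>...</em>). -->
<field name="body" type="text" indexed="true" stored="true" />

<!-- After: term vectors with positions and offsets are stored, so
     FastVectorHighlighter handles the field and hl.tag.pre / hl.tag.post apply.
     A full reindex is needed before this takes effect. -->
<field name="body" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />
```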

To avoid this kind of mistake, as long as users can choose which fields to highlight with the hl.fl parameter, I'd recommend also including the configuration for the standard highlighter (the formatter element, class solr.highlight.HtmlFormatter), like this:

<searchComponent class="solr.HighlightComponent" name="highlight">
    <highlighting>
        <formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
            <lst name="defaults">
                <str name="hl.simple.pre"><![CDATA[<span class="highlight">]]></str>
                <str name="hl.simple.post"><![CDATA[</span>]]></str>
            </lst>
        </formatter>
        <fragmentsBuilder name="simple" default="true" class="solr.highlight.SimpleFragmentsBuilder">
            <lst name="defaults">
                <str name="hl.tag.pre"><![CDATA[<span class="highlight">]]></str>
                <str name="hl.tag.post"><![CDATA[</span>]]></str>
            </lst>
        </fragmentsBuilder>
    </highlighting>
</searchComponent>

This way, highlighting uses the same prefix and suffix even for fields that have termVectors disabled.
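As a possible alternative that is not part of the original answer, the same four parameters can, as far as I know, also be passed as ordinary request parameters. That would let them live in the request handler defaults next to hl.fl. The sketch below reuses the handler from the question and assumes your Solr version honors hl.simple.pre/hl.simple.post and hl.tag.pre/hl.tag.post at request level; verify this against your version before relying on it.

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
        <str name="hl">true</str>
        <str name="hl.fl">body,title</str>
        <str name="hl.useFastVectorHighlighter">true</str>
        <!-- Standard highlighter (used for fields without term vectors) -->
        <str name="hl.simple.pre"><![CDATA[<span class="highlight">]]></str>
        <str name="hl.simple.post"><![CDATA[</span>]]></str>
        <!-- FastVectorHighlighter (used for fields with term vectors) -->
        <str name="hl.tag.pre"><![CDATA[<span class="highlight">]]></str>
        <str name="hl.tag.post"><![CDATA[</span>]]></str>
    </lst>
</requestHandler>
```

Keeping both pairs in one place makes it harder for the two highlighters to drift apart, and a client can still override them on individual requests.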
