Lucene.NET 搜索突出显示尊重 HTML 标签

发布于 2024-08-03 04:35:00 字数 792 浏览 4 评论 0原文

我试图在 HTML 块中突出显示搜索词,问题是如果用户搜索“颜色”,则: <

span style='color: White'>White;

变成: 白色

显然,打乱我的风格并不是一个好主意。

这是我正在使用的代码:

        Query parsedQuery = parser.Parse(luceneQuery);
        StandardAnalyzer Analyzer = new StandardAnalyzer();
        SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b class='search'>", "</b>");

        QueryScorer scorer = new QueryScorer(parsedQuery);
        Highlighter highlighter = new Highlighter(formatter, scorer);

        highlighter.SetTextFragmenter(new SimpleFragmenter());
        Highlighter.GetBestFragment(Analyzer, propertyName, invocation.ReturnValue.ToString())

我猜问题是我需要一个不同的碎片机,但我不确定。任何帮助将不胜感激。

I am trying to highlight search terms in a block of HTML, the problem is if a user does a search for "color", this:

<span style='color: white'>White</span>

becomes:
<span style='<b>color</b>: white'><b>White</b></span>

and obviously, messing up my style is not a good idea.

Here is the code I am using:

        Query parsedQuery = parser.Parse(luceneQuery);
        StandardAnalyzer Analyzer = new StandardAnalyzer();
        SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b class='search'>", "</b>");

        QueryScorer scorer = new QueryScorer(parsedQuery);
        Highlighter highlighter = new Highlighter(formatter, scorer);

        highlighter.SetTextFragmenter(new SimpleFragmenter());
        Highlighter.GetBestFragment(Analyzer, propertyName, invocation.ReturnValue.ToString())

I'm guessing the problem is that i need a different Fragmenter, but I'm not sure. Any help would be appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

赤濁 2024-08-10 04:35:00

我想我明白了...

我将 StandardAnalyzer 子类化并将 TokenStream 更改为:

public override Lucene.Net.Analysis.TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        var start = base.TokenStream(fieldName, reader);
        HtmlStripCharFilter filter = new HtmlStripCharFilter(reader);
        TokenStream result = new StandardFilter(filter);
        return new StopFilter(new LowerCaseFilter(result), this.stopSet);
    }

并将 HtmlStripCharFilter 实现为:

public class HtmlStripCharFilter : Lucene.Net.Analysis.CharTokenizer
{
    private bool inTag = false;

    public HtmlStripCharFilter(TextReader input)
        : base(input)
    {
    }

    protected override bool IsTokenChar(char c)
    {
        if (c == '<' && inTag == false)
        {
            inTag = true;
            return false;
        }
        if (c == '>' && inTag)
        {
            inTag = false;
            return false;
        }

        return !inTag && !Char.IsWhiteSpace(c);
    }
}

它朝着正确的方向前进,但在完成之前仍然需要做更多的工作。如果有人有更好的解决方案(阅读“经过测试的”解决方案),我很想听听。

I think I figured it out...

I subclassed StandardAnalyzer and changed TokenStream to this:

public override Lucene.Net.Analysis.TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        var start = base.TokenStream(fieldName, reader);
        HtmlStripCharFilter filter = new HtmlStripCharFilter(reader);
        TokenStream result = new StandardFilter(filter);
        return new StopFilter(new LowerCaseFilter(result), this.stopSet);
    }

and Implemented HtmlStripCharFilter as:

public class HtmlStripCharFilter : Lucene.Net.Analysis.CharTokenizer
{
    private bool inTag = false;

    public HtmlStripCharFilter(TextReader input)
        : base(input)
    {
    }

    protected override bool IsTokenChar(char c)
    {
        if (c == '<' && inTag == false)
        {
            inTag = true;
            return false;
        }
        if (c == '>' && inTag)
        {
            inTag = false;
            return false;
        }

        return !inTag && !Char.IsWhiteSpace(c);
    }
}

It's headed in the right direction, but still needs a lot more work before it's done. If anyone has a better solution (read "TESTED" solution) I would love to hear it.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文