Why doesn't the Swing parser's handleText handle nested tags?

Posted on 2024-08-08 09:59:22

I need to transform some HTML text that has nested tags, decorating each search match with a CSS class to highlight it (like Firefox search).
I can't just do a simple replace (think of a user searching for "img", for example), so I'm trying to do the replace only within the body text, not inside tag attributes.

I have a pretty straightforward HTML parser that I think should do this:

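// Needs java.io.StringReader, java.util.regex.*, javax.swing.text.html.HTMLEditorKit
// and javax.swing.text.html.parser.ParserDelegator.
final Pattern pat = Pattern.compile(srch, Pattern.CASE_INSENSITIVE);
Matcher m = pat.matcher(output);
if (m.find()) {
    final StringBuffer ret = new StringBuffer(output.length()+100);
    // lastPos is assumed to be a field of the enclosing class: the anonymous
    // callback assigns to it, which Java does not allow for a local variable.
    lastPos=0;
    try {
        new ParserDelegator().parse(new StringReader(output.toString()),
        new HTMLEditorKit.ParserCallback () {
            public void handleText(char[] data, int pos) {
                // Copy everything between the previous text run and this one (the tags) unchanged.
                ret.append(output.subSequence(lastPos, pos));
                // Wrap each match inside this text run in a highlighting span.
                Matcher m = pat.matcher(new String(data));
                ret.append(m.replaceAll("<span class=\"search\">$0</span>"));
                lastPos=pos+data.length;
            }
        }, false);
        // Copy whatever follows the last text run.
        ret.append(output.subSequence(lastPos, output.length()));
        return ret;
    } catch (Exception e) {
        // On any parse failure, fall back to the unmodified input.
        return output;
    }
}
return output;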

My problem is, when I debug this, the handleText is getting called with text that includes tags! It's like it's only going one level deep. Anyone know why? Is there some simple thing I need to do to HTMLParser (haven't used it much) to enable 'proper' behavior of nested tags?

PS - I figured it out myself - see answer below. Short answer is, it works fine if you pass it HTML, not pre-escaped HTML. Doh! Hope this helps someone else.

<span>example with <a href="#">nested</a> <p>more nesting</p>
</span> <!-- all this gets thrown together -->
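
The per-text-run replacement step in the code above can be sketched on its own. This is only a sketch under stated assumptions, not the asker's code: the names `SearchHighlighter`, `highlightRun` and `escape` are hypothetical, it treats the search term as a literal string via `Pattern.quote` (the original compiles `srch` as a regex), and it re-escapes the text because `handleText` delivers text with entities already decoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchHighlighter {
    // Minimal HTML escaping for text that is written back into markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps every case-insensitive occurrence of 'search' in one text run
    // with a highlighting span; everything else is escaped and kept as-is.
    static String highlightRun(String text, String search) {
        if (search.isEmpty()) {
            return escape(text); // nothing to highlight
        }
        // Pattern.quote makes characters such as '.' or '(' match literally.
        Pattern pat = Pattern.compile(Pattern.quote(search), Pattern.CASE_INSENSITIVE);
        Matcher m = pat.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escape(text.substring(last, m.start())));
            out.append("<span class=\"search\">")
               .append(escape(m.group()))
               .append("</span>");
            last = m.end();
        }
        out.append(escape(text.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: An <span class="search">IMG</span> tag &amp; more
        System.out.println(highlightRun("An IMG tag & more", "img"));
    }
}
```

Calling a helper like this from `handleText` only ever sees body text, so a search for "img" cannot touch an `<img>` tag or its attributes.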

空城仅有旧梦在 2024-08-15 09:59:22

Seems to work fine for me using JDK6 on XP. I wrapped your sample HTML with head and body tags. I got three lines of output:

a) example with
b) nested
c) more nesting

Here's the code I used:

import java.io.*;
import java.net.*;
import javax.swing.text.html.parser.*;
import javax.swing.text.html.*;

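public class ParserCallbackText extends HTMLEditorKit.ParserCallback
{
    // Called by the parser for each run of text it finds between tags.
    public void handleText(char[] data, int pos)
    {
        System.out.println( data );
    }

    public static void main(String[] args)
        throws Exception
    {
        // args[0] is either an http: URL or a local file path.
        Reader reader = getReader(args[0]);
        ParserCallbackText parser = new ParserCallbackText();
        // Third argument is ignoreCharSet: true ignores any charset declared in the document.
        new ParserDelegator().parse(reader, parser, true);
    }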

    static Reader getReader(String uri)
        throws IOException
    {
        // Retrieve from Internet.
        if (uri.startsWith("http:"))
        {
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        }
        // Retrieve from file.
        else
        {
            return new FileReader(uri);
        }
    }
}

╭⌒浅淡时光〆 2024-08-15 09:59:22

Sorry for the misleading question - I found my problem, and it wasn't included in my description: my input string had already been pre-processed (the inner tags were HTML-escaped), so the parser was actually looking at text such as

<span>example with &lt;a href="#"&gt;nested&lt;/a&gt; &lt;p&gt;more nesting&lt;/p&gt;
</span> <!-- well of course it all gets thrown together -->
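
The difference between the two inputs can be checked with a small program. This is a hedged sketch rather than code from the thread: the class name `EscapedInputDemo` and the helper `textRuns` are hypothetical, and the sample strings are the example above wrapped in html and body tags. It collects every text run that `handleText` reports, once for real markup and once for the pre-escaped variant.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class EscapedInputDemo {
    // Returns every text run the Swing parser passes to handleText.
    static List<String> textRuns(String html) {
        final List<String> runs = new ArrayList<>();
        try {
            new ParserDelegator().parse(new StringReader(html),
                new HTMLEditorKit.ParserCallback() {
                    @Override
                    public void handleText(char[] data, int pos) {
                        runs.add(new String(data));
                    }
                }, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return runs;
    }

    public static void main(String[] args) {
        // Real markup: the tags split the text into separate runs.
        String raw = "<html><body><span>example with <a href=\"#\">nested</a> "
                   + "<p>more nesting</p></span></body></html>";
        // Pre-escaped markup: the inner "tags" are just entity-encoded text.
        String escaped = "<html><body><span>example with &lt;a href=\"#\"&gt;nested&lt;/a&gt; "
                       + "&lt;p&gt;more nesting&lt;/p&gt;</span></body></html>";
        System.out.println(textRuns(raw));
        System.out.println(textRuns(escaped));
    }
}
```

With the escaped input the parser decodes the entities, so the text handed to `handleText` contains literal `<a href="#">` characters even though no nested element was ever parsed.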
