Why is Swing Parser's handleText not handling nested tags?
I need to transform some HTML text that has nested tags so that each search 'match' is wrapped in a span with a CSS class that highlights it (like Firefox's find-in-page).
I can't just do a simple replace on the whole string (think of a user searching for "img", which would also hit tag names), so I'm trying to do the replace only within the body text, not in tags or tag attributes.
I have a pretty straightforward HTML parser callback that I think should do this:
    final Pattern pat = Pattern.compile(srch, Pattern.CASE_INSENSITIVE);
    Matcher m = pat.matcher(output);
    if (m.find()) {
        final StringBuffer ret = new StringBuffer(output.length() + 100);
        lastPos = 0;
        try {
            new ParserDelegator().parse(new StringReader(output.toString()),
                new HTMLEditorKit.ParserCallback() {
                    public void handleText(char[] data, int pos) {
                        // copy everything (tags etc.) between the previous text run and this one
                        ret.append(output.subSequence(lastPos, pos));
                        // wrap each match inside the text run in a highlighting span
                        Matcher m = pat.matcher(new String(data));
                        ret.append(m.replaceAll("<span class=\"search\">$0</span>"));
                        lastPos = pos + data.length;
                    }
                }, false);
            // copy whatever follows the last text run
            ret.append(output.subSequence(lastPos, output.length()));
            return ret;
        } catch (Exception e) {
            return output;
        }
    }
    return output;
My problem is, when I debug this, the handleText is getting called with text that includes tags! It's like it's only going one level deep. Anyone know why? Is there some simple thing I need to do to HTMLParser (haven't used it much) to enable 'proper' behavior of nested tags?
PS - I figured it out myself - see answer below. Short answer is, it works fine if you pass it HTML, not pre-escaped HTML. Doh! Hope this helps someone else.
<span>example with <a href="#">nested</a> <p>more nesting</p>
</span> <!-- all this gets thrown together -->
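The point in the PS can be reproduced with a small experiment. The following is a minimal sketch, not the original code: the class name `EscapedVsRaw` and the helper `textChunks` are hypothetical, and the escaping done in `main` is only an assumption about what "pre-escaped" input looked like. It collects every chunk passed to `handleText`, once for raw HTML and once for the same HTML with its angle brackets turned into entities.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class EscapedVsRaw {
    /** Returns every chunk of text the parser passes to handleText (hypothetical helper). */
    static List<String> textChunks(String html) throws IOException {
        final List<String> chunks = new ArrayList<>();
        new ParserDelegator().parse(new StringReader(html),
            new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleText(char[] data, int pos) {
                    chunks.add(new String(data));
                }
            }, false);
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        String raw = "<span>example with <a href=\"#\">nested</a> <p>more nesting</p></span>";
        // Assumed form of "pre-escaped" input: markup characters replaced by entities.
        String escaped = raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");

        // Raw HTML: the parser sees real tags, so the chunks are plain text only.
        System.out.println("raw:     " + textChunks(raw));
        // Escaped HTML: there are no tags to parse, so the decoded "tags" arrive as text.
        System.out.println("escaped: " + textChunks(escaped));
    }
}
```

With escaped input the parser has no markup to split on, so what looks like nested tags in the debugger is just decoded entity text inside a text chunk.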
Comments (2)
Seems to work fine for me using JDK6 on XP. I wrapped your sample HTML with head and body tags. I got three lines of output:
a) example with
b) nested
c) more nesting
Here's the code I used:
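The code from this answer is not preserved here. As a stand-in, this is a minimal sketch of a test along the lines described, under the assumption that it simply printed each chunk received by `handleText`; the class name `NestedTextDump`, the `SAMPLE` constant, and the `textChunks` helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class NestedTextDump {
    // The sample from the question, wrapped in html, head and body tags.
    static final String SAMPLE =
        "<html><head></head><body>"
        + "<span>example with <a href=\"#\">nested</a> <p>more nesting</p>\n"
        + "</span> <!-- all this gets thrown together -->"
        + "</body></html>";

    /** Collects the text passed to each handleText call (hypothetical helper). */
    static List<String> textChunks(String html) throws IOException {
        final List<String> chunks = new ArrayList<>();
        new ParserDelegator().parse(new StringReader(html),
            new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleText(char[] data, int pos) {
                    chunks.add(new String(data));
                }
            }, false);
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Print one line per handleText call.
        for (String chunk : textChunks(SAMPLE)) {
            System.out.println(chunk);
        }
    }
}
```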
Sorry for the misleading question - I found my problem, and it wasn't included in my description - my input string had been pre-processed so I was looking at text such as