Java JTextPane HTML editor: UTF-8 character encoding

Posted on 2024-12-18 18:26:07

I'm using a JTextPane as a simple HTML editor.

jtp = new JTextPane();
jtp.setContentType("text/html;charset=UTF-8");
jtp.setEditorKit(new HTMLEditorKit());

When I call jtp.getText(), I get nice HTML code with all special characters escaped. However, I don't want national (Polish) characters to be escaped, only the special HTML characters such as &, <, and >.
When I enter the following in the editor:

<foo>ą ś &

I get

&lt;foo&gt;&#261; &#347; &amp;

but I would like to get

&lt;foo&gt;ą ś &amp;

How is this possible?

Comments (2)

各空 2024-12-25 18:26:07

Unfortunately, that's not possible with the stock classes.

The cause lies in javax.swing.text.html.HTMLWriter: it is hardcoded to convert any character below the space character or above code 127 into its numeric representation:

default:
    if (chars[counter] < ' ' || chars[counter] > 127) {
        if (counter > last) {
            super.output(chars, last, counter - last);
        }
        last = counter + 1;
        // If the character is outside of ascii, write the
        // numeric value.
        output("&#");
        output(String.valueOf((int)chars[counter]));
        output(";");
    }
    break;
}

There is no setting that controls this logic.
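To see what that branch does to the input from the question, here is a simplified standalone restatement of the escaping logic. It is only a sketch, not the JDK code: the class and method names (AsciiEscapeDemo, escapeLikeHtmlWriter) are made up for illustration, the named-entity cases are inferred from the output shown in the question, and the pass-through of tabs and line breaks is an assumption about the surrounding switch.

```java
// Hypothetical demo class; it mimics the quoted branch, it is not the JDK source.
final class AsciiEscapeDemo {

    static String escapeLikeHtmlWriter(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                // Named entities, inferred from the output in the question.
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                // Assumption: tabs and line breaks are written unchanged.
                case '\n':
                case '\t':
                case '\r':
                    out.append(c);
                    break;
                default:
                    if (c < ' ' || c > 127) {
                        // Same rule as the quoted branch: write the numeric value.
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u0105 is "a with ogonek", \u015b is "s with acute".
        System.out.println(escapeLikeHtmlWriter("<foo>\u0105 \u015b &"));
    }
}
```

Because the test is purely on the char value, the charset given to setContentType has no influence on the result.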

But if you really need that functionality, you could resort to a crazy workaround:

  1. Copy the HTMLWriter sources into a new class HTMLWriterHack (in the same package, javax.swing.text.html, renaming every occurrence of the class name inside).
  2. Replace the three output lines listed above with something like output(String.valueOf(chars[counter]));
  3. Copy the HTMLDocument sources into a new class HTMLDocumentHack (in the same package, javax.swing.text.html, renaming every occurrence of the class name inside, making it extend HTMLDocument, and removing the clashing methods).
  4. Use the CustomEditorKit listed below instead of HTMLEditorKit.

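import java.io.IOException;
import java.io.Writer;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLDocumentHack;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.HTMLWriterHack;
import javax.swing.text.html.StyleSheet;

class CustomEditorKit extends HTMLEditorKit {
    // Write with the patched writer. Note that pos and len are ignored here,
    // so the whole document is always written.
    @Override
    public void write(Writer out, Document doc, int pos, int len) throws IOException, BadLocationException {
        HTMLWriterHack writer = new HTMLWriterHack(out, (HTMLDocumentHack) doc);
        writer.write();
    }

    // Create the patched document type so that the cast in write() succeeds.
    @Override
    public Document createDefaultDocument() {
        StyleSheet styles = getStyleSheet();
        StyleSheet ss = new StyleSheet();
        ss.addStyleSheet(styles);
        HTMLDocumentHack doc = new HTMLDocumentHack(ss);
        doc.setParser(getParser());
        doc.setAsynchronousLoadPriority(4);
        doc.setTokenThreshold(100);
        return doc;
    }
}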

Although the steps above work (I tested them), I certainly wouldn't recommend doing that.

清晰传感 2024-12-25 18:26:07

It is not possible: all characters above code 127 are translated to a numeric entity of the form &#number;. The special HTML characters are translated into named entities such as &lt;. So you can easily substitute the numeric entities back afterwards. (This is done in HTMLWriter.output, and there seems to be no provision for character sets whatsoever.)
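The substitution suggested here could be done as a post-processing step on the string returned by jtp.getText(). The following is a minimal sketch under that assumption; EntityDecoder and decodeNonAscii are hypothetical names, not part of Swing. It turns decimal numeric entities above 127 back into characters and leaves everything else, including &lt;, &gt; and &amp;, untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Swing.
final class EntityDecoder {

    // Decimal numeric character references such as &#261;
    private static final Pattern NUMERIC_REF = Pattern.compile("&#(\\d{1,7});");

    static String decodeNonAscii(String html) {
        Matcher m = NUMERIC_REF.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(html, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (codePoint > 127 && Character.isValidCodePoint(codePoint)) {
                // Non-ASCII: put the real character back.
                out.appendCodePoint(codePoint);
            } else {
                // Control characters and invalid values stay escaped.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeNonAscii("&lt;foo&gt;&#261; &#347; &amp;"));
    }
}
```

A call such as EntityDecoder.decodeNonAscii(jtp.getText()) would then give the form asked for in the question. Text that the user typed literally as &#261; should not be affected, because its ampersand is itself written as &amp; and the pattern only matches an ampersand directly followed by #.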
