Unmarshalling XML with JAXB without unescaping characters

Posted on 2025-01-05 21:08:53

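Imagine the following situation: we receive an XML file from an external tool. Lately this XML can contain escaped characters in node names or inside their richcontent tags, as in the following (simplified) example: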

<map>
<node TEXT="Project">
<node TEXT="&#xe4;&#xe4;">
<richcontent TYPE="NOTE"><html>
  <head>

  </head>
  <body>
    <p>
      I am a Note for Node &#228;&#228;!
    </p>
  </body>
</html>
</richcontent>
</node>
</node>
</map>

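After unmarshalling the file with JAXB, those escaped characters are unescaped. Unfortunately I need them to stay the way they are, that is, escaped. Is there any way to avoid unescaping those characters while unmarshalling?

While researching I found a lot of questions about marshalling XML files, where the opposite problem occurs, but those did not help me either.

Is it possible to achieve this with JAXB at all, or do we have to consider switching to a different XML reader API?

Thank you in advance,
ymene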

Comments (1)

枕梦 2025-01-12 21:08:53

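You only need to replace &# with &amp;# before the input reaches the parser. The parser then decodes &amp; back to a plain &, so the numeric reference arrives as literal text. Hence call

unmarshaller.unmarshal(new AmpersandingStream(new FileInputStream(...)));

with the following stream class:
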
import java.io.IOException;
import java.io.InputStream;

/**
 * Rewrites every "&#" in the wrapped stream to "&amp;#", so that an XML
 * parser reading from this stream reports numeric character references
 * (such as &#xe4; or &#228;) as literal text instead of decoding them.
 * Works byte by byte, so it assumes an ASCII-compatible encoding
 * such as UTF-8 or ISO-8859-1.
 */
public class AmpersandingStream extends InputStream {

    private final InputStream in;
    private boolean justReadAmpersand;
    // Characters still to be returned before reading from the wrapped stream again.
    private String lookAhead = "";

    public AmpersandingStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        if (!lookAhead.isEmpty()) {
            int c = lookAhead.charAt(0);
            lookAhead = lookAhead.substring(1);
            return c;
        }
        int c = in.read();
        if (c == '#' && justReadAmpersand) {
            // The '&' has already been returned; emit "amp;#" after it.
            c = 'a';
            lookAhead = "mp;#";
        }
        justReadAmpersand = c == '&';
        return c;
    }

    // read(byte[]), read(byte[], int, int) and skip(long) are deliberately not
    // overridden: the InputStream defaults go through read(), so the
    // replacement also applies to the bulk reads an XML parser normally uses.
    // Delegating them straight to the wrapped stream would bypass it.
    // mark/reset are left unsupported, since resetting the wrapped stream
    // would not restore lookAhead and justReadAmpersand.

    @Override
    public int available() throws IOException {
        return in.available();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

}
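
To see the effect without setting up JAXB, here is a minimal sketch that applies the same &# to &amp;# substitution to a String (not the stream class above) and parses the result with the JDK's built-in DOM parser. The class name PreEscapeDemo and its two helper methods are made up for this illustration; the sample element is taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original answer.
public class PreEscapeDemo {

    /** Rewrites "&#" to "&amp;#" so numeric references survive parsing as text. */
    public static String preEscape(String xml) {
        return xml.replace("&#", "&amp;#");
    }

    /** Parses the XML and returns the TEXT attribute of the root element. */
    public static String rootText(String xml) throws Exception {
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        return doc.getDocumentElement().getAttribute("TEXT");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<node TEXT=\"&#xe4;&#xe4;\"/>";
        // Parsed as is: the parser decodes both references to the character U+00E4.
        System.out.println(rootText(xml));
        // Pre-escaped: the attribute value keeps the text &#xe4;&#xe4;
        System.out.println(rootText(preEscape(xml)));
    }
}
```

The stream version does the same substitution on the fly, which avoids loading the whole file into memory. Note that either variant also rewrites &# sequences inside CDATA sections and comments, where they were never references in the first place.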
