Encoding XML with ASCII encoding instead of character entities

Posted on 2024-08-19 16:45:09

Alright, so here is my issue. I need to generate XML in Java to pass on to another application. I started off thinking this would be easy using an org.w3c.dom.Document. Unfortunately, the application I need to pass the XML off to requires that special characters like " be encoded as ASCII (&#034;) instead of as their character entity (&quot;). Does anybody know a simple solution to this?

P.S. Changing the target application is not an option.

Update:
So let's say my app is given the following string as input:

he will "x" this if needed

My app needs to output this:

<field value="he will &#034;x&#034; this if needed"/>

The XML generator I am using, and I am guessing most others, outputs this instead, which is not valid for my target:

<field value="he will &quot;x&quot; this if needed"/>

I realize my target may not quite be up to XML standards, but that doesn't help me as I have no control over it. This is my situation and I have to deal with it. Any ideas other than simply converting every special character by hand?

Comments (2)

小草泠泠 2024-08-26 16:45:09

I wonder how you serialize the XML--to a string, a stream, etc. You can post-process your output to replace general entity references with their numeric equivalents, e.g.,

sed 's/&lt;/\&#060;/g; s/&gt;/\&#062;/g; s/&amp;/\&#038;/g; s/&apos;/\&#039;/g; s/&quot;/\&#034;/g'

or

xmlResultString = xmlResultString.replace("&lt;", "&#060;"); // etc. for other entities

There are exactly 5 pre-defined general entities in XML (http://www.w3.org/TR/REC-xml/#sec-predefined-ent) and you can safely perform this as a textual replacement. There is no danger that it modifies anything except the references (well, maybe in comments and PIs, but it doesn't sound like your scenario uses them, or that the target even accepts them).
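
For what it's worth, here is a minimal sketch of the Java variant, assuming the serialized document is already held in a String. The class name EntityRewriter and the method toNumericReferences are made up for illustration, and the zero-padded three-digit form is an assumption based on the &#034; shown in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
public class EntityRewriter {
    // The five predefined XML entities mapped to decimal character references.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("&lt;", "&#060;");
        REPLACEMENTS.put("&gt;", "&#062;");
        REPLACEMENTS.put("&amp;", "&#038;");
        REPLACEMENTS.put("&apos;", "&#039;");
        REPLACEMENTS.put("&quot;", "&#034;");
    }

    // Plain textual replacement on already serialized XML.
    public static String toNumericReferences(String xml) {
        String result = xml;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // String.replace works on literal text, so no regex escaping is needed.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String serialized = "<field value=\"he will &quot;x&quot; this if needed\"/>";
        System.out.println(toNumericReferences(serialized));
    }
}
```

The order of the replacements should not matter here, because none of the numeric references produced contains one of the five entity names.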

I agree with Mark that your target application is not a conforming XML processor. At least it comes with documentation that states explicitly where it diverges from XML. I believe the Recommendation (link above) disagrees with Christopher's comment, though it's irrelevant to OP's question as his target declares its non-conformance to the Recommendation.

Ari.

橪书 2024-08-26 16:45:09

To my knowledge, the standard API doesn't expose the escape mechanism. You'd probably need to write your own XML emitter.

If you don't mind a 3rd party API, you could use JDOM. Something like:

XMLOutputter outputter = new XMLOutputter() {
  @Override
  public String escapeAttributeEntities(String sequence) {
    // TODO: bug: code only works for Basic Multilingual Plane
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < sequence.length(); i++) {
      process(sequence.charAt(i), out);
    }
    return out.toString();
  }

  private void process(char codePoint, StringBuilder out) {
    if (codePoint == '"' || codePoint == '\'' || codePoint == '&'
        || codePoint == '<' || codePoint == '>' || codePoint > 127) {
      out.append("&#");
      out.append(Integer.toString(codePoint));
      out.append(";");
    } else {
      out.append(codePoint);
    }
  }
};
outputter.setFormat(Format.getPrettyFormat().setEncoding("US-ASCII"));

Element foo = new Element("foo").setAttribute("msg",
    "he will \"x\" this if needed");
Document doc = new Document().setRootElement(foo);
outputter.output(doc, System.out);

This emits:

<?xml version="1.0" encoding="US-ASCII"?>
<foo msg="he will &#34;x&#34; this if needed" />

(I'd still give the XML spec a once-over before doing this and fix up the character handling to support characters above U+FFFF.)
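
One possible way to do that character-handling fix-up is to walk the string by code point rather than by char. The sketch below is standalone Java with no JDOM dependency. The class name NumericEscaper and its methods are hypothetical, and padding to three digits is an assumption taken from the &#034; form in the question. Something like it could be called from the overridden escapeAttributeEntities above.

```java
import java.util.Locale;

// Hypothetical helper, not part of JDOM or the JDK.
public class NumericEscaper {
    // Escapes XML markup characters and everything outside ASCII
    // as decimal character references.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins a surrogate pair into a single code point.
            int codePoint = text.codePointAt(i);
            if (needsEscape(codePoint)) {
                // Zero-pad to at least three digits, e.g. &#034; (assumed format).
                out.append(String.format(Locale.ROOT, "&#%03d;", codePoint));
            } else {
                out.appendCodePoint(codePoint);
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    private static boolean needsEscape(int codePoint) {
        return codePoint == '"' || codePoint == '\'' || codePoint == '&'
            || codePoint == '<' || codePoint == '>' || codePoint > 127;
    }

    public static void main(String[] args) {
        System.out.println(escape("he will \"x\" this if needed"));
    }
}
```

Unlike the char-based loop, this emits a single reference for a character above U+FFFF instead of one reference per surrogate.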
