Weird XML/HTML accent problem

Posted 2024-08-27 07:37:16

I have an XML file that contains a message with html tags in it. The XML file is read by a java class that mails it to people. When the mail is received, the accents do not show. For example é doesn't show.

I have tried &eacute; in the XML, but Eclipse reports an error saying that the entity has not been declared.

I also tried simply inserting é but that shows nothing in the final output.

The 3rd thing I tried was using <![CDATA[é]]> but that broke the parser since it didn't output anything after it.

However, I noticed something weird. When I put something like this in the XML and added the UTF-16 encoding

<message>text bla bla blaa é&lt;

it did output the é at the end, like this: bla bla blaa blaa é.

EDIT
<message>text bla bla blaa éé&lt; outputs ?é or just one é

The file looks something like this:

<?xml version="1.0"? encoding="UTF-16">

<message>
&lt;b&gt;hello é &lt;/b&gt;
</message>
</xml>

What gives?

Comments (3)

反目相谮 2024-09-03 07:37:16

Did you try changing the encoding to UTF-8?

断舍离 2024-09-03 07:37:16

The encoding that you declare in the XML declaration (the <?xml ... ?> line) MUST be consistent with the "real" encoding that was used to edit and save the XML file on your hard drive.

If you edited your XML file with Notepad under Windows in a Western European locale, it is very likely encoded in cp1252 (the default encoding Windows uses in that situation; cp1252 is a slight variant of the standardized ISO-8859-1 that adds, among other characters, the euro sign).

In fact, I would suggest using an editor that lets you control exactly which encoding is used when editing and saving (like http://jedit.org), so you can guarantee that the actual file encoding and the encoding declared in its content (that is, in the XML declaration) are the same.
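To make the mismatch concrete, here is a minimal sketch using the JDK's built-in DOM parser. The class name `DeclaredEncodingDemo`, the helper `roundTrip`, and the sample text are made up for illustration; they are not from the question. It encodes the same document with one charset while declaring another, then shows what the parser hands back.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the asker's code.
final class DeclaredEncodingDemo {
    // Builds a document that declares `declared`, encodes it as `actual`
    // (the "real" on-disk encoding), parses the bytes, and returns the
    // text content of <message>.
    static String roundTrip(String declared, Charset actual) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\"?>"
                + "<message>h\u00e9llo</message>";
        byte[] onDisk = xml.getBytes(actual);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(onDisk));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Declaration and bytes agree: the accent survives.
        String ok = roundTrip("UTF-8", StandardCharsets.UTF_8);
        System.out.println("h\u00e9llo".equals(ok));

        // Declared ISO-8859-1 but saved as UTF-8: parsing succeeds, yet the
        // two UTF-8 bytes of 'é' are decoded as two separate characters.
        String garbled = roundTrip("ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println("h\u00c3\u00a9llo".equals(garbled));
    }
}
```

The second case is the nasty one: nothing fails, the text is just silently wrong by the time it reaches the mail.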

EDIT
It also depends greatly on how your Java program reads the XML file and uses it.
If an XML parser is used, it should be OK. Otherwise the result depends on the charset the reading code decodes with: when none is given, Java uses its default charset (the platform default on older JDKs, UTF-8 since JDK 18), so you would have to store the file in that encoding. If you're very unlucky and the Java class hard-codes yet another encoding for reading, well, you'll have to comply with that...
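If the file is read as plain text rather than through a parser, one way to take the default out of the picture is to name the charset explicitly. The sketch below is only an illustration: `ExplicitCharsetRead`, `read`, and the temporary file are invented names, and the file is assumed to have been saved as ISO-8859-1.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the asker's code.
final class ExplicitCharsetRead {
    // Decodes the whole file with the charset the caller names,
    // instead of relying on the JVM default.
    static String read(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("message", ".txt");
        try {
            // Simulate a file saved as ISO-8859-1 ('é' is the single byte 0xE9).
            Files.write(file, "h\u00e9".getBytes(StandardCharsets.ISO_8859_1));

            // Same charset for reading: the accent is intact.
            System.out.println("h\u00e9".equals(read(file, StandardCharsets.ISO_8859_1)));

            // Wrong charset for reading: 0xE9 alone is not valid UTF-8,
            // so the decoder substitutes the replacement character U+FFFD.
            System.out.println(read(file, StandardCharsets.UTF_8).contains("\uFFFD"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```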

EDIT 2
It also depends on the mail client and the way it manages encoding...

泛泛之交 2024-09-03 07:37:16

The &eacute; entity is an HTML entity that your XML parser is trying to interpret. Replace &eacute; with &amp;eacute; and the XML parser will only interpret the &amp;, which yields the HTML entity you want.
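A quick way to check this is to parse a small string with the JDK's DOM parser and look at what comes out. This is a sketch with made-up names (`EntityEscapeDemo`, `rootText`) and sample messages; the numeric reference `&#233;` is shown as an alternative that plain XML accepts without any declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the asker's code.
final class EntityEscapeDemo {
    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // &amp;eacute; is unescaped to the literal text "&eacute;",
        // which an HTML mail client can then render as an accented e.
        String escaped = "<message>&lt;b&gt;hello &amp;eacute;&lt;/b&gt;</message>";
        System.out.println(rootText(escaped));

        // A numeric character reference needs no declaration; the parser
        // turns it into the character U+00E9 itself.
        String numeric = "<message>hello &#233;</message>";
        System.out.println("hello \u00e9".equals(rootText(numeric)));
    }
}
```

With the escaped form, the text handed to the mail code is HTML that still contains the entity, so the result no longer depends on the XML file's encoding.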

Regarding the UTF-16 encoding, the key piece of information missing here is the encoding of the file. It sounds like the file is being saved in UTF-16 format without a byte-order mark, which would explain why it only works with that encoding specified. You can verify this by checking the file size: it will be twice the number of characters in the file (or possibly a bit more if you're using certain Unicode characters). Other likely encodings you can try are UTF-8 and ISO-8859-1.
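The byte-order-mark and file-size checks can also be done in code rather than by eye. The following is a rough sketch; `BomSniffer` and `detectBom` are invented names, and it only recognises the UTF-8 and UTF-16 marks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the asker's code.
final class BomSniffer {
    // Returns the encoding implied by a leading byte-order mark, or "none".
    // Only UTF-8 and UTF-16 marks are checked; a UTF-32LE mark starts with
    // the same two bytes as UTF-16LE and is not distinguished here.
    static String detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return "none";
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes a big-endian BOM first.
        System.out.println(detectBom("\u00e9".getBytes(StandardCharsets.UTF_16)));
        // "UTF-16LE" writes no BOM: two bytes per character, nothing else.
        byte[] noBom = "ab".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(detectBom(noBom) + ", " + noBom.length + " bytes for 2 chars");
    }
}
```

In practice you would pass the first few bytes of the real file, for example from `Files.readAllBytes`, and compare the file length against the character count as described above.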
