XML with special characters, UTF-8 encoding
I have a few simple questions, because I got confused reading all the different responses.
1) If I have an XML document with the prolog <?xml version="1.0" encoding="utf-8" ?>
and I'm going to unmarshal it with Java (for example, with JAXB), I suppose that I can't put CROSS OF LORRAINE (http://www.fileformat.info/info/unicode/char/2628/index.htm) inside it directly, but I can put "\u2628". Is that correct?
2) I've also heard that UTF-8 doesn't contain it, but as far as I understand, anything in Unicode can be saved with the UTF-8 (or UTF-16) encoding, and here is an example from that page:
UTF-8 (hex) 0xE2 0x98 0xA8 (e298a8)
Is my reasoning correct? Can I use this form and put it in the XML with UTF-8 encoding?
Comments (3)
If your prolog specifies UTF-8 encoding for the XML (<?xml version="1.0" encoding="utf-8" ?>),
then you can use the character directly, or you can encode it as a numeric character reference: &#x2628;
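To illustrate the two forms mentioned above, here is a minimal Java sketch. It uses the JDK's built-in DOM parser rather than JAXB, and the class name CharRefDemo, the helper rootText and the <note> element are made up for this example. It suggests that the literal character and the numeric character reference parse to the same text, while a Java-style backslash escape placed in the XML is just six ordinary characters.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CharRefDemo {
    // Hypothetical helper: encodes the XML text as UTF-8 bytes, parses it,
    // and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String prolog = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>";

        // The character itself (written here with a Java source escape).
        String direct = rootText(prolog + "<note>\u2628</note>");
        // The XML numeric character reference.
        String charRef = rootText(prolog + "<note>&#x2628;</note>");
        // A backslash escape inside the XML: XML does not interpret it.
        String javaEscape = rootText(prolog + "<note>\\u2628</note>");

        System.out.println(direct.equals(charRef)); // true
        System.out.println(javaEscape.length());    // 6
    }
}
```

The backslash form only means U+2628 inside Java source code, where the compiler translates it before the string ever reaches the XML.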
It should be absolutely fine - UTF-8 can encode any Unicode character.
XML 1.0 has some restrictions around control characters (within U+0000 to U+001F only tab, line feed and carriage return are allowed), but U+2628 is fine.
(Personally I prefer to go to unicode.org for definitive code charts, but U+2628 definitely appears here.)
You shouldn't need to worry about the UTF-8 side of things - you should be able to put the character in your data directly, and let JAXB do the encoding.
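As a quick check of the claim that UTF-8 can encode this character, the following sketch prints the UTF-8 bytes of U+2628 using only the standard library. The class name Utf8BytesDemo and the helper utf8Hex are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesDemo {
    // Hypothetical helper: returns the UTF-8 encoding of a string as lowercase hex.
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u2628")); // e298a8
        System.out.println(utf8Hex("A"));      // 41
    }
}
```

The three bytes match the 0xE2 0x98 0xA8 sequence quoted in the question.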
One more addition:
Just specifying the encoding in the prolog is not sufficient. You need to make sure the content is actually serialized using that encoding.
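A rough sketch of what this means in practice, assuming the XML is written through a java.io.Writer. The class SerializeDemo and its helpers are hypothetical, and ISO-8859-1 stands in for any mismatched charset. The prolog says utf-8 in both cases, but only the bytes actually written as UTF-8 round-trip the character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class SerializeDemo {
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"utf-8\" ?><note>\u2628</note>";

    // Writes the XML text to bytes using the given charset,
    // regardless of what the prolog claims.
    static byte[] serialize(Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(XML);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Parses the bytes and returns the text of the root element.
    static String parseRootText(byte[] bytes) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String good = parseRootText(serialize(StandardCharsets.UTF_8));
        String bad = parseRootText(serialize(StandardCharsets.ISO_8859_1));
        System.out.println(good.equals("\u2628")); // true
        System.out.println(bad.equals("\u2628"));  // false, the character was lost on write
    }
}
```

Passing the charset explicitly to the writer, instead of relying on a platform default, keeps the bytes consistent with the declaration in the prolog.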