Java XSLT output displays A0, B7 characters incorrectly in the browser
I have a Java/XML/XSL-based web application that takes an XML document from another service and then displays its main text node.
Both sites show the same block of text (which can include HTML formatting and English or French content). The main site displays it fine, but my site displays certain characters incorrectly. Everything else, including the French characters, displays correctly.
Inspecting the document, I see the bytes A0 and B7 at the positions that do not show correctly.
Searching this site I found this question/response:
Is ED A0 80 ED B0 80 a valid UTF-8 byte sequence?
The accepted answer talks about illegal UTF-8 being interpreted as Windows-1252. The characters shown there are the ones I'm seeing.
As far as I know, the document arrives at my site as UTF-8 (from a .NET-based web app, if that matters), and we store it and display it back as UTF-8. It's stored as an XML document and transformed to produce the output.
The block is displayed with disable-output-escaping (so that the HTML formatting shows) and that appears to be working correctly.
Ideally I would be able to display these characters as they were intended (A0 being a space) so that my output looks the same as the parent site's.
Any help or advice is appreciated.
Comments (2)
There's basically a misunderstanding between the XSLT processor, which is outputting the result using one encoding A, and the display software, which is rendering the document in the belief that its encoding is B. You haven't given enough information to determine what A and B are; and you haven't been specific about the "display software", which I suspect is the combination of a web server and a browser. Check that the encoding specified in the content (XML declaration or HTML charset declaration), the encoding specified in the HTTP header, and the actual encoding of the bytes are all consistent with each other.
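To make the "actual encoding of the bytes" check concrete, here is a minimal Java sketch. It is not taken from the asker's application: the class name `EncodingCheck` and its helpers are hypothetical, an identity transform stands in for the real stylesheet, and the JDK's built-in XSLT implementation is assumed. It shows how U+00A0 (no-break space) and U+00B7 (middle dot) look as bytes under two encodings, and what a reader sees when it assumes the wrong one. ISO-8859-1 stands in for Windows-1252 here, since the two agree on the bytes involved (C2, A0, B7).

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class EncodingCheck {

    // Render bytes as space-separated upper-case hex, e.g. "C2 A0".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Identity transform (placeholder for the real stylesheet) serialized
    // to bytes with an explicitly requested output encoding.
    static byte[] serialize(String xml, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String text = "a\u00A0b\u00B7c"; // 'a', NBSP, 'b', middle dot, 'c'

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hex(utf8));   // 61 C2 A0 62 C2 B7 63
        System.out.println(hex(latin1)); // 61 A0 62 B7 63

        // UTF-8 bytes read as a single-byte encoding: each C2 shows up as an extra 'Â'.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // 7 characters instead of 5

        // Single-byte A0 read as UTF-8: a lone continuation byte, replaced by U+FFFD.
        String broken = new String(new byte[] {(byte) 0xA0}, StandardCharsets.UTF_8);
        System.out.println(broken.equals("\uFFFD")); // true

        // Bytes the serializer actually writes when asked for UTF-8.
        System.out.println(hex(serialize("<p>a\u00A0b</p>", "UTF-8")));
    }
}
```

If the hex dump of the real output shows `C2 A0` but the page still renders a stray `Â` before the space, the bytes are UTF-8 and the mismatch is more likely in what the HTTP `Content-Type` header or the HTML charset declaration claims. If it shows a bare `A0`, the output is in a single-byte encoding and is probably being labelled as UTF-8.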
The bytes you supplied (ED A0 80 and ED B0 80) encode Unicode code points for so-called surrogates, which always appear in pairs. See Wikipedia on Unicode surrogates. For the Unicode surrogate character subset, see the Unicode overview.
Next step is figuring out how they got there :-)
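As a rough illustration of the surrogate point, the following Java sketch decodes those two three-byte sequences by hand and checks them against a strict UTF-8 decoder. The class `SurrogateCheck` and its methods are hypothetical names made up for this example; only standard JDK classes are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class SurrogateCheck {

    // Decode one 3-byte sequence of the form 1110xxxx 10xxxxxx 10xxxxxx by hand.
    static int decode3(int b1, int b2, int b3) {
        return ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
    }

    // True if a strict UTF-8 decoder accepts the bytes without error.
    static boolean isStrictUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int hi = decode3(0xED, 0xA0, 0x80);
        int lo = decode3(0xED, 0xB0, 0x80);
        System.out.println(Integer.toHexString(hi)); // d800
        System.out.println(Integer.toHexString(lo)); // dc00
        System.out.println(Character.isHighSurrogate((char) hi)); // true
        System.out.println(Character.isLowSurrogate((char) lo));  // true

        // Together the pair stands for the single code point U+10000.
        int codePoint = Character.toCodePoint((char) hi, (char) lo);
        System.out.println(Integer.toHexString(codePoint)); // 10000

        // Surrogates encoded individually are rejected as malformed UTF-8.
        byte[] sixBytes = {(byte) 0xED, (byte) 0xA0, (byte) 0x80,
                           (byte) 0xED, (byte) 0xB0, (byte) 0x80};
        System.out.println(isStrictUtf8(sixBytes)); // false

        // The same code point in well-formed UTF-8 is one 4-byte sequence: F0 90 80 80.
        byte[] fourBytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(isStrictUtf8(fourBytes)); // true
    }
}
```

A check along the lines of `isStrictUtf8` on the stored document could help narrow down where such sequences enter the pipeline, for example by running it on the bytes as received and again after each processing step.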