Java EE web project and character encoding
We built a Java EE web project and use JDBC to store our data.
The problem is that German umlauts like äöü are in use and are stored correctly in the MySQL database. We don't know why, but in the browser those characters are broken, displaying weird stuff like
ö�
instead.
I've already tried setting the encoding of the JDBC connection as described in this question:
And the encoding of the HTML page is set correctly:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
Any ideas how to fix that?
Update
connection.prepareStatement("SET CHARACTER SET utf8").execute();
doesn't make the umlauts work.
Changing the meta tag to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
doesn't change anything either.
Well, that's the first thing to find out. You should trace your data at every stage:
When you log, don't just log the strings: log the Unicode characters that make up the strings, as integers. Just cast each character in the string to an integer and log it. It's primitive, but it'll tell you what you need to know.
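As a minimal sketch of that kind of logging (the class and method names here are made up for illustration, and it iterates over code points rather than casting each `char`, which gives the same result for umlauts):

```java
import java.util.StringJoiner;

class CodePointDump {
    // Renders every Unicode code point of the string as U+XXXX, space-separated.
    static String dump(String s) {
        StringJoiner out = new StringJoiner(" ");
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file's encoding.
        System.out.println(dump("\u00F6"));        // a real 'ö': U+00F6
        System.out.println(dump("\u00C3\u00B6"));  // already-broken text: U+00C3 U+00B6
    }
}
```

If the log shows U+00F6 right after reading from JDBC, the string is fine at that point and the damage happens later; if it already shows two code points, the problem is on the database side.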
When you look on the wire, of course, you'll be seeing bytes rather than characters as such. You should work out what bytes you expect for your chosen encoding, and check those against what's actually coming across the network.
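To work out the bytes you should expect, something along these lines would do (a sketch only; `ExpectedBytes` and `hex` are hypothetical names):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExpectedBytes {
    // Encodes the text with the given charset and returns the bytes as hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String oUmlaut = "\u00F6"; // 'ö'
        System.out.println(hex(oUmlaut, StandardCharsets.UTF_8));      // C3 B6
        System.out.println(hex(oUmlaut, StandardCharsets.ISO_8859_1)); // F6
    }
}
```

So for an 'ö' you would look for the single byte F6 on the wire if the page really is ISO Latin 1, and for the pair C3 B6 if it is UTF-8.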
You've specified the encoding in the HTML - but have you told whatever's generating your page that you want it in ISO Latin 1? That's likely to be responsible for both setting the content-type header and performing the actual conversion from text to bytes.
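To see why the declared encoding and the actual text-to-bytes conversion have to agree, here is a small simulation (not your servlet code; `roundTrip` is just an illustrative helper) that writes with one charset and reads with another:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when the page generator and the browser disagree.
    static String roundTrip(String text, Charset written, Charset read) {
        return new String(text.getBytes(written), read);
    }

    public static void main(String[] args) {
        String oUmlaut = "\u00F6"; // 'ö'

        // Written as UTF-8 but read as ISO Latin 1: one character becomes two.
        System.out.println(roundTrip(oUmlaut, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // Written as ISO Latin 1 but read as UTF-8: the byte is invalid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        System.out.println(roundTrip(oUmlaut, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(oUmlaut, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```

Assuming the page comes from a servlet, the usual place to fix the real thing is the response itself, for example `response.setCharacterEncoding("UTF-8")` before `getWriter()` is called, rather than only the meta tag.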
Additionally, is there any reason why you're using ISO Latin 1 instead of UTF-8? Why would you deliberately restrict yourself like that? (ISO Latin 1 can only handle the first 256 characters of Unicode, instead of the full range of Unicode characters. UTF-8 can handle everything, and is just as efficient for ASCII.)
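If you want to check the coverage claim yourself, a quick sketch like this works (the class and method names are hypothetical; the euro sign is just one example of a character outside ISO Latin 1):

```java
import java.nio.charset.StandardCharsets;

class CharsetCoverage {
    // True if every character of the text can be represented in ISO Latin 1.
    static boolean latin1CanEncode(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    // True if every character of the text can be represented in UTF-8.
    static boolean utf8CanEncode(String text) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(text);
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(latin1CanEncode("\u00F6")); // 'ö' is within the first 256 code points
        System.out.println(latin1CanEncode("\u20AC")); // the euro sign is not
        System.out.println(utf8CanEncode("\u20AC"));   // UTF-8 covers it
        System.out.println(utf8Length("abc"));         // ASCII still costs one byte per character
    }
}
```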