Java EE web project and character encoding

Posted on 2024-10-15 04:20:41

We built a Java EE web project and use JDBC to store our data.
The problem is that German umlauts such as äöü are in use and are stored correctly in the MySQL database. We don't know why, but in the browser those characters are broken, displaying weird stuff like

ö�

instead.
I've already tried setting the encoding of the JDBC connection as described in this question:

JDBC character encoding

And the encoding of the HTML page is set correctly:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

Any ideas how to fix that?


Update

connection.prepareStatement("SET CHARACTER SET utf8").execute();

doesn't make the umlauts work.
Changing the meta tag to

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

doesn't change anything either.

Comments (1)

2024-10-22 04:20:41

"We don't know why, but in the browser those characters are broken"

Well, that's the first thing to find out. You should trace your data at every stage:

  • As you fetch it out of the database (with logging)
  • When you inject it into the page (with logging)
  • On the wire (via Wireshark)

When you log, don't just log the strings: log the Unicode characters that make up the strings, as integers. Just cast each character in the string to an integer and log it. It's primitive, but it'll tell you what you need to know.
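
As a rough sketch of that kind of logging, something like the following could be dropped next to the code that reads from the database or writes the page. The class and method names (`CodePointLog`, `describe`) are made up for illustration, and it uses `String.codePoints()` rather than a plain cast of each `char`, so characters outside the Basic Multilingual Plane come out as one number instead of two surrogates.

```java
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original project.
final class CodePointLog {
    // Renders each Unicode code point of the string as U+XXXX(decimal).
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X(%d)", cp, cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "\u00F6" is ö, written as an escape so the encoding of this
        // source file cannot interfere with the demonstration.
        System.out.println(describe("sch\u00F6n"));
        // prints: U+0073(115) U+0063(99) U+0068(104) U+00F6(246) U+006E(110)
    }
}
```

If the value fetched from the database logs as 246 for ö, the string is intact at that stage and the damage happens later; if it logs as something like 195 followed by 182, it was already decoded with the wrong charset before it reached your code.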

When you look on the wire, of course, you'll be seeing bytes rather than characters as such. You should work out what bytes you expect for your chosen encoding, and check those against what's actually coming across the network.
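
To work out the expected bytes, a small throwaway program along these lines may help. It is only a sketch (the `ExpectedBytes` class is hypothetical) that prints the bytes Java produces for äöü in the two encodings mentioned in the question, so they can be compared with the capture.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original project.
final class ExpectedBytes {
    // Hex dump of the bytes the given charset produces for the text.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String umlauts = "\u00E4\u00F6\u00FC"; // äöü
        System.out.println("ISO-8859-1: " + hex(umlauts, StandardCharsets.ISO_8859_1));
        // prints: ISO-8859-1: E4 F6 FC
        System.out.println("UTF-8:      " + hex(umlauts, StandardCharsets.UTF_8));
        // prints: UTF-8:      C3 A4 C3 B6 C3 BC
    }
}
```

One byte per umlaut on the wire points to ISO Latin 1 (or a similar single-byte encoding); two bytes starting with C3 point to UTF-8. Whichever you see has to match the charset the response declares.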

You've specified the encoding in the HTML - but have you told whatever's generating your page that you want it in ISO Latin 1? That's likely to be responsible for both setting the content-type header and performing the actual conversion from text to bytes.
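
The effect of such a mismatch can be reproduced without a server. The sketch below is an illustration only: `EncodingMismatch` and `roundTrip` are invented names, and the "page generator" and "browser" are simulated with plain `String` conversions. It encodes text with one charset and decodes it with another, which is what happens when the generator writes bytes in one encoding while the page declares a different one. In a servlet, the generator side would usually be controlled with `response.setCharacterEncoding(...)` or the `charset` parameter passed to `response.setContentType(...)`; whether your project does that is not visible from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration, not part of the original project.
final class EncodingMismatch {
    // Encodes the text as the page generator would (charset "written"),
    // then decodes the bytes as a browser would if it trusted a
    // different declared charset ("declared").
    static String roundTrip(String text, Charset written, Charset declared) {
        byte[] wire = text.getBytes(written);
        return new String(wire, declared);
    }

    public static void main(String[] args) {
        String text = "\u00F6"; // ö

        // Written and declared charsets agree: the text survives.
        String ok = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(text)); // prints: true

        // Written as UTF-8 but declared as ISO-8859-1: the two bytes C3 B6
        // are shown as two separate characters, U+00C3 and U+00B6.
        String twoChars = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(twoChars.equals("\u00C3\u00B6")); // prints: true

        // Written as ISO-8859-1 but declared as UTF-8: the lone byte F6 is
        // not valid UTF-8, so the decoder substitutes U+FFFD.
        String replaced = roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(replaced.contains("\uFFFD")); // prints: true
    }
}
```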

Additionally, is there any reason why you're using ISO Latin 1 instead of UTF-8? Why would you deliberately restrict yourself like that? (ISO Latin 1 can only handle the first 256 characters of Unicode, instead of the full range of Unicode characters. UTF-8 can handle everything, and is just as efficient for ASCII.)
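
A quick way to see both points, sketched with a hypothetical `CharsetCoverage` class: ISO Latin 1 cannot encode a character such as the euro sign (U+20AC), while UTF-8 can, and pure ASCII text takes one byte per character in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration, not part of the original project.
final class CharsetCoverage {
    // True if every character of s has a representation in ISO-8859-1.
    static boolean latin1CanEncode(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Number of bytes s occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(latin1CanEncode("\u00F6")); // ö, prints: true
        System.out.println(latin1CanEncode("\u20AC")); // euro sign, prints: false
        System.out.println(utf8Length("plain ASCII")); // prints: 11
        System.out.println(utf8Length("\u20AC"));      // prints: 3
    }
}
```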
