Cannot get the correct value when querying Chinese text from a latin1 column using JDBC for MySQL

Posted 2024-12-22 21:00:13

Querying Chinese values from a latin1 column (varchar or char) through JDBC for MySQL does not return the correct value. The column character set cannot be changed. The test steps are below. Is it possible to get the correct value in this situation? What else should I do?

1. create database:

CREATE DATABASE TESTDB CHARACTER SET latin1 COLLATE latin1_general_ci;

2. create table:

CREATE TABLE TB1 (
    vname varchar(50) default '',
    cname char(50) default ''
) DEFAULT CHARSET=latin1 ;

3. insert data:

set names latin1;  
insert into TB1(vname,cname) values('名字v','名字c');

4. show character set:

mysql> show variables like '%set%';  
+--------------------------+---------------------------------------------------------+  
| Variable_name            | Value                                                   |  
+--------------------------+---------------------------------------------------------+  
| auto_increment_offset    | 1                                                       |  
| character_set_client     | latin1                                                  |  
| character_set_connection | latin1                                                  |  
| character_set_database   | latin1                                                  |  
| character_set_filesystem | binary                                                  |  
| character_set_results    | latin1                                                  |  
| character_set_server     | utf8                                                    |  
| character_set_system     | utf8                                                    |  
| character_sets_dir       | F:\Program Files\MySQL\MySQL Server 5.5\share\charsets\ |  
+--------------------------+---------------------------------------------------------+

5. querying from the command console returns the correct value:

mysql> select * from tb1;  
+-------+-------+  
| vname | cname |  
+-------+-------+  
| 名字v     | 名字c     |  
+-------+-------+  
1 row in set (0.00 sec)

6. querying through JDBC does not return the correct value:
JDBC URL: jdbc:mysql://192.168.5.74/testdb?characterEncoding=UTF-8

set names utf8;  
select vname,hex(vname),length(vname),char_length(vname) from tb1;  
select cname,hex(cname),length(cname),char_length(cname) from tb1;  
select vname,cname  
    ,CONVERT(CONVERT(CONVERT(vname USING latin1) USING binary) USING utf8) as c1  
    ,CONVERT(CONVERT(CONVERT(cname USING latin1) USING binary) USING utf8) as c2  
from tb1;  

vname           hex(vname)     length(vname) char_length(vname)   
--------------- -------------- ------------- ------------------   
??×?v           C3FBD7D676     5             5                    

cname           hex(cname)     length(cname) char_length(cname)   
--------------- -------------- ------------- ------------------   
??×?c           C3FBD7D663     5             5                    

vname             cname     c1      c2      
----------------- --------- ------- ------  
??×?v             ??×?c

终难遇 2024-12-29 21:00:13

I solved it. Multi-byte characters are stored as individual bytes in a latin1 column, so they have to be converted back to the charset that was in use when they were inserted. The following SQL does this:

CONVERT(CONVERT(CONVERT(vname USING latin1) USING binary) USING [INSERT_CHARSET])

My insert charset is gb2312, so the SQL is:

CONVERT(CONVERT(CONVERT(vname USING latin1) USING binary) USING gb2312)
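
The same latin1 -> binary -> gb2312 chain can be reproduced on the client side, which may help when checking what the SQL expression is expected to return. The following is only a minimal sketch: the class name `Latin1Recode` and the method `recode` are made up for illustration, and Java's ISO-8859-1 stands in for MySQL's latin1 (MySQL's latin1 is closer to windows-1252, but the two agree on the byte range 0xA1-0xFE that GB2312 uses).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Client-side analogue of CONVERT(CONVERT(CONVERT(col USING latin1) USING binary) USING gb2312). */
final class Latin1Recode {
    // Assumption: the rows were inserted by a client that sent GB2312 bytes.
    static final Charset INSERT_CHARSET = Charset.forName("GB2312");

    /** Turns each latin1 character back into its single byte, then decodes the bytes as the insert charset. */
    static String recode(String mojibake) {
        byte[] raw = mojibake.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, INSERT_CHARSET);
    }

    public static void main(String[] args) {
        // The bytes C3 FB D7 D6 76 from hex(vname) in the question, read as latin1 characters.
        String mojibake = "\u00C3\u00FB\u00D7\u00D6v";
        System.out.println(recode(mojibake));
    }
}
```

If the string received through JDBC really consists of the latin1 characters for those bytes, `recode` should give back the text that was originally typed. Whether that holds depends on how the connection charset is configured.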

把昨日还给我 2024-12-29 21:00:13

You should change latin1_general_ci to utf8_general_ci or utf16_general_ci. The problem is that CHARSET=latin1 cannot store Unicode characters such as Chinese characters.

Check out this:
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-utf8.html

The idea of UTF-8 is that various Unicode characters are encoded using
byte sequences of different lengths:
Basic Latin letters, digits, and punctuation signs use one byte.

Most European and Middle East script letters fit into a two-byte sequence: 
extended Latin letters (with tilde, macron, acute, grave and
other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and
others.
**Korean, Chinese, and Japanese ideographs use three-byte sequences.**

The latin1 charset is a single-byte (8-bit) encoding, while Chinese script requires multiple bytes per character.
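
The byte widths quoted above can be checked with a few lines of Java. This is a minimal sketch, not part of the original answer: the class `EncodingWidths` and its method names are hypothetical, and ISO-8859-1 is used as a stand-in for MySQL's latin1.

```java
import java.nio.charset.StandardCharsets;

final class EncodingWidths {
    /** Number of bytes UTF-8 needs to encode the given text. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if every character in the text has a code point in ISO-8859-1. */
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // A basic Latin letter, an accented Latin letter (U+00E9), and a Chinese ideograph (U+540D).
        String[] samples = {"v", "\u00E9", "\u540D"};
        for (String s : samples) {
            System.out.println(utf8Length(s) + " UTF-8 byte(s), fits latin1: " + fitsLatin1(s));
        }
    }
}
```

The three samples need 1, 2 and 3 UTF-8 bytes respectively, and only the first two have a latin1 code point. The ideograph has none, which is why a latin1 column can only hold its raw bytes and not the character itself.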

帅的被狗咬 2024-12-29 21:00:13

Another, simpler way of getting the desired characters would be to get the bytes from the column and then convert those into a string inside the application.

Pseudocode...

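// resultSet is an open java.sql.ResultSet positioned on a row
byte[] rawBytes = resultSet.getBytes("vname");
// "gb2312" is the charset the data was inserted with
String vname = new String(rawBytes, "gb2312");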

A very useful way to check and see if mysql is showing you the correct string in the monitor by accident or design is to use the length functions.

select vname, length( vname ) as bytelength, char_length( vname ) as stringlength from tb1;

In addition, the hex() function is also your friend:

select vname, hex( vname ) from tb1;

That'll show you the hex values of the bytes that are stored in the column so you can then look those up (yay, Wikipedia!) to see if they match the displayed value in the various different ways of viewing it.
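
Instead of looking the hex values up by hand, the output of hex() can also be decoded in a few lines of Java. This is a rough sketch under the assumption that the data was inserted as GB2312. The class `HexInspect` and the helper `fromHex` are illustrative names, not an existing API.

```java
import java.nio.charset.Charset;

final class HexInspect {
    /** Parses the output of MySQL's HEX() (two hex digits per byte) into a byte array. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = fromHex("C3FBD7D676"); // hex(vname) from the question
        // Assumption: GB2312 was the charset in use when the row was inserted.
        String decoded = new String(raw, Charset.forName("GB2312"));
        System.out.println(raw.length + " bytes, " + decoded.length() + " characters: " + decoded);
    }
}
```

For the value in the question this gives 5 bytes but only 3 characters once decoded as GB2312, whereas MySQL reports length and char_length as both 5, because it counts every stored byte as one latin1 character.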
