Java cannot retrieve Unicode (Lithuanian) letters from Access via JDBC-ODBC

Posted 2024-12-01 16:45:42

I have a DB where some names are written with Lithuanian letters, but when I try to read them using Java, the Lithuanian letters are lost.

    DbConnection();
    zadanie=connect.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,ResultSet.CONCUR_UPDATABLE);
    sql="SELECT * FROM Clients;";   
    dane=zadanie.executeQuery(sql);

    String kas="Imonė";
    while(dane.next())
    {
         String var=dane.getString("Pavadinimas");       
         if (var!= null) {var =var.trim();} 
         String rus =dane.getString("Rusys");   
         System.out.println(kas+" "+rus);
    }

    void DbConnection() throws SQLException
    {
        String baza="jdbc:odbc:DatabaseDC"; 
        try
        {
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        }catch(Exception e){System.out.println("Connection error");}
        connect=DriverManager.getConnection(baza);
    }

In the DB the field type is TEXT, size 20. I don't use any additional character decoding or anything like that.

It prints "Imonė Imone", even though the value stored in the DB is "Imonė", which is what rus should contain.

Comments (3)

你怎么敢 2024-12-08 16:45:42

Now that the JDBC-ODBC Bridge has been removed from Java 8 this particular question will increasingly become just an item of historical interest, but for the record:

The JDBC-ODBC Bridge has never worked correctly with the Access ODBC Drivers ("Jet" and "ACE") for Unicode characters above code point U+00FF. That is because Access stores such characters as Unicode but it does not use UTF-8 encoding. Instead, it uses a "compressed" variation of UTF-16LE where characters with code points U+00FF and below are stored as a single byte, while characters above U+00FF are stored as a null byte followed by their UTF-16LE byte pair(s).

If the string 'Imonė' is stored within the Access database so that it appears properly in Access itself

[screenshot: accessEncoded.png]

then it is stored as

I  m  o  n  ė
-- -- -- -- --------
49 6D 6F 6E 00 17 01

('ė' is U+0117).

The JDBC-ODBC Bridge does not understand what it receives from the Access ODBC driver for that final character, so it just returns

Imon?
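The storage layout described above can be sketched in a few lines of Java. This is only a simplified model built from the description in this answer, not the actual Jet/ACE implementation; the class and method names (CompressedUtf16Sketch, encode, toHex) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;

public class CompressedUtf16Sketch {

    // Simplified model of the layout described in the answer (an assumption,
    // not the real Jet/ACE code): UTF-16 code units up to U+00FF are written
    // as one byte; anything higher is written as a 0x00 marker followed by
    // its UTF-16LE byte pair (low byte first).
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char unit = text.charAt(i);
            if (unit <= 0xFF) {
                out.write(unit);
            } else {
                out.write(0x00);
                out.write(unit & 0xFF); // low byte
                out.write(unit >> 8);   // high byte
            }
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Imon" followed by U+0117 (e with dot above)
        System.out.println(toHex(encode("Imon\u0117")));
    }
}
```

For "Imon" plus U+0117 this model yields the same seven bytes as the table above, with the final character taking three bytes instead of one.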

On the other hand, if we try to store the string in the Access database with UTF-8 encoding, as would happen if the JDBC-ODBC Bridge attempted to insert the string itself

Statement s = con.createStatement();
s.executeUpdate("UPDATE vocabulary SET word='Imonė' WHERE ID=5");

the string would be UTF-8 encoded as

I  m  o  n  ė
-- -- -- -- -----
49 6D 6F 6E C4 97

and then the Access ODBC Driver will store it in the database as

I  m  o  n  Ä  —
-- -- -- -- -- ---------
49 6D 6F 6E C4 00 14 20

  • C4 is 'Ä' in Windows-1252, which is U+00C4, so it is stored as just C4
  • 97 is "em dash" in Windows-1252, which is U+2014, so it is stored as 00 14 20

Now the JDBC-ODBC Bridge can retrieve it okay (since the Access ODBC Driver "un-mangles" the character back to C4 97 on the way out), but if we open the database in Access we see

ImonÄ—

[screenshot: utf8Encoded.png]
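The mangling described above (UTF-8 bytes being read as Windows-1252 characters) can be reproduced without any database. The following is a minimal sketch using only the standard Java charset API; the class and method names are hypothetical, and it models only the byte reinterpretation, not what the ODBC driver does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8AsCp1252 {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Encodes the text as UTF-8, then reads those bytes back as if each byte
    // were a Windows-1252 character.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    // Reverses the mix-up: turns the Windows-1252 characters back into their
    // bytes and decodes those bytes as UTF-8.
    static String repair(String mangled) {
        byte[] raw = mangled.getBytes(CP1252);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Imon\u0117";       // U+0117 is UTF-8 C4 97
        String mangled = misread(original);   // C4 -> U+00C4, 97 -> U+2014
        System.out.println(mangled);
        System.out.println(repair(mangled).equals(original));
    }
}
```

The round trip through repair corresponds to the "un-mangling" on the way out: the text looks fine from Java but not when the same database is opened in Access.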

The JDBC-ODBC Bridge has never been able, and will never be able, to provide full native Unicode support for Access databases. Adding various properties to the JDBC connection will not solve the problem.

For full Unicode character support of Access databases without ODBC, consider using UCanAccess instead. (More details available in another question here.)
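As a rough sketch of what switching to UCanAccess might look like for the query in the question: the example below assumes the UCanAccess jar and its dependencies are on the classpath, and the database path is a hypothetical placeholder. The table and column names (Clients, Pavadinimas, Rusys) are taken from the question; the class name and the buildUrl helper are made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UcanaccessSketch {

    // UCanAccess connection URLs start with "jdbc:ucanaccess://" followed by
    // the path to the .mdb/.accdb file.
    static String buildUrl(String dbPath) {
        return "jdbc:ucanaccess://" + dbPath;
    }

    public static void main(String[] args) {
        // Hypothetical location of the database file; pass the real path as
        // the first argument.
        String path = args.length > 0 ? args[0] : "C:/data/DatabaseDC.accdb";

        // try-with-resources closes the result set, statement and connection.
        try (Connection conn = DriverManager.getConnection(buildUrl(path));
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT Pavadinimas, Rusys FROM Clients")) {
            while (rs.next()) {
                String name = rs.getString("Pavadinimas");
                String rus = rs.getString("Rusys");
                System.out.println(name + " " + rus);
            }
        } catch (SQLException e) {
            // Also reached when the UCanAccess driver is not on the classpath.
            System.err.println("Could not query the database: " + e.getMessage());
        }
    }
}
```

No ODBC data source name is involved here; the driver opens the database file directly.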

夜司空 2024-12-08 16:45:42

As you're using the JDBC-ODBC bridge, you can specify a charset in the connection details.

Try this:

Properties prop = new java.util.Properties();
prop.put("charSet", "UTF-8");

String baza="jdbc:odbc:DatabaseDC"; 
connect=DriverManager.getConnection(baza, prop);

陌若浮生 2024-12-08 16:45:42

Try using "Windows-1257" instead of UTF-8; it is the code page for the Baltic region.

java.util.Properties prop = new java.util.Properties();
prop.put("charSet", "Windows-1257");
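
To see why a Baltic code page is the natural candidate here, one can check which charsets are able to represent 'ė' (U+0117) at all. This is a small sketch using the standard Java charset API with hypothetical class and method names; it says nothing about whether the JDBC-ODBC bridge actually honors the charSet property for Access.

```java
import java.nio.charset.Charset;

public class BalticCharsetCheck {

    // Returns true if the named charset has an encoding for the character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char eDot = '\u0117'; // Lithuanian e with dot above
        for (String name : new String[] {"windows-1252", "windows-1257", "UTF-8"}) {
            System.out.println(name + ": " + canEncode(name, eDot));
        }
    }
}
```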