How does pyodbc determine the encoding?

Posted 2024-11-04 21:04:00

I've been fighting Sybase SQL Anywhere 12 together with Python (and Twisted) for several weeks now, and I even got my stuff working.

There's only one annoyance left: If I run my script on CentOS 5 with a custom Python 2.7.1, which is the deployment platform, I get my results as UTF-8.

If I run it on my Ubuntu box (Natty Narwhal) I get them in latin1.
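
To make the difference concrete, here is a minimal sketch of how the same text looks as bytes in each encoding and how one might tell which of the two a box is returning. The helper name `guess_encoding` and the sample string are hypothetical, and the check is only a heuristic: some latin1 byte sequences also happen to be valid UTF-8.

```python
def guess_encoding(raw):
    """Heuristically classify a byte string as 'ascii', 'utf-8' or 'latin-1'."""
    try:
        raw.decode("ascii")
        return "ascii"  # pure ASCII looks identical in both encodings
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence decodes as latin-1, so it is the fallback.
        return "latin-1"


sample = u"Gr\u00fc\u00dfe"            # hypothetical value with two non-ASCII characters
as_utf8 = sample.encode("utf-8")       # two bytes for each non-ASCII character
as_latin1 = sample.encode("latin-1")   # one byte for every character

print(guess_encoding(as_utf8), len(as_utf8))
print(guess_encoding(as_latin1), len(as_latin1))
```

Running a check like this on a column value fetched on each box would show which encoding the driver actually hands back there.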

Needless to say, I would prefer to get all my data in Unicode, but that's not the point of this question. :)

Both are 64-bit boxes, and both have a custom Python 2.7.1 built with UCS4 and a custom-built unixODBC 2.3.0.

I'm at a loss here. I can't find any documentation on that. What makes pyodbc or unixODBC behave differently on the two boxes?

Hard facts:

Python: 2.7.1
DB: SQL Anywhere 12
unixODBC: 2.3.0 (2.2.14 behaved the same), self-compiled with identical flags
ODBC driver: original from Sybase.
CentOS 5 gives me UTF-8, Ubuntu Natty Narwhal gives me latin1.

My odbc.ini looks like this:

[sybase]
Uid             = user
Pwd             = password
Driver          = /opt/sqlanywhere/lib64/libdbodbc12_r.so
Threading       = True
ServerName      = dbname
CommLinks       = tcpip(host=the-host;DoBroadcast=None)

I connect just by using DSN='sybase'.

TIA!

伪心 2024-11-11 21:04:00

I can't tell you why the two differ, but if you add "Charset=utf-8" to your DSN, you should get the results you want on both machines.

Disclaimer: I work on SQL Anywhere engineering at Sybase.
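
As a rough sketch of that suggestion, the setting could also be supplied at connect time rather than in odbc.ini. The helper `build_connection_string` is hypothetical, and passing `Charset` through the connection string (instead of adding a `Charset = utf-8` line to the `[sybase]` section) is an assumption about how the driver reads its parameters.

```python
def build_connection_string(dsn, **overrides):
    """Join a DSN name and extra driver keywords into an ODBC connection string."""
    parts = ["DSN=%s" % dsn]
    for key in sorted(overrides):  # sorted so the result is deterministic
        parts.append("%s=%s" % (key, overrides[key]))
    return ";".join(parts)


conn_str = build_connection_string("sybase", Charset="utf-8")
print(conn_str)

# With pyodbc installed and the DSN configured, the string would be used as:
#   import pyodbc
#   connection = pyodbc.connect(conn_str)
```

Pinning the charset explicitly on both boxes removes whatever per-machine default is causing the difference.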

青衫负雪 2024-11-11 21:04:00

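pyodbc follows the ODBC specification, which supports only two encodings. All ODBC functions ending in "W" are wide-character versions that use SQLWCHAR. SQLWCHAR is defined by the ODBC headers and is usually UCS2, but sometimes UCS4. The non-wide versions use SQLCHAR and are always(?) single-byte ANSI/ASCII.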

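ODBC has absolutely no support for variable-width encodings such as UTF8. An ODBC driver that provides one is definitely incorrect. Even if the data is stored as UTF8, the driver must convert it to ANSI or UCS2. Unfortunately, most ODBC drivers are completely incorrect.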
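
To illustrate what "variable width" means here, the following small sketch prints how many bytes each character takes in the encodings mentioned. The sample characters and the helper `encoded_widths` are made up for illustration, and Python's "utf-16-le" and "utf-32-le" codecs stand in for UCS2 and UCS4, which is accurate only for characters in the Basic Multilingual Plane.

```python
def encoded_widths(text, encoding):
    """Return the number of bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


text = u"a\u00e9\u20ac"  # 'a', e with acute accent, euro sign

utf8_widths = encoded_widths(text, "utf-8")      # width varies per character
ucs2_widths = encoded_widths(text, "utf-16-le")  # fixed two bytes for these characters
ucs4_widths = encoded_widths(text, "utf-32-le")  # fixed four bytes

print(utf8_widths, ucs2_widths, ucs4_widths)
```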

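When sending data to the driver, pyodbc uses ANSI if the data is a "str" object and UCS2/UCS4 (whatever SQLWCHAR is defined as on your platform) if it is a "unicode" object. When returning data, the driver decides whether it is SQLCHAR or SQLWCHAR, and pyodbc has no say in that. SQLCHAR data is converted to a "str" object; SQLWCHAR data is converted to a "unicode" object.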
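
The rule above can be modelled in a few lines. This is only an illustration of the mapping as described in this answer, not pyodbc's actual code: both function names are hypothetical, Python 3 `bytes` stands in for the Python 2 "str" type and Python 3 `str` for "unicode", and a UCS4 SQLWCHAR is assumed for the wide case.

```python
def c_type_for_parameter(value):
    """Pick the ODBC C type a parameter would be bound as under the rule above."""
    if isinstance(value, bytes):   # Python 2 "str": sent as narrow characters
        return "SQL_C_CHAR"
    if isinstance(value, str):     # Python 2 "unicode": sent as wide characters
        return "SQL_C_WCHAR"
    raise TypeError("only text parameters are modelled here")


def python_value_for_column(c_type, raw, wide_encoding="utf-32-le"):
    """Convert raw column bytes according to the type the driver reported."""
    if c_type == "SQL_C_CHAR":
        return raw                         # passed through as a byte string, undecoded
    if c_type == "SQL_C_WCHAR":
        return raw.decode(wide_encoding)   # decoded into a Unicode string
    raise ValueError("unsupported C type: %r" % (c_type,))


print(c_type_for_parameter(b"abc"), c_type_for_parameter(u"abc"))
```

Under this model a narrow column arrives as whatever bytes the driver produced, which would explain how two boxes can return different encodings for the same data.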

This is slightly different for the 3.x versions, which convert both SQLCHAR and SQLWCHAR to Unicode by default.
