PostgreSQL 的 rodbc 字符编码错误

发布于 2024-12-01 05:04:13 字数 1007 浏览 9 评论 0原文

使用 RODBC 从 R 连接到 GreenPlum PostgreSQL 数据库时，我遇到了以前从未遇到过的新错误。我在使用 EMACS/ESS 和 RStudio 时都遇到了错误，并且 RODBC 调用的工作方式与过去一样。

library(RODBC)
gp <- odbcConnect("greenplum", believeNRows = FALSE)
data <- sqlQuery(gp, "select * from mytable")

> data
[1] "22P05 7 ERROR: character 0xc280 of encoding \"UTF8\" has no equivalent in  "WIN1252\";\nError while executing the query" 
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select * from mytable'"

编辑：刚刚尝试查询另一个表并得到了结果。所以我猜这不是 RODBC 问题而是 PostgreSQL 表编码问题。

R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RODBC_1.3-2
>

原文

I'm getting a new error which I've never gotten before when connecting from R to a GreenPlum PostgreSQL database using RODBC. I've gotten the error using both EMACS/ESS and RStudio, and the RODBC call has worked as is in the past.

library(RODBC)
gp <- odbcConnect("greenplum", believeNRows = FALSE)
data <- sqlQuery(gp, "select * from mytable")

> data
[1] "22P05 7 ERROR: character 0xc280 of encoding \"UTF8\" has no equivalent in  "WIN1252\";\nError while executing the query" 
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select * from mytable'"

EDIT:
Just tried querying another table and did get results. So I guess it's not an RODBC problem but a PostgreSQL table encoding problem.

R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RODBC_1.3-2
>

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

九局 2024-12-08 05:04:13

首先，出现此问题是因为 R 正在尝试转换为支持 UTF8 的 Windows 语言环境。不幸的是，Brian Ripley 多次报告 Windows 没有 UTF8 语言环境。通过花费数小时在网络、StackOverflow、Microsoft 等上搜索，我得出的结论是 Microsoft ~~讨厌 UTF-8~~ Windows 不会支持 UTF8。

因此，如果有任何解决方案的话，我不确定是否有一个简单的解决方案。我能推荐的最好的方法是在服务器端包装某种转换，如果可以的话考虑过滤数据，或者如果合适的话尝试不同的语言（例如中文，日语，韩语）。

如果您决定封装转换器，unicode.org 建议此 ICU 工具包。

回复收藏 0 原文

独行侠 2024-12-08 05:04:13

0xc280 是一个控制元素（Unicode 中的 U+0080），在使用 SQL 等时经常引起问题。问题通常在于转换链，当您使用使用不同编码方案的不同应用程序时，总会发生这种情况。 Windows 现已包含 UTF-8，因此严格来说这不是 Windows 问题。我相信问题在R读取数据之前就出现了。

事实上，在链中，UNICODE中的字符序列0x80将被映射到UTF-8中的0xc280。这应该是一个控制序列，无法打印。但很有可能 0x80 实际上不是 UNICODE，而是 Windows Latin-1 或 Latin-2。在这种情况下，0x80 代表欧元符号。这可能可以解释它如何最终出现在您的数据中。检查一下你是否能在数据中找到类似的东西，这已经可以解释一些事情了。

我的猜测是，解决方案不会位于该工作链的 R 端，而是在此之前。它将尝试自动转换，但据报告在某些情况下会失败（顺便说一句，也适用于 SQL 和 Oracle）。检查您在 Postgresql 中使用的编码，并尝试使用任何拉丁类型。可能还涉及其他链接（例如 Putty 或类似终端）。我很确定所有编码都是 ISO8859-1，即 Latin-1。 UTF-8 被夹在中间，当 0x80 字符被错误地映射到 0xc280 时，就会遇到麻烦。

因此，请检查完整工作链中的编码，并确保它们全部匹配。如果不这样做，每个步骤之间完成的自动转换势必会给某些字符带来麻烦。

希望这有帮助。

回复收藏 0 原文

木落 2024-12-08 05:04:13

我可能已经在其他地方发布了此回复，但这里是。

从 MS SQL 管理客户端连接到 Postgres DB 时，我遇到类似的错误。在我的例子中，试图修复源数据几乎是不可能的。

我的场景：

尝试通过 MS SQL 链接对象连接到 Postgress
ODBC 系统 DSN，并看到诸如“ERROR: character 0xc280 of
编码“UTF8”在“WIN1252”中没有等效项；
某些表上的 Select 语句有效，而其他表则抛出此错误。

修复：使用支持 Unicode 的 ODBC 驱动程序。我正在使用 PostgreSQL Global Development Group 的 ODBC 驱动程序。转至配置 DSN/管理 DSN 并选择 Unicode 驱动程序。

祝你好运。

回复收藏 0 原文

转角预定愛 2024-12-08 05:04:13

默认情况下，Greenplum 使用 UTF8 进行字符编码。您可以通过登录 Greenplum 服务器并启动 psql - Greenplum 控制台客户端来检查这一点。
在此控制台应用程序中，您可以发出命令：\l 列出 Greenplum 中配置的所有数据库 - 这也应该描述数据库的字符集。

我认为你的问题是 R 不支持 UTF8 字符（你使用不同的区域设置）
但是您可以在 ODBC 驱动程序中使用即时转码。不确定所有 ODBC 驱动程序，但 DataDirect 驱动程序支持 odbc.ini 文件（通常位于用户主目录中）中的额外选项 - IANAAppCodePage。

您可以在此链接上找到此参数的适当代码：
http://www.iana.org/assignments/character-sets

这是示例od ODBC.ini 内容：

[ODBC]
Driver=/opt/odbc/lib/S0gplm60.so
IANAAppCodePage=2252
AlternateServers=
ApplicationUsingThreads=1
ConnectionReset=0
ConnectionRetryCount=0
ConnectionRetryDelay=3
Database=mysdb
EnableDescribeParam=1
ExtendedColumnMetadata=0
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
FetchRefCursor=1
FetchTSWTZasTimestamp=0
FetchTWFSasTime=0
HostName=192.168.1.100
InitializationString=
LoadBalanceTimeout=0
LoadBalancing=0
LoginTimeout=15
LogonID=
MaxPoolSize=100
MinPoolSize=0
Password=
Pooling=0
PortNumber=5432  
QueryTimeout=0
ReportCodepageConversionErrors=0
TransactionErrorBehavior=1
XMLDescribeType=-10

By default Greenplum use UTF8 for character encoding. You could check this by logging in to Greenplum server and launching psql - console client for Greenplum.
In this console application you could issue command: \l to list all of the databases configured in the Greenplum - this should also describe character set for database.

I think your prblem is that R doesnt support UTF8 for chars (You use different locale)
But you could use On-the-fly transcoding in ODBC driver. Not sure about all ODBC drivers but DataDirect drivers support extra option in odbc.ini file (usually located in user home directory) - IANAAppCodePage.

You could find appropriate code for this parameter on this link:
http://www.iana.org/assignments/character-sets

Here is the example od ODBC.ini content:

[ODBC]
Driver=/opt/odbc/lib/S0gplm60.so
IANAAppCodePage=2252
AlternateServers=
ApplicationUsingThreads=1
ConnectionReset=0
ConnectionRetryCount=0
ConnectionRetryDelay=3
Database=mysdb
EnableDescribeParam=1
ExtendedColumnMetadata=0
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
FetchRefCursor=1
FetchTSWTZasTimestamp=0
FetchTWFSasTime=0
HostName=192.168.1.100
InitializationString=
LoadBalanceTimeout=0
LoadBalancing=0
LoginTimeout=15
LogonID=
MaxPoolSize=100
MinPoolSize=0
Password=
Pooling=0
PortNumber=5432  
QueryTimeout=0
ReportCodepageConversionErrors=0
TransactionErrorBehavior=1
XMLDescribeType=-10

回复收藏 0 原文

~没有更多了~