SQL Server (SQLCMD), Python and encoding issues when using non-ASCII characters

Posted 2024-12-13 14:18:09

I'm facing an encoding issue in my Python code when querying data stored in SQL Server 2005.

Because I was unable to compile PyMSSQL-2.0.0b1, I'm using this piece of code instead. I can run some SELECTs, but now I'm stuck: I don't know what SQLCMD is outputting to me :(

(I have to work with European-language text stored in the table, so I have to deal with other encodings, accented characters and so on.)

For example:

  • when I read it (SELECT) from Microsoft SQL Server Management Studio, I get this country name: 'Ceská republika' (note that the first a carries an acute accent)
  • when I query it with SQLCMD from the command line (PowerShell on Windows 7), it is still fine: I can see 'Ceská' with the acute accent
  • now, when using Python with the os.popen trick from the recipe, that is, with this command line:

    sqlcmd -U adminname -P password -S servername -d dbname /w 8192 -u

I get this string: 'Cesk\xa0 republika'

Notice the \xa0: I don't know what encoding it is, nor how I can get from this \xa0 to á.

If I test from Python with unicode, I should get '\xe1' instead:

>>> unicode('Cesk\xa0 republika')

Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    unicode('Cesk\xa0 republika')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 4: ordinal not in range(128)

>>> unicode_a_with_acute = u'\N{LATIN SMALL LETTER A WITH ACUTE}'
>>> unicode_a_with_acute
u'\xe1'
>>> print unicode_a_with_acute
á
>>> print unicode_a_with_acute.encode('cp1252')
á
>>> unicode_a_with_acute.encode('cp1252')
'\xe1'
>>> print 'Cesk\xa0 republika'.decode('cp1252')
Cesk  republika
>>> print 'Cesk\xa0 republika'.decode('utf8')

Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    print 'Cesk\xa0 republika'.decode('utf8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 4: invalid start byte

So what is SQLCMD giving me? How should I force it (and/or os.popen and the rest) to make sure I get UTF-8 that Python can understand?

(Note: I have tried both with and without the -u at the end of the SQLCMD command passed to os.popen, which should ask SQLCMD to answer in Unicode, but it had no effect. I have also tried feeding it a SELECT as a Python string encoded in UTF-8, with no more success:

    sqlstr = unicode('select * from table_pays where country_code="CZ"')
    cu = c.cursor
    lst = cu.execute(sqlstr)
    rows = cu.fetchall()
    for x in rows:
        print x

    ('CZ          ', 'Cesk\xa0 republika       ')

)

Another point: from what I googled about sqlcmd.exe, there are also these parameters that might help:

[-f <codepage> | i:<codepage>[,o:<codepage>]]

but I was unable to specify the right one, as I don't know what the possible values are. By the way, using (or not using)

[-u] (Unicode output)

did not help me either...


Comments (2)

一抹淡然 2024-12-20 14:18:09

It looks like your default codepage is 850 or 437. Never try to guess at codepages: chcp in a command prompt will tell you what your system is set to use.
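To see what the 850/437 guess means for the string in the question, here is a minimal Python 3 sketch. The `decode_console_bytes` helper is hypothetical (not part of any library), and the sketch assumes the console's OEM code page really is 850 or 437, which chcp should confirm first. In Python 2.7, as used in the question, the equivalent call is `'Cesk\xa0 republika'.decode('cp850')`.

```python
def decode_console_bytes(raw: bytes, codepage: str = "cp850") -> str:
    """Decode bytes captured from a console program's pipe.

    Hypothetical helper: `codepage` must match the console's OEM code page
    as reported by `chcp` (850 -> "cp850", 437 -> "cp437").
    """
    return raw.decode(codepage)


# Bytes as they came back through os.popen in the question.
raw = b"Cesk\xa0 republika"

# In OEM code pages 850 and 437, byte 0xA0 is U+00E1 (a with acute).
# ascii() is used so printing works whatever the terminal encoding is.
print(ascii(decode_console_bytes(raw, "cp850")))  # 'Cesk\xe1 republika'
print(ascii(decode_console_bytes(raw, "cp437")))  # 'Cesk\xe1 republika'

# In the ANSI code page 1252, the same byte is a no-break space (U+00A0),
# which is why decoding with cp1252 printed a blank instead of the accent.
print(ascii(raw.decode("cp1252")))  # 'Cesk\xa0 republika'

# The expected byte 0xE1 is the cp1252 encoding of the letter, not the cp850 one.
print("\xe1".encode("cp1252"), "\xe1".encode("cp850"))  # b'\xe1' b'\xa0'
```

The byte the question expected (0xE1) and the byte it received (0xA0) are the same character in two different code pages, so no conversion beyond choosing the right codec is needed.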

Trying to set the command processor code page with either chcp or mode con: is unlikely to help, because they set the output code page for the console, not for pipes or redirection to a file.

To get Unicode (or rather, UTF-16) output in a pipe, use cmd /u:

>>> subprocess.check_output('''cmd /u /c "echo hello\xe1"''').decode('utf16')
'helloá\r\n'
>>> 
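The decode step itself can be tried without Windows. This Python 3 sketch simulates the bytes such a pipe would carry; it assumes cmd /u emits UTF-16-LE without a byte-order mark, so it names the byte order explicitly instead of relying on 'utf16' to infer it from a BOM or the platform. Note that cmd /u is documented as affecting the output of cmd's internal commands (such as echo); whether an external program like sqlcmd follows it is not shown here.

```python
# Simulated pipe contents: UTF-16-LE, no BOM (an assumption for this sketch).
piped = "hello\xe1\r\n".encode("utf-16-le")

# Each character in this string takes two bytes; ASCII letters are followed by a zero byte.
print(piped[:4])  # b'h\x00e\x00'

# Decoding with the explicit byte order gives back the original text.
text = piped.decode("utf-16-le")
print(ascii(text))  # 'hello\xe1\r\n'
```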

But you would almost certainly be better off installing a real database adapter.

毁虫ゝ 2024-12-20 14:18:09

The problem might be that the console works in ASCII (non-Unicode) mode by default, and output is converted via the current code page setting. You can try the following: either write the result to a separate file with -o <file> -u

Then the result file will be properly UCS-2 encoded, which Python gladly accepts. Another option is to set up UTF-8 console output (untested):

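# setup utf8 on windows console
import os

# Switch the console to code page 65001 (UTF-8) before running the command
cmode = 'mode con: codepage select=65001 > NUL & '
cmd = 'my command'  # placeholder: put the sqlcmd command line here
f = os.popen(cmode + cmd)
out = f.readlines()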
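For the first suggestion (writing to a file with -o <file> -u), the file can be read back with an explicit encoding. A minimal Python 3 sketch, assuming sqlcmd writes that file as UTF-16 with a byte-order mark; the helper name `read_sqlcmd_unicode_output` and the sample row are hypothetical, and the demo writes its own stand-in file instead of calling sqlcmd.

```python
import os
import tempfile


def read_sqlcmd_unicode_output(path):
    """Return the lines of a file written by `sqlcmd ... -o <file> -u`.

    Assumes the file is UTF-16 with a byte-order mark (BOM): the "utf-16"
    codec uses the BOM to pick the byte order and drops it from the text.
    """
    with open(path, encoding="utf-16") as f:
        # Universal-newline mode turns CRLF into "\n"; strip line endings.
        return [line.rstrip("\r\n") for line in f]


if __name__ == "__main__":
    # Stand-in for the real sqlcmd output file (hypothetical row).
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-16", newline="\r\n") as f:
            f.write("CZ          Cesk\xe1 republika\n")
        print(ascii(read_sqlcmd_unicode_output(path)))  # ['CZ          Cesk\xe1 republika']
    finally:
        os.remove(path)
```

Reading with "utf-16" rather than a fixed "utf-16-le" lets the BOM decide the byte order, which matches the assumption that the file starts with one.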