sqlalchemy UnicodeDecodeError: 'utf8'尝试从 msssql 2005 表中选择全部时，编解码器无法解码字节 0xe7

发布于 2024-12-22 19:54:47 字数 3436 浏览 3 评论 0原文

我正在使用 mssql 2005 构建一个与另一个应用程序的数据库配合使用的应用程序（我无法更改它或更改现有的表定义）。 w mssql表排序规则是：“hebrew bin”，应用程序从表中显示完美的希伯来语，所有py文件都编码为utf-8

注意！使用 mssql 将 unicode 希伯来语字符串写入数据库没有问题。选择和删除没有问题：DBSession2.query(object).filter(object.LOADED=='Y').delete() 但是当从表中进行选择时，我遇到了这个非常烦人的错误：

  File "D:\Python27\learn\agent\agent\lib\encodings\utf_8.py", line 16, in decode     return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte

确切的字节代码根据表中第一行的第一个字节而变化。

是的，我知道这会翻译成希伯来字母 - 这应该不是问题，因为各方都使用 unicode - 至少我是这么认为的。

顺便说一句 - 这在测试 mssql 2005 服务器上运行良好，但不适用于生产服务器。

一些代码：这是函数的失败部分：

def iterateJson(parser,injson,object):
    '''iterateJson(parser,injson,object):getting a parser method an a json and iterating over the json
    with the parser method, checkes for existing objects in the db table and deletes them before commiting the new one to
    prevent integerityerrors
    writes ValidateJsonError to errorlog for each element in the json
    getting an onject name to check for loaded etc'''

    #first lets erase the table from loaded objects
    DBSession2.query(object).filter(object.LOADED=='Y').delete()
    print "finished deleting loaded"
    #now lets get a list from the table of loaded id
    raw_list = DBSession2.query(object).all() #the failing part!
    print "getting raw list of unloaded" #doesn't get here!
    if object == Activities:
        id_list = [e.EVENTCODE for e in raw_list]
        id = e.EVENTCODE

这是 sqlalchemy 类的一部分：

class Deposit(DeclarativeBase2):
    __tablename__ = 'NOAR_LOADDEPOSIT'
    #LINE = Column(INT(8)) 
    RECDEBNUM = Column(NVARCHAR(9) , primary_key=True)
    CURDATE = Column(BIGINT, nullable=False, default=text(u'((0))')) 
    PAYTYPE = Column(CHAR(1), nullable=False, default=text(u"('')")) 
    BANKCODE = Column(NVARCHAR(8), nullable=False, default=text(u"('')")) 
    CUSTACCNAME = Column(NVARCHAR(16), nullable=False, default=text(u"('')")) 
    PAGENUM = Column(NVARCHAR(5), nullable=False, default=text(u"('')"))
    RECNUM = Column(NVARCHAR(2), nullable=False, default=text(u"('')")) 
    RECDATE = Column(BIGINT, nullable=False, default=text(u'((0))')) 
    FIXNUM = Column(NCHAR(1), nullable=False, default=text(u"('')")) 
    EVENTNUM = Column(NVARCHAR(5), nullable=False, default=text(u"('')")) 
    GROUPCODE = Column(NVARCHAR(7), nullable=False, default=text(u"('')")) 
    IDNUMBER = Column(NVARCHAR(9), nullable=False, default=text(u"('')"))

和另一个类（都给出相同的问题）

class Activities(DeclarativeBase2):  


    __tablename__ = 'NOAR_LOADEVENTS'

    EVENTCODE = Column(NVARCHAR(8), primary_key=True)
    EVENTDES = Column(Unicode, nullable=False, default=text(u"('')"))
    TYPE = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LC = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LD = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LE = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LF = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LG = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LH = Column(NCHAR(1), nullable=False, default=text(u"('')"))

使用：python 2.7（64位win）与 pyodbc 2.1.11 vs mssql server 2005，sqlalchemy 0.7.3 tg2.1.3

很高兴获得任何帮助或参考

原文

I'm building an application that works with the db of another application, using mssql 2005 (I can't change that or change the existing table definition). w
the mssql table collate is: "hebrew bin", and the application shows perfect hebrew from the table, all the py files are encoded utf-8

notice! there is no problem writing with unicode hebrew strings to the db with mssql.
there is no trouble choosing and deleting: DBSession2.query(object).filter(object.LOADED=='Y').delete()
but when selecting from the table I get this very annoying error:

  File "D:\Python27\learn\agent\agent\lib\encodings\utf_8.py", line 16, in decode     return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte

where the exact byte code changes according to the first byte of the first row in the table.

yes, I know that this translates to an hebrew letter - this shouldn't be a problem since all sides play with unicode - at least that is what I thought.

btw - this worked fine on the test mssql 2005 server but doesn't work with the production server.

some code:
this is the failing part of the function:

def iterateJson(parser,injson,object):
    '''iterateJson(parser,injson,object):getting a parser method an a json and iterating over the json
    with the parser method, checkes for existing objects in the db table and deletes them before commiting the new one to
    prevent integerityerrors
    writes ValidateJsonError to errorlog for each element in the json
    getting an onject name to check for loaded etc'''

    #first lets erase the table from loaded objects
    DBSession2.query(object).filter(object.LOADED=='Y').delete()
    print "finished deleting loaded"
    #now lets get a list from the table of loaded id
    raw_list = DBSession2.query(object).all() #the failing part!
    print "getting raw list of unloaded" #doesn't get here!
    if object == Activities:
        id_list = [e.EVENTCODE for e in raw_list]
        id = e.EVENTCODE

this is part of the sqlalchemy class:

class Deposit(DeclarativeBase2):
    __tablename__ = 'NOAR_LOADDEPOSIT'
    #LINE = Column(INT(8)) 
    RECDEBNUM = Column(NVARCHAR(9) , primary_key=True)
    CURDATE = Column(BIGINT, nullable=False, default=text(u'((0))')) 
    PAYTYPE = Column(CHAR(1), nullable=False, default=text(u"('')")) 
    BANKCODE = Column(NVARCHAR(8), nullable=False, default=text(u"('')")) 
    CUSTACCNAME = Column(NVARCHAR(16), nullable=False, default=text(u"('')")) 
    PAGENUM = Column(NVARCHAR(5), nullable=False, default=text(u"('')"))
    RECNUM = Column(NVARCHAR(2), nullable=False, default=text(u"('')")) 
    RECDATE = Column(BIGINT, nullable=False, default=text(u'((0))')) 
    FIXNUM = Column(NCHAR(1), nullable=False, default=text(u"('')")) 
    EVENTNUM = Column(NVARCHAR(5), nullable=False, default=text(u"('')")) 
    GROUPCODE = Column(NVARCHAR(7), nullable=False, default=text(u"('')")) 
    IDNUMBER = Column(NVARCHAR(9), nullable=False, default=text(u"('')"))

and the other class (both give the same problem)

class Activities(DeclarativeBase2):  


    __tablename__ = 'NOAR_LOADEVENTS'

    EVENTCODE = Column(NVARCHAR(8), primary_key=True)
    EVENTDES = Column(Unicode, nullable=False, default=text(u"('')"))
    TYPE = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LC = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LD = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LE = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LF = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LG = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LH = Column(NCHAR(1), nullable=False, default=text(u"('')"))

using: python 2.7 (64bit win) with pyodbc 2.1.11 vs mssql server 2005, sqlalchemy 0.7.3 tg2.1.3

be glad for any help or reference

分享到QQ

分享到微博