Python non-ASCII characters

Posted on 2024-12-07 05:44:37

I have a Python file that creates and populates a table in MS SQL Server. The only sticking point is that the code breaks if the data contains any non-ASCII characters or single apostrophes (and there are quite a few of each). Although I could run a replace function to strip the apostrophes from the strings, I would prefer to keep them intact. I have also tried converting the data to UTF-8, but had no luck there either.

Below are the error messages I get.

For non-ASCII characters:

"'ascii' codec can't encode character u'\u2013' in position..."

For the single quotes:

<class 'pyodbc.ProgrammingError'>: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server] Incorrect syntax near 'S, 230 X 90M.; Eligibilty....

When I try to encode the string in UTF-8, I instead get the following error message:

<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)

The Python code is included below. I believe the break occurs just after the following line: InsertValue = str(row.GetValue(CurrentField['Name'])).

# -*- coding: utf-8 -*-

import pyodbc
import sys
import arcpy
import arcgisscripting

gp = arcgisscripting.create(9.3)
SQL_KEYWORDS = ['PERCENT', 'SELECT', 'INSERT', 'DROP', 'TABLE']

#SourceFGDB = '###'
#SourceTable = '###'
SourceTable = sys.argv[1]
TempInputName = sys.argv[2]
SourceTable2 = sys.argv[3]
#---------------------------------------------------------------------------------------------------------------------
# Target Database Settings
#---------------------------------------------------------------------------------------------------------------------
TargetDatabaseDriver = "{SQL Server}"
TargetDatabaseServer = "###"
TargetDatabaseName = "###"
TargetDatabaseUser = "###"
TargetDatabasePassword = "###"

# Get schema from FGDB table.
# This should be an ordered list of dictionary elements [{'FGDB_Name', 'FGDB_Alias', 'FGDB_Type', FGDB_Width, FGDB_Precision, FGDB_Scale}, {}]

if not gp.Exists(SourceTable):
    print ('- The source does not exist.')
    sys.exit(102)
#### Should see if it is actually a table type.  Could be a Feature Data Set or something...
print('        - Processing Items From : ' + SourceTable)
FieldList = []
Field_List = gp.ListFields(SourceTable)
print('            - Getting number of rows.')
result = gp.GetCount_management(SourceTable)
Number_of_Features = gp.GetCount_management(SourceTable)
print('                - Number of Rows: ' + str(Number_of_Features))
print('            - Getting fields.')
Field_List1 = gp.ListFields(SourceTable, 'Layer')
Field_List2 = gp.ListFields(SourceTable, 'Comments')
Field_List3 = gp.ListFields(SourceTable, 'Category')
Field_List4 = gp.ListFields(SourceTable, 'State')
Field_List5 = gp.ListFields(SourceTable, 'Label')
Field_List6 = gp.ListFields(SourceTable, 'DateUpdate')
Field_List7 = gp.ListFields(SourceTable, 'OBJECTID')
for Current_Field in Field_List1 + Field_List2 + Field_List3 + Field_List4 + Field_List5 + Field_List6 + Field_List7:
        print('            - Field Found: ' + Current_Field.Name)
        if Current_Field.AliasName in SQL_KEYWORDS:
            Target_Name = Current_Field.Name + '_'
        else:
            Target_Name = Current_Field.Name

        print('                 - Alias    : ' + Current_Field.AliasName)
        print('                 - Type     : ' + Current_Field.Type)
        print('                 - Length   : ' + str(Current_Field.Length))
        print('                 - Scale    : ' + str(Current_Field.Scale))
        print('                 - Precision: ' + str(Current_Field.Precision))
        FieldList.append({'Name': Current_Field.Name, 'AliasName': Current_Field.AliasName, 'Type': Current_Field.Type, 'Length': Current_Field.Length, 'Scale': Current_Field.Scale, 'Precision': Current_Field.Precision, 'Unique': 'UNIQUE', 'Target_Name': Target_Name})
# Create table in SQL Server based on FGDB table schema.
cnxn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=###;DATABASE=###;UID=sql_webenvas;PWD=###')
cursor = cnxn .cursor()
#### DROP the table first?
try:
    DropTableSQL = 'DROP TABLE dbo.' + TempInputName + '_Test;'
    print DropTableSQL
    cursor.execute(DropTableSQL)
    dbconnection.commit()
except:
    print('WARNING: Can not drop table - may not exist: ' + TempInputName + '_Test')
CreateTableSQL = ('CREATE TABLE  ' + TempInputName + '_Test '
' (Layer varchar(500), Comments varchar(5000), State int, Label varchar(500), DateUpdate DATETIME, Category varchar(50), OBJECTID int)')
cursor.execute(CreateTableSQL)
cnxn.commit()
# Cursor through each row in the FGDB table, get values, and insert into the SQL Server Table.
# We got Number_of_Features earlier, just use that.
Number_Processed = 0
print('        - Processing ' + str(Number_of_Features) + ' features.')
rows = gp.SearchCursor(SourceTable)
row = rows.Next()
while row:
    if Number_Processed % 10000 == 0:
        print('            - Processed ' + str(Number_Processed) + ' of ' + str(Number_of_Features))
    InsertSQLFields = 'INSERT INTO ' + TempInputName + '_Test ('
    InsertSQLValues = 'VALUES ('
    for CurrentField in FieldList:
        InsertSQLFields = InsertSQLFields + CurrentField['Target_Name'] + ', '
        InsertValue = str(row.GetValue(CurrentField['Name']))
        if InsertValue in ['None']:
            InsertValue = 'NULL'
        # Use an escape quote for the SQL.
        InsertValue = InsertValue.replace("'","' '")
        if CurrentField['Type'].upper() in ['STRING', 'CHAR', 'TEXT']:
            if InsertValue == 'NULL':
                InsertSQLValues = InsertSQLValues + "NULL, "
            else:
                InsertSQLValues = InsertSQLValues + "'" + InsertValue + "', "
        elif CurrentField['Type'].upper() in ['GEOMETRY']:
            ## We're not handling geometry transfers at this time.
            if InsertValue == 'NULL':
                InsertSQLValues = InsertSQLValues + '0' + ', '
            else:
                InsertSQLValues = InsertSQLValues + '1' + ', '
        else:
            InsertSQLValues = InsertSQLValues + InsertValue + ', '
    InsertSQLFields = InsertSQLFields[:-2] + ')'
    InsertSQLValues = InsertSQLValues[:-2] + ')'
    InsertSQL = InsertSQLFields + ' ' + InsertSQLValues
    ## print InsertSQL
    cursor.execute(InsertSQL)
    cnxn.commit()
    Number_Processed = Number_Processed + 1
    row = rows.Next()
print('            - Processed all ' + str(Number_Processed))
del row
del rows

Comments (5)

滥情空心 2024-12-14 05:44:37

James, I believe the real issue is that you are not using Unicode across the board. Try the following:

  • Make sure that the input file you are using to populate the DB is in UTF-8 and that you are reading it with the UTF-8 codec.
  • Make sure your DB is actually storing the data as Unicode.
  • When you retrieve data from the file or from the DB, or want to manipulate strings (with the + operator, for instance), you need to make sure that all parts are proper Unicode. You cannot use str(). You need to use unicode(), as Dave pointed out. If you define strings in your code, use u'my string' instead of 'my string' (otherwise it is a byte string, not unicode).

Also, please provide us with the full stack trace and the exception name.
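
To make the two codec errors from the question concrete, here is a minimal sketch that provokes them on purpose. It is written for Python 3, which is an assumption on my part (the code in the question is Python 2). In Python 2, str() on a unicode value encodes with the ascii codec implicitly, and calling .encode('utf-8') on a byte string first decodes it with the ascii codec implicitly; Python 3 removed those implicit conversions, so the sketch performs them explicitly. The helper names are made up for illustration.

```python
# Python 3 sketch: reproduce both codec errors from the question explicitly.
text = "230 \u2013 90"               # contains an en dash, U+2013
utf8_bytes = text.encode("utf-8")    # the en dash becomes the bytes e2 80 93


def try_encode_ascii(value):
    """Encode text as ASCII; return the exception name if that fails."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError as exc:
        return type(exc).__name__


def try_decode_ascii(raw):
    """Decode bytes as ASCII; return the exception name if that fails."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return type(exc).__name__


encode_result = try_encode_ascii(text)        # what Python 2 str(u'...') attempts
decode_result = try_decode_ascii(utf8_bytes)  # what Python 2 'bytes'.encode() attempts first
round_trip = utf8_bytes.decode("utf-8")       # the correct codec round-trips cleanly

print(encode_result, decode_result, round_trip == text)
```

The byte 0xe2 named in the question's UnicodeDecodeError is the first byte of the UTF-8 encoding of U+2013, which suggests the value was already UTF-8 bytes when .encode() was called on it.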

楠木可依 2024-12-14 05:44:37

I'm going to use my psychic debugging skills and say you are trying to str()-ify something and getting an error from the ascii codec. What you should do instead is use the utf-8 codec, like this:

insert_value_uni = unicode(row.GetValue(CurrentField['Name']))
InsertValue = insert_value_uni.encode('utf-8')

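
For readers on Python 3, where unicode() no longer exists and str is already Unicode text, a rough counterpart of the conversion above might look like the sketch below. The helper name to_text is hypothetical, and the assumption that any byte strings are UTF-8 encoded is mine, not something the thread states.

```python
def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, or None so it can later be bound as SQL NULL.

    Assumption: byte strings are encoded with `encoding` (UTF-8 by default).
    """
    if value is None:
        return None                    # keep None instead of the string 'None'
    if isinstance(value, bytes):
        return value.decode(encoding)  # explicit decode, never the ascii default
    return str(value)                  # numbers, dates, and text become str


print(to_text(b"230 \xe2\x80\x93 90"))
print(to_text(42))
print(to_text("Owner's note"))
```

Returning None rather than the text 'None' avoids the string comparison against 'None' that the question's loop relies on.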
玩世 2024-12-14 05:44:37

Or you can take the view that only ASCII is allowed and use the awesomely named Unicode Hammer.

傻比既视感 2024-12-14 05:44:37

In general, you want to convert to unicode on data input and convert to the desired encoding on output.

Doing this will make it easier to find your problem. It means changing all strings to unicode, for example 'INSERT INTO ' to u'INSERT INTO ' (notice the "u" before the string). Then, when you send the string to be executed, convert it to the desired encoding, "utf8".

cursor.execute(InsertSQL.encode("utf8")) # Where InsertSQL is unicode
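
A different technique from the one above, shown here only as a sketch: pass the values as query parameters instead of concatenating them into the SQL text. This leaves apostrophes intact without hand-escaping and hands text to the driver as Unicode. The example uses the standard-library sqlite3 module as a stand-in so that it runs anywhere; pyodbc accepts the same ? placeholder style, but the table name demo_test and the sample rows are invented for illustration, and this has not been run against the SQL Server setup in the question.

```python
import sqlite3

# In-memory database used as a stand-in for the real SQL Server connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo_test (Layer TEXT, Comments TEXT, State INTEGER)")

# Sample rows: one with an apostrophe and an en dash, one with a NULL comment.
rows = [
    ("Roads", "Owner's note: 230 \u2013 90M", 1),
    ("Parks", None, 2),
]

# The ? placeholders are filled in by the driver; no quoting or escaping needed.
cur.executemany(
    "INSERT INTO demo_test (Layer, Comments, State) VALUES (?, ?, ?)", rows
)
conn.commit()

stored = cur.execute(
    "SELECT Layer, Comments, State FROM demo_test ORDER BY State"
).fetchall()
print(stored)
conn.close()
```

None is bound as NULL automatically, so the 'None' to 'NULL' string handling in the question's loop would not be needed with this approach. On SQL Server, an nvarchar column is the usual choice when the stored text must be Unicode.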

Also, you should declare the source encoding at the top of your source code. This means adding the encoding cookie to one of the first two lines of the file:

     #!/usr/bin/python
     # -*- coding: <encoding name> -*-

If you're pulling data from a file to build your string, you can use codecs.open to automatically convert from a specific encoding to unicode on load.
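
As a small sketch of decoding on load, the example below writes a UTF-8 file and reads it back as text. It uses io.open rather than codecs.open; that is my substitution, chosen because io.open takes the same encoding argument and behaves the same on Python 2.6+ and Python 3. The file name sample.txt and its contents are made up.

```python
import io
import os
import tempfile

# Hypothetical sample file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Encode on output: text goes in, UTF-8 bytes land on disk.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"Owner's note: 230 \u2013 90M\n")

# Decode on input: the file object yields Unicode text, not raw bytes.
with io.open(path, "r", encoding="utf-8") as handle:
    line = handle.readline().rstrip(u"\n")

# Reading in binary mode shows what is actually stored on disk.
with io.open(path, "rb") as handle:
    raw = handle.read()

print(repr(line))
print(repr(raw))
```

The text read back is identical to what was written, while the raw bytes contain the three-byte UTF-8 sequence for the en dash.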

夜光 2024-12-14 05:44:37

When I changed my str() call to unicode(), that solved the problem. A simple answer, and I appreciate everyone's help on this.
