Python 非 ASCII 字符
我有一个 python 文件,它在 ms sql 中创建并填充一个表。唯一的症结在于,如果存在任何非 ascii 字符或单个撇号(而且每种撇号都有相当多),代码就会中断。虽然我可以运行替换函数来删除撇号字符串,但我更愿意保持它们完整。我也尝试过将数据转换为 utf-8,但也没有成功。
以下是我收到的错误消息:
"'ascii' codec can't encode character u'\2013' in position..." (for non-ascii characters)
对于单引号
class 'pyodbc.ProgrammingError'>: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server] Incorrect syntax near 'S, 230 X 90M.; Eligibilty....
,当我尝试用 utf-8 编码字符串时,我收到以下错误消息:
<type 'exceptions.UnicodeDecodeError'>: ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)
下面包含 python 代码。我相信代码中发生此中断的点位于以下行之后:InsertValue = str(row.GetValue(CurrentField['Name']))。
# -*- coding: utf-8 -*-
import pyodbc
import sys
import arcpy
import arcgisscripting
gp = arcgisscripting.create(9.3)
SQL_KEYWORDS = ['PERCENT', 'SELECT', 'INSERT', 'DROP', 'TABLE']
#SourceFGDB = '###'
#SourceTable = '###'
SourceTable = sys.argv[1]
TempInputName = sys.argv[2]
SourceTable2 = sys.argv[3]
#---------------------------------------------------------------------------------------------------------------------
# Target Database Settings
#---------------------------------------------------------------------------------------------------------------------
TargetDatabaseDriver = "{SQL Server}"
TargetDatabaseServer = "###"
TargetDatabaseName = "###"
TargetDatabaseUser = "###"
TargetDatabasePassword = "###"
# Get schema from FGDB table.
# This should be an ordered list of dictionary elements [{'FGDB_Name', 'FGDB_Alias', 'FGDB_Type', FGDB_Width, FGDB_Precision, FGDB_Scale}, {}]
if not gp.Exists(SourceTable):
print ('- The source does not exist.')
sys.exit(102)
#### Should see if it is actually a table type. Could be a Feature Data Set or something...
print(' - Processing Items From : ' + SourceTable)
FieldList = []
Field_List = gp.ListFields(SourceTable)
print(' - Getting number of rows.')
result = gp.GetCount_management(SourceTable)
Number_of_Features = gp.GetCount_management(SourceTable)
print(' - Number of Rows: ' + str(Number_of_Features))
print(' - Getting fields.')
Field_List1 = gp.ListFields(SourceTable, 'Layer')
Field_List2 = gp.ListFields(SourceTable, 'Comments')
Field_List3 = gp.ListFields(SourceTable, 'Category')
Field_List4 = gp.ListFields(SourceTable, 'State')
Field_List5 = gp.ListFields(SourceTable, 'Label')
Field_List6 = gp.ListFields(SourceTable, 'DateUpdate')
Field_List7 = gp.ListFields(SourceTable, 'OBJECTID')
for Current_Field in Field_List1 + Field_List2 + Field_List3 + Field_List4 + Field_List5 + Field_List6 + Field_List7:
print(' - Field Found: ' + Current_Field.Name)
if Current_Field.AliasName in SQL_KEYWORDS:
Target_Name = Current_Field.Name + '_'
else:
Target_Name = Current_Field.Name
print(' - Alias : ' + Current_Field.AliasName)
print(' - Type : ' + Current_Field.Type)
print(' - Length : ' + str(Current_Field.Length))
print(' - Scale : ' + str(Current_Field.Scale))
print(' - Precision: ' + str(Current_Field.Precision))
FieldList.append({'Name': Current_Field.Name, 'AliasName': Current_Field.AliasName, 'Type': Current_Field.Type, 'Length': Current_Field.Length, 'Scale': Current_Field.Scale, 'Precision': Current_Field.Precision, 'Unique': 'UNIQUE', 'Target_Name': Target_Name})
# Create table in SQL Server based on FGDB table schema.
cnxn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=###;DATABASE=###;UID=sql_webenvas;PWD=###')
cursor = cnxn .cursor()
#### DROP the table first?
try:
DropTableSQL = 'DROP TABLE dbo.' + TempInputName + '_Test;'
print DropTableSQL
cursor.execute(DropTableSQL)
dbconnection.commit()
except:
print('WARNING: Can not drop table - may not exist: ' + TempInputName + '_Test')
CreateTableSQL = ('CREATE TABLE ' + TempInputName + '_Test '
' (Layer varchar(500), Comments varchar(5000), State int, Label varchar(500), DateUpdate DATETIME, Category varchar(50), OBJECTID int)')
cursor.execute(CreateTableSQL)
cnxn.commit()
# Cursor through each row in the FGDB table, get values, and insert into the SQL Server Table.
# We got Number_of_Features earlier, just use that.
Number_Processed = 0
print(' - Processing ' + str(Number_of_Features) + ' features.')
rows = gp.SearchCursor(SourceTable)
row = rows.Next()
while row:
if Number_Processed % 10000 == 0:
print(' - Processed ' + str(Number_Processed) + ' of ' + str(Number_of_Features))
InsertSQLFields = 'INSERT INTO ' + TempInputName + '_Test ('
InsertSQLValues = 'VALUES ('
for CurrentField in FieldList:
InsertSQLFields = InsertSQLFields + CurrentField['Target_Name'] + ', '
InsertValue = str(row.GetValue(CurrentField['Name']))
if InsertValue in ['None']:
InsertValue = 'NULL'
# Use an escape quote for the SQL.
InsertValue = InsertValue.replace("'","' '")
if CurrentField['Type'].upper() in ['STRING', 'CHAR', 'TEXT']:
if InsertValue == 'NULL':
InsertSQLValues = InsertSQLValues + "NULL, "
else:
InsertSQLValues = InsertSQLValues + "'" + InsertValue + "', "
elif CurrentField['Type'].upper() in ['GEOMETRY']:
## We're not handling geometry transfers at this time.
if InsertValue == 'NULL':
InsertSQLValues = InsertSQLValues + '0' + ', '
else:
InsertSQLValues = InsertSQLValues + '1' + ', '
else:
InsertSQLValues = InsertSQLValues + InsertValue + ', '
InsertSQLFields = InsertSQLFields[:-2] + ')'
InsertSQLValues = InsertSQLValues[:-2] + ')'
InsertSQL = InsertSQLFields + ' ' + InsertSQLValues
## print InsertSQL
cursor.execute(InsertSQL)
cnxn.commit()
Number_Processed = Number_Processed + 1
row = rows.Next()
print(' - Processed all ' + str(Number_Processed))
del row
del rows
I have a python file that creates and populates a table in ms sql. The only sticking point is that the code breaks if there are any non-ascii characters or single apostrophes (and there are quite a few of each). Although I can run the replace function to rid the strings of apostrophes, I would prefer to keep them intact. I have also tried converting the data into utf-8, but no luck there either.
Below are th error messages I get:
"'ascii' codec can't encode character u'\2013' in position..." (for non-ascii characters)
and for the single quotes
class 'pyodbc.ProgrammingError'>: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server] Incorrect syntax near 'S, 230 X 90M.; Eligibilty....
When I try to encode string in utf-8, I instead get the following error message:
<type 'exceptions.UnicodeDecodeError'>: ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)
The python code is included below. I believe the point in the code where this break occurs is after the following line: InsertValue = str(row.GetValue(CurrentField['Name'])).
# -*- coding: utf-8 -*-
import pyodbc
import sys
import arcpy
import arcgisscripting
gp = arcgisscripting.create(9.3)
SQL_KEYWORDS = ['PERCENT', 'SELECT', 'INSERT', 'DROP', 'TABLE']
#SourceFGDB = '###'
#SourceTable = '###'
SourceTable = sys.argv[1]
TempInputName = sys.argv[2]
SourceTable2 = sys.argv[3]
#---------------------------------------------------------------------------------------------------------------------
# Target Database Settings
#---------------------------------------------------------------------------------------------------------------------
TargetDatabaseDriver = "{SQL Server}"
TargetDatabaseServer = "###"
TargetDatabaseName = "###"
TargetDatabaseUser = "###"
TargetDatabasePassword = "###"
# Get schema from FGDB table.
# This should be an ordered list of dictionary elements [{'FGDB_Name', 'FGDB_Alias', 'FGDB_Type', FGDB_Width, FGDB_Precision, FGDB_Scale}, {}]
if not gp.Exists(SourceTable):
print ('- The source does not exist.')
sys.exit(102)
#### Should see if it is actually a table type. Could be a Feature Data Set or something...
print(' - Processing Items From : ' + SourceTable)
FieldList = []
Field_List = gp.ListFields(SourceTable)
print(' - Getting number of rows.')
result = gp.GetCount_management(SourceTable)
Number_of_Features = gp.GetCount_management(SourceTable)
print(' - Number of Rows: ' + str(Number_of_Features))
print(' - Getting fields.')
Field_List1 = gp.ListFields(SourceTable, 'Layer')
Field_List2 = gp.ListFields(SourceTable, 'Comments')
Field_List3 = gp.ListFields(SourceTable, 'Category')
Field_List4 = gp.ListFields(SourceTable, 'State')
Field_List5 = gp.ListFields(SourceTable, 'Label')
Field_List6 = gp.ListFields(SourceTable, 'DateUpdate')
Field_List7 = gp.ListFields(SourceTable, 'OBJECTID')
for Current_Field in Field_List1 + Field_List2 + Field_List3 + Field_List4 + Field_List5 + Field_List6 + Field_List7:
print(' - Field Found: ' + Current_Field.Name)
if Current_Field.AliasName in SQL_KEYWORDS:
Target_Name = Current_Field.Name + '_'
else:
Target_Name = Current_Field.Name
print(' - Alias : ' + Current_Field.AliasName)
print(' - Type : ' + Current_Field.Type)
print(' - Length : ' + str(Current_Field.Length))
print(' - Scale : ' + str(Current_Field.Scale))
print(' - Precision: ' + str(Current_Field.Precision))
FieldList.append({'Name': Current_Field.Name, 'AliasName': Current_Field.AliasName, 'Type': Current_Field.Type, 'Length': Current_Field.Length, 'Scale': Current_Field.Scale, 'Precision': Current_Field.Precision, 'Unique': 'UNIQUE', 'Target_Name': Target_Name})
# Create table in SQL Server based on FGDB table schema.
cnxn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=###;DATABASE=###;UID=sql_webenvas;PWD=###')
cursor = cnxn .cursor()
#### DROP the table first?
try:
DropTableSQL = 'DROP TABLE dbo.' + TempInputName + '_Test;'
print DropTableSQL
cursor.execute(DropTableSQL)
dbconnection.commit()
except:
print('WARNING: Can not drop table - may not exist: ' + TempInputName + '_Test')
CreateTableSQL = ('CREATE TABLE ' + TempInputName + '_Test '
' (Layer varchar(500), Comments varchar(5000), State int, Label varchar(500), DateUpdate DATETIME, Category varchar(50), OBJECTID int)')
cursor.execute(CreateTableSQL)
cnxn.commit()
# Cursor through each row in the FGDB table, get values, and insert into the SQL Server Table.
# We got Number_of_Features earlier, just use that.
Number_Processed = 0
print(' - Processing ' + str(Number_of_Features) + ' features.')
rows = gp.SearchCursor(SourceTable)
row = rows.Next()
while row:
if Number_Processed % 10000 == 0:
print(' - Processed ' + str(Number_Processed) + ' of ' + str(Number_of_Features))
InsertSQLFields = 'INSERT INTO ' + TempInputName + '_Test ('
InsertSQLValues = 'VALUES ('
for CurrentField in FieldList:
InsertSQLFields = InsertSQLFields + CurrentField['Target_Name'] + ', '
InsertValue = str(row.GetValue(CurrentField['Name']))
if InsertValue in ['None']:
InsertValue = 'NULL'
# Use an escape quote for the SQL.
InsertValue = InsertValue.replace("'","' '")
if CurrentField['Type'].upper() in ['STRING', 'CHAR', 'TEXT']:
if InsertValue == 'NULL':
InsertSQLValues = InsertSQLValues + "NULL, "
else:
InsertSQLValues = InsertSQLValues + "'" + InsertValue + "', "
elif CurrentField['Type'].upper() in ['GEOMETRY']:
## We're not handling geometry transfers at this time.
if InsertValue == 'NULL':
InsertSQLValues = InsertSQLValues + '0' + ', '
else:
InsertSQLValues = InsertSQLValues + '1' + ', '
else:
InsertSQLValues = InsertSQLValues + InsertValue + ', '
InsertSQLFields = InsertSQLFields[:-2] + ')'
InsertSQLValues = InsertSQLValues[:-2] + ')'
InsertSQL = InsertSQLFields + ' ' + InsertSQLValues
## print InsertSQL
cursor.execute(InsertSQL)
cnxn.commit()
Number_Processed = Number_Processed + 1
row = rows.Next()
print(' - Processed all ' + str(Number_Processed))
del row
del rows
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
James,我相信真正的问题是你没有全面使用 Unicode。尝试执行以下操作:
另外,请向我们提供完整的堆栈跟踪和异常名称。
James, I believe the real issue is that your are not using Unicode accross the board. Try to do the following:
Also, please provide us the full stack trace and the exception name.
我将使用我的通灵调试技巧,并说您正在尝试
str()
验证某些内容,但在 ascii 编解码器中出现错误。你真正应该做的是使用 utf-8 编解码器,如下所示:I'm going to use my psychic debugging skills and say you are trying to
str()
ify something and getting an error with the ascii codec. What you really should do is to use the utf-8 codec instead like this:或者,您可以认为只允许使用 ASCII,并使用名为 统一码锤
Or you can take the view that only ASCII is allowed and use the awesomely named Unicode Hammer
一般来说,您希望在数据输入时转换为 unicode,并在输出时转换为所需的编码。
因此,如果您这样做,会更容易找到您的问题。这意味着将所有字符串更改为 unicode,将 'INSERT INTO ' 更改为 u'INSERT INTO '。 (注意字符串前面的“u”)
然后,当您发送要执行的字符串时,将其转换为所需的编码“utf8”。
另外,您应该将编码字符串添加到源代码的顶部。
这意味着将编码 cookie 添加到文件的前两行之一:
如果您从文件中提取数据来构建字符串,则可以使用 codecs.open 在加载时自动从特定编码转换为 unicode。
In general you want to convert to unicode on data input, and convert to the desired encoding on output.
So it will be easier to find your problem if you do this. This means changing all strings to unicode, 'INSERT INTO ' to u'INSERT INTO '. (Notice the "u" before the string)
Then when you send the string to be executed convert to the desired encoding, "utf8".
Also, you should add the encoding string to the top of your source code.
This means adding the encoding cookie to one of the first two lines of the file:
If your pulling data from a file to build your string you can uses codecs.open to auto convert from a specific encoding to unicode on load.
当我将 str() 转换为 unicode 时,问题就解决了。一个简单的答案,我感谢大家对此的帮助。
When I converted my str() to unicode, that solved the problem. A simple answer, and I appreciate everyone's help on this.