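Python reading CSV problem: extra whitespace
When I tried to parse a CSV file exported by a query in MS SQL 2005 Express Edition, the string Python gave me was totally unexpected. For example, if the line in the CSV file is "aaa,bbb,ccc,dddd", then after Python reads it as a string it becomes something like " a a a a , b b b , c c c, d d d d". What is happening?
I tried to remove the spaces in the code, but it doesn't work.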
import os
import random
f1 = open('a.txt', 'r')
f2 = open('dec_sql.txt', 'w')
text = 'abc'
while(text != ''):
    text = f1.readline()
    if(text==''):
        break
    splited = text.split(',')
    for i in range (0, 32):
        splited[i] = splited[i].replace(' ', '')
    sql = 'insert into dbo.INBOUND_RATED_DEC2010 values ('
    sql += '\'' + splited[0] + '\', '
    sql += '\'' + splited[1] + '\', '
    sql += '\'' + splited[2] + '\', '
    sql += '\'' + splited[3] + '\', '
    sql += '\'' + splited[4] + '\', '
    sql += '\'' + splited[5] + '\', '
    sql += '\'' + splited[6] + '\', '
    sql += '\'' + splited[7] + '\', '
    sql += '\'' + splited[8] + '\', '
    sql += '\'' + splited[9] + '\', '
    sql += '\'' + splited[10] + '\', '
    sql += '\'' + splited[11] + '\', '
    sql += '\'' + splited[12] + '\', '
    sql += '\'' + splited[13] + '\', '
    sql += '\'' + splited[14] + '\', '
    sql += '\'' + splited[15] + '\', '
    sql += '\'' + splited[16] + '\', '
    sql += '\'' + splited[17] + '\', '
    sql += '\'' + splited[18] + '\', '
    sql += '\'' + splited[19] + '\', '
    sql += '\'' + splited[20] + '\', '
    sql += '\'' + splited[21] + '\', '
    sql += '\'' + splited[22] + '\', '
    sql += '\'' + splited[23] + '\', '
    sql += '\'' + splited[24] + '\', '
    sql += '\'' + splited[25] + '\', '
    sql += '\'' + splited[26] + '\', '
    sql += '\'' + splited[27] + '\', '
    sql += '\'' + splited[28] + '\', '
    sql += '\'' + splited[29] + '\', '
    sql += '\'' + splited[30] + '\', '
    sql += '\'' + splited[31] + '\', '
    sql += '\'' + splited[32] + '\' '
    sql += ')'
    print sql
    f2.write(sql+'\n')
f2.close()
f1.close()
Comments (4)
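It sounds to me like the output of the MS SQL 2005 query is a Unicode file. The Python 2 csv module cannot read Unicode input directly, but the documentation for the csv module contains sample code describing how to work around the problem.
Alternatively, some text editors let you save a file with a different encoding. For example, I opened the results of an MS SQL 2005 query in Notepad++, which told me the file was UCS-2 encoded, and I was able to convert it to UTF-8 from the Encoding menu.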
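The same re-encoding can be scripted instead of done by hand in an editor. The following is a minimal sketch, assuming Python 3 and assuming the export is UTF-16 with a byte-order mark, which the question does not confirm. The function name `reencode_to_utf8` and the file names are hypothetical.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Rewrite a text file as UTF-8, decoding it from src_encoding first."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo on a throwaway file that imitates the export (hypothetical content).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "a.txt")
dst_path = os.path.join(workdir, "a_utf8.txt")
with open(src_path, "w", encoding="utf-16", newline="") as f:
    f.write("aaa,bbb,ccc,dddd\r\n")

reencode_to_utf8(src_path, dst_path)
with open(dst_path, "rb") as f:
    converted = f.read()
```

If the file turns out to use a different encoding, only the `src_encoding` argument should need to change.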
Try opening the file in Notepad and using the Replace All function to replace ' ' with ''.
Your file is most likely encoded with a two-byte character encoding, most likely UTF-16 (but it could be some other encoding).
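A short demonstration of why the "spaces" appear and why stripping spaces does not help. This is a sketch in Python 3 that assumes the file is UTF-16 little-endian, which the question does not confirm; the sample line is the one from the question.

```python
line = "aaa,bbb,ccc,dddd"

# UTF-16-LE stores each of these characters as two bytes; for ASCII
# characters the second byte is NUL (0x00).
raw = line.encode("utf-16-le")

# Reading those bytes with a one-byte encoding keeps the NULs as characters,
# which many consoles display as blanks between the letters.
misread = raw.decode("latin-1")

# The padding is NUL, not the space character, so this changes nothing.
still_padded = misread.replace(" ", "")

# Decoding with the right codec recovers the original text.
decoded = raw.decode("utf-16-le")

print(repr(misread[:8]))
print(decoded)
```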
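To get the csv module to read it properly, open the file with a codec so that it is decoded as it is read. That way you have Unicode objects (not byte-string objects) inside your Python program.
So, instead of opening the file with the built-in open(), open it with codecs.open() and pass the encoding, for example 'utf-16'. That needs:
import codecs
And then feed the resulting file object (my_file below) to the csv module, with: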
import csv
reader = csv.reader(my_file)
first_line = True
for line in reader:
    if first_line:  # skip the header line
        first_line = False
        continue
    # assemble the SQL query and issue it
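For comparison, here is a sketch of the same read loop in Python 3, where the built-in `open()` takes an `encoding` argument and the csv module works on decoded text. The encoding `utf-16` is the guess made above, and the file name and sample rows are hypothetical.

```python
import csv
import os
import tempfile

# Build a small UTF-16 file that stands in for the SQL Server export.
path = os.path.join(tempfile.mkdtemp(), "a.txt")
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write("col1,col2,col3\r\naaa,bbb,ccc\r\nddd,eee,fff\r\n")

rows = []
# newline="" lets the csv module handle line endings itself.
with open(path, "r", encoding="utf-16", newline="") as my_file:
    reader = csv.reader(my_file)
    next(reader, None)  # skip the header line, if there is one
    for row in reader:
        rows.append(row)

print(rows)
```

Using `next(reader, None)` to discard the header avoids the flag variable altogether.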
Another thing is that building your "query" with 32 lines of repetitive code is not a nice thing to do when programming. Even in languages that lack rich string processing facilities there are better ways to do it, but in Python you can simply build the whole statement with a single expression that combines the % operator and join, instead of those 33 lines.
The idea is a format string with a placeholder inside the parentheses of the insert statement. After the % operator, the string ", " is used with its join method to paste together all the elements of the sequence passed to join. That sequence consists of one string per value in your splited array, each containing the value enclosed in single quotes.
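The one-liner described above can be sketched as follows. The table name comes from the question; the sample values are made up, and the exact expression is a reconstruction from the description rather than the answer's original code.

```python
splited = ["aaa", "bbb", "ccc", "dddd"]  # sample values, not real data

# Wrap every value in single quotes, join the pieces with ", ",
# and drop the result into the parentheses of the statement.
quoted = ", ".join("'%s'" % value for value in splited)
sql = "insert into dbo.INBOUND_RATED_DEC2010 values (%s)" % quoted

print(sql)
```

A value that itself contains a single quote would break a statement built this way, so a parameterized query through a database driver is likely the safer route if the rows are inserted directly from Python.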
It may help to use Python's built-in CSV reader. This looks like a Unicode issue, a problem that has frustrated me a lot.