Problem parsing files in a directory: Python 2.7 vs. 3.2
I am trying to do some basic file parsing within a directory in Python 3. This code works perfectly in Python 2.7, but I cannot figure out what the problem is in Python 3.2:
```python
import sys, os, re

filelist = os.listdir('/Users/sbrown/Desktop/Test')
os.chdir('/Users/sbrown/Desktop/Test')
for file in filelist:
    # No encoding argument: Python 3 decodes with the locale's default (ASCII here)
    infile = open(file, mode='r')
    filestring = infile.read()
    infile.close()
    pattern = re.compile('exit')
    filestring = pattern.sub('so long', filestring)
    outfile = open(file, mode='w')
    outfile.write(filestring)
    outfile.close()  # the parentheses are needed to actually close the file
sys.exit()
```
This is the error that is raised:
```
Traceback (most recent call last):
  File "/Users/bunsen/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)
```
The files I am parsing are all text files. I tried passing `utf-8` as the `encoding` argument to `open()`, but that didn't work. Any ideas? Thanks in advance!
If I specify `utf-8` as the encoding, this is the error that is raised:
```
Traceback (most recent call last):
  File "/Users/sbrown/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)
```
Answers (3)
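You are not specifying an encoding when you open your files. In Python 3 you need to, because a file opened in text mode returns decoded Unicode strings; if you omit `encoding`, Python decodes with the platform default (`locale.getpreferredencoding()`), which in your case is ASCII.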
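A minimal sketch of the difference between the two modes, assuming a made-up sample file that contains byte 0x80 and assuming cp1252 as its encoding: binary mode hands back raw bytes without decoding anything, while text mode decodes them into a `str` with the codec you name.

```python
import os
import tempfile

# Write a hypothetical sample file containing byte 0x80.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'exit \x80')

# Binary mode: raw bytes, no decoding, so no UnicodeDecodeError is possible.
with open(path, 'rb') as f:
    raw = f.read()

# Text mode: the bytes are decoded with the given encoding (assumed cp1252).
with open(path, 'r', encoding='cp1252') as f:
    text = f.read()

os.remove(path)
print(type(raw).__name__, type(text).__name__)  # bytes str
```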
You tried UTF-8 and that didn't work, so evidently that isn't the encoding your files use. Only you know what encoding they are in, but my guess is cp1252: 0x80 is the € sign in that code page, so a failure on byte 0x80 is common when the files come from European Windows users. :-)
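To check a guess like this, you can try decoding a sample of the raw bytes with several candidate codecs. The helper `try_decodings` is hypothetical, and the byte string is a made-up sample containing 0x80:

```python
def try_decodings(data, encodings=('ascii', 'utf-8', 'cp1252')):
    """Return {encoding: decoded text, or None if that codec cannot decode data}."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

sample = b'price: 10 \x80'  # hypothetical sample containing byte 0x80
result = try_decodings(sample)
# result['ascii'] and result['utf-8'] are None; result['cp1252'] == 'price: 10 \u20ac'
```

A successful decode only shows that the bytes are valid in that codec, not that it is the right one (cp1252 accepts most byte values), so check that the decoded text actually looks correct.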
To be compatible with both Python 2.7 and Python 3, I recommend using the `io` library to open files. Its `open()` is the one Python 3 uses by default, and it is available in Python 2.6 and later as well:
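A minimal sketch of the loop rewritten with `io.open`, assuming the files really are cp1252 (swap in the actual encoding if not). The function name `replace_in_directory` is hypothetical. It uses `os.path.join` instead of `os.chdir`, `with` blocks so each file is closed even if decoding fails, and `newline=''` so existing line endings are written back unchanged.

```python
import io
import os
import re

ENCODING = 'cp1252'  # assumption: replace with the files' real encoding


def replace_in_directory(directory, encoding=ENCODING):
    """Replace every 'exit' with 'so long' in each regular file of directory."""
    pattern = re.compile('exit')
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        # io.open is the built-in open() in Python 3 and also exists in Python 2.6+
        with io.open(path, mode='r', encoding=encoding, newline='') as infile:
            filestring = infile.read()
        filestring = pattern.sub('so long', filestring)
        with io.open(path, mode='w', encoding=encoding, newline='') as outfile:
            outfile.write(filestring)
```

Usage: `replace_in_directory('/Users/sbrown/Desktop/Test')`.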
Does this work?
Run a test to be sure that you actually read your files as `utf-8`. If not, check that you haven't done something wicked with `codecs`. Also, could you post the traceback from your test with `utf-8` forced?
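One way to run such a check (a sketch, not necessarily the test the answerer had in mind) is to look at the file object's `encoding` attribute, which names the codec actually used for decoding. `report_encoding` is a hypothetical helper:

```python
def report_encoding(path, encoding=None):
    """Open path in text mode and return the codec name the file object uses.

    With encoding=None, open() falls back to the locale's preferred encoding.
    Opening alone reads nothing, so this cannot raise UnicodeDecodeError.
    """
    with open(path, mode='r', encoding=encoding) as infile:
        return infile.encoding
```

If `report_encoding(path)` reports an ASCII codec while you believe you passed `utf-8`, the argument never reached `open()`; the traceback above, which still goes through `encodings/ascii.py`, would be consistent with that.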