Problem parsing files in a directory: Python 2.7 vs 3.2

Posted on 2024-11-08 17:20:53

I am trying to do some basic file parsing within a directory in Python 3. This code works perfectly in Python 2.7, but I cannot figure out what the problem is in Python 3.2.

import os, re

filelist = os.listdir('/Users/sbrown/Desktop/Test')
os.chdir('/Users/sbrown/Desktop/Test')
for file in filelist:
    infile = open(file, mode='r')  # text mode, no encoding given
    filestring = infile.read()     # the UnicodeDecodeError below is raised here
    infile.close()
    pattern = re.compile('exit')
    filestring = pattern.sub('so long', filestring)
    outfile = open(file, mode='w')
    outfile.write(filestring)
    outfile.close()                # parentheses needed to actually close the file

This is the error that is raised:

Traceback (most recent call last):
  File "/Users/bunsen/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)

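The files I am parsing are all text files. I tried specifying the encoding in the open() arguments as utf-8, but that didn't work. Any ideas? Thanks in advance!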

If I specify the encoding as utf-8, this is the error that is raised:

Traceback (most recent call last):
  File "/Users/sbrown/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)
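The traceback reports byte 0x80 at position 3131 but not which file it came from. One way to find out is to read each file in binary mode (no decoding) and try the decode yourself; the resulting UnicodeDecodeError carries a `start` attribute with the failing offset. This is a minimal sketch; the helper name `find_undecodable` is hypothetical.

```python
import os


def find_undecodable(directory, encoding="ascii", context=10):
    """Report files in `directory` that cannot be decoded with `encoding`.

    Returns a list of (filename, offset, surrounding_bytes) tuples.
    """
    problems = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        with open(path, "rb") as f:  # binary mode: bytes, nothing is decoded
            data = f.read()
        try:
            data.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the index of the first byte that failed to decode
            lo = max(exc.start - context, 0)
            problems.append((name, exc.start, data[lo:exc.start + context]))
    return problems
```

Calling `find_undecodable('/Users/sbrown/Desktop/Test')` would list each offending file together with the bytes around the bad offset, which usually makes the real encoding easier to guess.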

Comments (3)

苍暮颜 2024-11-15 17:20:53

You are not specifying an encoding when you open your files. You need to do that in Python 3, because in Python 3 a text-mode file returns decoded Unicode strings.
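To make the difference concrete: text mode (`'r'`) decodes the file into `str`, while binary mode (`'rb'`) returns raw `bytes`, which is roughly what Python 2.7's `open()` handed back. As an alternative to choosing an encoding (a swapped-in technique, not what this answer proposes), the substitution can be done entirely on bytes with a bytes pattern. A minimal sketch; `replace_bytes_in_file` is a hypothetical helper name.

```python
import re

# A bytes pattern: matches the ASCII bytes b"exit" without decoding anything.
PATTERN = re.compile(b"exit")


def replace_bytes_in_file(path, replacement=b"so long"):
    """Rewrite `path` in place, replacing b"exit" with `replacement`.

    Works on raw bytes, so bytes such as 0x80 pass through unchanged.
    Only safe when the search and replacement strings are ASCII and the
    file is UTF-8 or a single-byte encoding such as cp1252.
    """
    with open(path, "rb") as f:
        data = f.read()  # bytes, not str
    data = PATTERN.sub(replacement, data)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

The restriction in the docstring matters: in encodings such as UTF-16 the bytes of "exit" look different, so this shortcut only fits the ASCII-compatible case.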

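Now, you tried UTF-8 and that didn't work, so evidently that isn't the encoding used. Only you know what the encoding is, but I'm guessing it's cp1252, because 0x80 is that code page's character for €, so failing on 0x80 is common when you have European Windows users. :-)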
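The guess about byte 0x80 can be checked directly: decoded as cp1252 it is the euro sign, while ASCII rejects everything above 0x7F and UTF-8 never allows 0x80 as the first byte of a character.

```python
# The byte reported in the traceback.
raw = b"\x80"

# In cp1252 (Windows Western European), 0x80 is U+20AC EURO SIGN.
decoded = raw.decode("cp1252")
print(hex(ord(decoded)))  # prints 0x20ac

# ASCII only covers 0x00-0x7F, and in UTF-8 a byte in 0x80-0xBF can only
# continue a multi-byte sequence, never start one; strict decoding rejects both.
failures = {}
for codec in ("ascii", "utf-8"):
    try:
        raw.decode(codec)
    except UnicodeDecodeError as exc:
        failures[codec] = exc.reason
print(failures)  # {'ascii': 'ordinal not in range(128)', 'utf-8': 'invalid start byte'}
```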

To be compatible with both Python 2.7 and 3.1, I recommend using the io library to open files. It is what Python 3 uses by default, and it is also available in Python 2.6 and later:

import io
infile = io.open(filelist[0], mode='rt', encoding='cp1252')
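Applied to the loop from the question, a sketch might look like the following. It assumes the files really are cp1252 (this answer's guess; substitute the real encoding), writes them back in the same encoding, uses `with` blocks so every file is reliably closed, and passes `newline=''` so existing line endings are preserved instead of translated. The function name `replace_in_directory` is hypothetical.

```python
import io
import os
import re


def replace_in_directory(directory, old="exit", new="so long", encoding="cp1252"):
    """Replace the literal text `old` with `new` in every regular file in `directory`.

    `encoding` is an assumption: it must match how the files were saved.
    In Python 3, io.open is the built-in open(); it also exists in Python 2.6+.
    """
    pattern = re.compile(re.escape(old))  # literal match, compiled once
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)  # no os.chdir needed
        if not os.path.isfile(path):
            continue
        # newline="" keeps \r\n or \n exactly as stored in the file
        with io.open(path, mode="rt", encoding=encoding, newline="") as infile:
            text = infile.read()
        new_text = pattern.sub(new, text)
        if new_text != text:  # only rewrite files that actually changed
            with io.open(path, mode="wt", encoding=encoding, newline="") as outfile:
                outfile.write(new_text)
            changed.append(name)
    return changed
```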

∞琼窗梦回ˉ 2024-11-15 17:20:53

Does this work?

import codecs
infile = codecs.open(filelist[0], encoding='UTF-8')
infile.read()
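Since the asker reports that UTF-8 fails, one further option (not part of this answer) is to try a short list of candidate encodings in order and keep the first one that decodes without error. This sketch uses `io.open` rather than `codecs.open`; the helper name and the candidate list are assumptions. Note that `latin-1` maps every possible byte, so it never fails: success with it does not prove it is the right encoding, which is why it must come last.

```python
import io

# Hypothetical candidate list, tried in order. latin-1 decodes any byte,
# so it acts as a catch-all and must stay last.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def read_with_fallback(path, candidates=CANDIDATES):
    """Return (text, encoding) using the first encoding that decodes `path`."""
    for encoding in candidates:
        try:
            with io.open(path, encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of %r could decode %s" % (candidates, path))
```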

酸甜透明夹心 2024-11-15 17:20:53

Test

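import os

filelist = os.listdir('/Users/sbrown/Desktop/Test')
infile = open(os.path.join('/Users/sbrown/Desktop/Test', filelist[0]), mode='r')
print(infile.encoding)  # the encoding open() picked for this text-mode file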

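to make sure you are reading your files as utf-8. If not, check that you haven't done anything unusual with codecs. Also, could you post the traceback from your test with utf-8 forced?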
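When `open()` is called without `encoding`, Python 3 uses a locale-dependent default, which `locale.getpreferredencoding(False)` reports. If it comes back as ASCII (for example `US-ASCII` or `ANSI_X3.4-1968`), the script is probably running under a C/POSIX locale, which would be consistent with the `ascii` codec in the traceback. A small check, assuming it runs in the same environment (terminal, IDE, or launcher) as the failing script; the helper name is hypothetical.

```python
import codecs
import locale
import tempfile


def default_text_encoding():
    """Return (reported_default, encoding_open_actually_used), normalized."""
    # What Python 3 uses when open() is called without encoding=...
    reported = locale.getpreferredencoding(False)
    # Cross-check against a real text-mode file object, as the answer suggests.
    with tempfile.TemporaryFile(mode="w+") as f:
        used = f.encoding
    # codecs.lookup normalizes aliases, e.g. "UTF-8" and "utf8" -> "utf-8".
    return codecs.lookup(reported).name, codecs.lookup(used).name


print(default_text_encoding())
```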

