python3: readlines() indexing problem?

Posted on 2024-10-01 09:34:50

Python 3.1.2 (r312:79147, Nov  9 2010, 09:41:54)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 2230: unexpected code byte

And yet...

Python 2.4.3 (#1, Sep  8 2010, 11:37:47)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6]
'2010-06-14 21:14:43 613 xxx.xxx.xxx.xxx 200 TCP_NC_MISS 4198 635 GET http www.thelegendssportscomplex.com 80 /thumbnails/t/sponsors/145x138/007.gif - - - DIRECT www.thelegendssportscomplex.com image/gif http://www.thelegendssportscomplex.com/ "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; InfoPath.1; MS-RTC LM 8)" OBSERVED "Sports/Recreation" - xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx\r\n'

Does anyone have any idea why .readlines()[6] fails in Python 3 but works in 2.4?

Also... I thought 0xAE was ®.

Comments (3)

感情旳空白 2024-10-08 09:34:50

From the Python wiki:

The UnicodeDecodeError normally happens when decoding an str string from a certain coding. Since codings map only a limited number of str strings to unicode characters, an illegal sequence of str characters will cause the coding-specific decode() to fail

It appears as though you have a different encoding than you think you do.
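As a rough illustration of the mismatch (a sketch, assuming the file is actually ISO-8859-1/Latin-1, which the question does not confirm): 0xAE is indeed ® in Latin-1, but in UTF-8 that character takes two bytes, and a lone 0xAE is not valid there. The variable names below are made up for the example.

```python
raw = b"\xae"  # the byte named in the traceback

# In ISO-8859-1 every byte maps to a character; 0xAE is the registered sign.
latin1_text = raw.decode("iso8859-1")

# In UTF-8 the same character is encoded as two bytes.
utf8_bytes = "\u00ae".encode("utf-8")

# A lone 0xAE is only legal as a continuation byte in UTF-8,
# so strict decoding raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(repr(latin1_text), utf8_bytes, utf8_failed)
```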

他不在意 2024-10-08 09:34:50

From the documentation for the open() function:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

Always specify an encoding when reading text files:

open("/home/madsc13ntist/test_file.txt", "r", encoding='iso8859-1').readlines()[6]

Want to ignore decoding errors? Set errors='ignore'. The default value of errors is None, which behaves the same as 'strict'.
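A minimal sketch of how these options behave, using a throwaway temp file with made-up contents rather than the real log (the sample bytes and variable names are assumptions for illustration only):

```python
import os
import tempfile

# Hypothetical sample data: the second line contains the Latin-1 byte 0xAE.
sample = b"line 0\nAcme\xae Sports\n"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(sample)

# Decoding as ISO-8859-1 never fails, because every byte is a valid character.
with open(path, "r", encoding="iso8859-1") as f:
    latin1_line = f.readlines()[1]

# errors='ignore' silently drops bytes that are invalid in the chosen encoding.
with open(path, "r", encoding="utf-8", errors="ignore") as f:
    ignored_line = f.readlines()[1]

# errors='replace' substitutes U+FFFD, so the data loss stays visible.
with open(path, "r", encoding="utf-8", errors="replace") as f:
    replaced_line = f.readlines()[1]

os.remove(path)
print(repr(latin1_line), repr(ignored_line), repr(replaced_line))
```

Note that errors='ignore' loses data without any warning, so naming the correct encoding is usually the safer choice when it is known.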

瘫痪情歌 2024-10-08 09:34:50

Since about two years have passed since the question was asked, you probably already know the reason. Basically, Python 3 strings are Unicode strings. Because they are an abstraction over characters rather than bytes, you need to tell Python which encoding the file uses.

Python 2 strings are actually byte sequences, and Python 2 will happily read whatever bytes the file contains. Some characters are interpreted (newlines, tabs, ...), but the rest are left untouched.

Python 3 open() is similar to Python 2 codecs.open().
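If the Python 2 behaviour is what you want in Python 3, one option is to open the file in binary mode and decode later. This is only a sketch with a made-up temp file; the Latin-1 encoding used for the final decode is an assumption, not something the question establishes.

```python
import os
import tempfile

# Hypothetical sample data standing in for the real log file.
sample = b"first\nAcme\xae Sports\n"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(sample)

# Binary mode performs no decoding, so no UnicodeDecodeError can occur;
# readlines() returns bytes objects, much like Python 2 str.
with open(path, "rb") as f:
    raw_line = f.readlines()[1]

# Decode explicitly once the encoding is known (assumed ISO-8859-1 here).
text_line = raw_line.decode("iso8859-1")

os.remove(path)
print(raw_line, repr(text_line))
```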

... the time has come ... to close the question by accepting one of the answers.
