Python 3: readlines() indexing problem?
    Python 3.1.2 (r312:79147, Nov 9 2010, 09:41:54)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 2230: unexpected code byte
And yet...
    Python 2.4.3 (#1, Sep 8 2010, 11:37:47)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6]
    '2010-06-14 21:14:43 613 xxx.xxx.xxx.xxx 200 TCP_NC_MISS 4198 635 GET http www.thelegendssportscomplex.com 80 /thumbnails/t/sponsors/145x138/007.gif - - - DIRECT www.thelegendssportscomplex.com image/gif http://www.thelegendssportscomplex.com/ "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; InfoPath.1; MS-RTC LM 8)" OBSERVED "Sports/Recreation" - xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx\r\n'
Does anyone have any idea why .readlines()[6] fails in Python 3 but works in 2.4?
Also, I thought 0xAE was ®.
Answers (3)
From the Python wiki:
It appears that your file uses a different encoding than you think it does.
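As a rough illustration of that point (the sample bytes below are made up, not taken from the asker's file): 0xAE is indeed ® in a single-byte encoding such as Latin-1, but in UTF-8 the same character is the two-byte sequence C2 AE, so a lone 0xAE cannot be decoded.

```python
# Hypothetical sample: bytes as a Latin-1 encoded log line might contain them.
raw = b"Example\xae Corp"

# In Latin-1 the single byte 0xAE maps to the registered sign (U+00AE).
as_latin1 = raw.decode("latin-1")

# In UTF-8 the registered sign is encoded as two bytes, C2 AE.
utf8_bytes = "\u00ae".encode("utf-8")

# A lone 0xAE is a continuation byte with no lead byte, so UTF-8 rejects it.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    bad_offset = exc.start  # index of the offending byte within raw
```

So the asker's reading of 0xAE is right for Latin-1, while the traceback shows Python 3 tried to decode the file as UTF-8.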
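From the documentation of the open function:
Always specify the encoding when reading files:
To ignore decoding errors, pass errors='ignore'. The default value of errors is None, which has the same effect as 'strict'.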
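The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the idea. It assumes the file is Latin-1 encoded, which the question does not confirm, and it uses a small temporary file as a stand-in for the real log.

```python
import os
import tempfile

# Hypothetical stand-in for the log file: seven lines, the last of which
# contains the byte 0xAE (the registered sign in Latin-1).
data = "".join("line %d\n" % i for i in range(6)).encode("ascii")
data += b"Brand\xae name\n"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

try:
    # Name the encoding explicitly instead of relying on the locale default.
    with open(path, "r", encoding="latin-1") as f:
        line_latin1 = f.readlines()[6]

    # Keep UTF-8 but silently drop bytes that cannot be decoded.
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        line_ignored = f.readlines()[6]

    # errors="replace" substitutes U+FFFD, so the damage stays visible.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        line_replaced = f.readlines()[6]
finally:
    os.remove(path)
```

Note that errors='ignore' loses data without any warning. Passing the correct encoding is usually the better fix when that encoding is known.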
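As about two years have passed since you asked the question, you probably already know the reason. Basically, Python 3 strings are Unicode strings. To turn the file's bytes into those abstract characters, you need to tell Python which encoding the file uses. If you pass no encoding, open() falls back to the locale's preferred encoding, which the traceback shows was UTF-8 here.
Python 2 strings are actually byte sequences, and Python 2 is happy to read whatever bytes are in the file. Some characters are interpreted (newlines, tabs, ...), but the rest are left untouched.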
Python 3's open() is similar to Python 2's codecs.open().
... The time has come to close the question by accepting one of the answers.
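For anyone who wants the Python 2 behaviour in Python 3, one option is to open the file in binary mode, so that no decoding happens, and decode later once the encoding is known. This is only a sketch: the temporary file and the choice of Latin-1 are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical two-line file whose second line contains the byte 0xAE.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"first\nsecond \xae\n")

try:
    # Binary mode: bytes are returned undecoded, so any byte is accepted,
    # much like Python 2's open(path, "r").
    with open(path, "rb") as f:
        raw_line = f.readlines()[1]
finally:
    os.remove(path)

# raw_line is a bytes object; decode it once the encoding is known.
text_line = raw_line.decode("latin-1")
```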