Python - Python 3.1 can't seem to handle UTF-16 encoded files?
I'm trying to run some code to simply go through a bunch of files and write those that happen to be .txt files into the same file, removing all the spaces. Here's some simple code that should do the trick:
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if '.txt' in file:
            f = open(subdir+'/'+file, 'r')
            line = f.readline()
            while line:
                line2 = line.split()
                if line2:
                    output_file.write(" ".join(line2)+'\n')
                line = f.readline()
            f.close()
But instead, I get the following error:
File "/usr/lib/python3.1/codecs.py", line 300, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfe in position 0: unexpected code byte
It turns out these .txt files are all in UTF-16 (according to FireFox, at any rate). I thought Python 3.x was supposed to be able to handle any sort of character encoding??
Best,
Georgina
Comments (2)
Use open(bla, 'r', encoding="utf-16").
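As a rough sketch, this is how an explicit encoding could fit into the loop from the question. The function name merge_txt_files, its parameters, and the choice of UTF-8 for the output file are assumptions made for illustration and are not part of the original post. It also matches files with endswith('.txt') rather than the substring test used in the question.

```python
import os


def merge_txt_files(rootdir, output_path, encoding="utf-16"):
    """Copy every .txt file under rootdir into one file, collapsing whitespace."""
    output_abs = os.path.abspath(output_path)
    # Assumption: the merged output is written as UTF-8.
    with open(output_path, "w", encoding="utf-8") as output_file:
        for subdir, dirs, files in os.walk(rootdir):
            for name in sorted(files):
                path = os.path.join(subdir, name)
                # Skip non-.txt files and the output file itself.
                if not name.endswith(".txt") or os.path.abspath(path) == output_abs:
                    continue
                # Decode explicitly instead of relying on the default encoding.
                with open(path, "r", encoding=encoding) as f:
                    for line in f:
                        words = line.split()
                        if words:
                            output_file.write(" ".join(words) + "\n")
```

Calling merge_txt_files(rootdir, "merged.out") would then read each input file as UTF-16. If the files turn out to use mixed encodings, a different value can be passed through the encoding parameter.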
There are various utf-16 encodings.
utf-16-be big endian no BOM
utf-16-le little endian no BOM
utf-16 byte order read from the BOM when decoding; BOM + native byte order (little endian on most machines) when encoding
Examples:
You can use these encodings as suggested by @filmor's answer
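To make the difference between the three codecs concrete, here is a small sketch using a made-up two-character string (the variable names are only for illustration). Byte 0xfe at position 0 in the traceback from the question is consistent with a big-endian BOM (FE FF), which the plain utf-16 codec should read and act on.

```python
import codecs

text = "hi"

# Explicit byte order: no BOM is written.
be = text.encode("utf-16-be")   # b'\x00h\x00i'
le = text.encode("utf-16-le")   # b'h\x00i\x00'

# Plain "utf-16" writes a BOM followed by the machine's native byte order.
with_bom = text.encode("utf-16")

# When decoding, plain "utf-16" reads the BOM to pick the byte order,
# so it handles both big-endian and little-endian input.
from_be_file = (codecs.BOM_UTF16_BE + be).decode("utf-16")
from_le_file = (codecs.BOM_UTF16_LE + le).decode("utf-16")

# The explicit variants do not strip a BOM; it comes through as U+FEFF.
kept_bom = (codecs.BOM_UTF16_BE + be).decode("utf-16-be")

print(be, le, with_bom)
print(from_be_file, from_le_file, repr(kept_bom))
```

So for files that start with a BOM, encoding="utf-16" is usually the convenient choice, while utf-16-be and utf-16-le suit data whose byte order is known and that carries no BOM.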