Python: UnicodeEncodeError when reading from stdin
When running a Python program that reads from stdin, I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 320: ordinal not in range(128)
How can I fix it?
Note: The error occurs inside antlr, on a line that looks like this:
self.strdata = unicode(data)
Since I don't want to modify the antlr source code,
I'd like to pass in something that it accepts.
The input code looks like this:
#!/usr/bin/python
import sys
import codecs
import antlr3
import antlr3.tree
from LatexLexer import LatexLexer
from LatexParser import LatexParser
char_stream = antlr3.ANTLRInputStream(codecs.getreader("utf8")(sys.stdin))
lexer = LatexLexer(char_stream)
tokens = antlr3.CommonTokenStream(lexer)
parser = LatexParser(tokens)
r = parser.document()
Comments (3)
The problem is that when reading from stdin, Python decodes
it using the system default encoding, which is ASCII here (the codec named in the error message).
The input is very likely UTF-8 or Windows CP-1252, so the program
chokes on non-ASCII characters.
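The failure described above can be reproduced in isolation. The following is a minimal sketch, written to run on both Python 2 and Python 3; the sample word is an assumption chosen so that its UTF-8 form contains the byte 0xc3 from the error message.

```python
import sys

# Interpreter-wide default used for implicit byte/text conversions.
# Python 2 reports 'ascii' here; Python 3 reports 'utf-8'.
default_encoding = sys.getdefaultencoding()

# Sample input (an assumption): the UTF-8 bytes for "cafe" with an accented e.
# The accented letter is encoded as the two bytes 0xc3 0xa9.
raw = u"caf\u00e9".encode("utf-8")

failing_position = None
try:
    # Decoding with the ASCII codec is what unicode(data) does on Python 2.
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    failing_position = exc.start  # index of the first undecodable byte

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode("utf-8")
```

The position reported in the exception corresponds to the "position 320" part of the original message: it is the offset of the first byte outside the ASCII range.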
To convert sys.stdin to a stream with the proper decoder, I wrapped it in a UTF-8 reader, codecs.getreader("utf8")(sys.stdin), as in the question's code.
That fixed the problem.
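To see what the wrapped stream returns, here is a small sketch that uses an in-memory byte stream as a stand-in for sys.stdin. The name fake_stdin and the sample LaTeX line are assumptions made so the example runs without piped input.

```python
import codecs
import io

# Stand-in for sys.stdin (an assumption): a byte stream holding UTF-8 text.
fake_stdin = io.BytesIO(u"\\section{Caf\u00e9}\n".encode("utf-8"))

# codecs.getreader returns the StreamReader class for the named codec.
# Calling it on a byte stream gives a file-like object whose read()
# returns decoded text instead of raw bytes.
reader = codecs.getreader("utf-8")(fake_stdin)
text = reader.read()
```

On Python 2, sys.stdin can be passed directly, as in the question. On Python 3, sys.stdin is already a text stream, so the byte-level sys.stdin.buffer would be the object to wrap.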
BTW, this is the method ANTLR's FileStream uses to open a file
with a given filename (instead of a given stream):
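Without quoting ANTLR's source, the general idea of opening a file by name with an explicit encoding can be sketched as follows. The helper name read_text_file and the use of io.open are assumptions for illustration only and are not ANTLR's API.

```python
import io
import os
import tempfile


def read_text_file(file_name, encoding="utf-8"):
    """Hypothetical helper: return the whole file decoded with `encoding`."""
    fp = io.open(file_name, "r", encoding=encoding)
    try:
        return fp.read()
    finally:
        fp.close()


# Write a small UTF-8 sample file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".tex")
with os.fdopen(fd, "wb") as handle:
    handle.write(u"na\u00efve".encode("utf-8"))

try:
    contents = read_text_file(path)
finally:
    os.remove(path)
```

The point is the same as with the stdin wrapper: the encoding is stated when the source is opened, so everything read from it is already decoded text.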
BTW #2: For strings I found
useful.
You're not getting this error on input; you're getting it when trying to output the data you read. You should decode the data as you read it and pass unicode objects around, instead of dealing with bytestrings the whole time.
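A minimal sketch of that advice: decode once at the input boundary, work only with text, and encode once at the output boundary. The in-memory streams and the process function are hypothetical placeholders, and the code is written to run on both Python 2 and Python 3.

```python
import io


def process(text):
    """Hypothetical processing step that only ever sees decoded text."""
    return text.upper()


# Stand-ins for the real input and output (assumptions for illustration).
incoming = io.BytesIO(u"\u00e9t\u00e9\n".encode("utf-8"))
outgoing = io.BytesIO()

# Input boundary: bytes become text exactly once.
text = incoming.read().decode("utf-8")

# Everything in between works with text only.
result = process(text)

# Output boundary: text becomes bytes exactly once.
outgoing.write(result.encode("utf-8"))
```

Keeping the two conversions at the edges means no implicit ASCII decoding can happen in the middle of the program.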
Here is an excellent write-up about how Python handles encodings:
How to use UTF-8 with Python