Python: UnicodeEncodeError when reading from standard input

Posted 2024-08-25 18:08:29


When running a Python program that reads from stdin, I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 320: ordinal not in range(128)

How can I fix it?

Note: The error occurs inside antlr, at a line that looks like this:

        self.strdata = unicode(data)

Since I don't want to modify antlr's source code,
I'd like to pass in something it accepts.
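In Python 2, `unicode(data)` implicitly decodes a byte string with the default `'ascii'` codec, but leaves an already-decoded unicode string unchanged, which is why passing decoded text avoids the error. The sketch below illustrates that distinction in Python 3, assuming `str(raw, "ascii")` as a stand-in for Python 2's implicit decode (Python 3 never converts bytes to text implicitly); `to_text` is a hypothetical helper, not part of antlr.

```python
# Python 3 stand-in for antlr's Python 2 call `unicode(data)`.
# `to_text` is a hypothetical helper for illustration, not part of antlr.


def to_text(data):
    """Mimic Python 2's unicode(data) when the default encoding is 'ascii'."""
    if isinstance(data, bytes):
        # Python 2 implicitly decoded byte strings with the default codec
        return str(data, "ascii")
    # Text that was already decoded passes through unchanged
    return str(data)


raw = "Gödel".encode("utf-8")   # what a plain byte stream delivers
decoded = raw.decode("utf-8")   # what a decoding reader delivers

print(to_text(decoded))         # Gödel
try:
    to_text(raw)
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
```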

The code that reads the input looks like this:

#!/usr/bin/python
import sys
import codecs
import antlr3
import antlr3.tree
from LatexLexer import LatexLexer
from LatexParser import LatexParser


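# Wrap stdin in a UTF-8 decoding reader so antlr receives unicode text instead of raw bytes
char_stream = antlr3.ANTLRInputStream(codecs.getreader("utf8")(sys.stdin))
lexer = LatexLexer(char_stream)           # tokenize the decoded input
tokens = antlr3.CommonTokenStream(lexer)  # buffer the tokens for the parser
parser = LatexParser(tokens)
r = parser.document()                     # parse starting at the `document` rule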

流年里的时光 2024-09-01 18:08:29

The problem is that data read from stdin arrives as a byte string, and when it is
converted to unicode (as in antlr's unicode(data)), Python 2 decodes it with the system default encoding:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

The input is most likely UTF-8 or Windows-1252 (CP-1252), so the program
chokes on non-ASCII characters.
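The byte named in the error, 0xc3, is a typical UTF-8 lead byte: accented Latin letters such as é are encoded as two bytes starting with 0xc3. A minimal Python 3 sketch, assuming the input really is either UTF-8 or Windows-1252 as suggested above (`guess_decode` is a hypothetical helper), tries UTF-8 first and falls back to cp1252:

```python
# Hypothetical helper: choose between the two encodings suspected above.
CANDIDATES = ("utf-8", "cp1252")


def guess_decode(raw):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in CANDIDATES:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("input is neither valid UTF-8 nor cp1252")


utf8_bytes = "é".encode("utf-8")     # b'\xc3\xa9': 0xc3 is the lead byte
cp1252_bytes = "é".encode("cp1252")  # b'\xe9': a single byte

print(guess_decode(utf8_bytes))      # ('é', 'utf-8')
print(guess_decode(cp1252_bytes))    # ('é', 'cp1252')
```

UTF-8 is tried first because cp1252 decodes almost any byte sequence without complaint, whereas cp1252-encoded text with accented letters is rarely valid UTF-8, so a successful strict UTF-8 decode is the stronger signal.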

To wrap sys.stdin in a stream with the proper decoder, I used:

import codecs
import sys
char_stream = codecs.getreader("utf-8")(sys.stdin)

That fixed the problem.
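The same wrapping can be tried in Python 3, where `sys.stdin` is already a text stream and the raw bytes live in `sys.stdin.buffer`. This sketch assumes an in-memory `io.BytesIO` as a stand-in for those bytes so it runs without real input, and also shows `io.TextIOWrapper` as the more common Python 3 alternative:

```python
import codecs
import io

# Stand-in for the raw bytes on stdin
# (sys.stdin.buffer in Python 3, sys.stdin itself in Python 2).
fake_stdin = io.BytesIO("naïve café\n".encode("utf-8"))

# Same approach as the answer: wrap the byte stream in a UTF-8 StreamReader
reader = codecs.getreader("utf-8")(fake_stdin)
text = reader.read()
print(type(text).__name__, repr(text))   # str 'naïve café\n'

# Python 3 alternative: a TextIOWrapper over the same byte stream
fake_stdin.seek(0)
wrapper = io.TextIOWrapper(fake_stdin, encoding="utf-8")
print(wrapper.read() == text)            # True
```

With real input, the equivalents would be `codecs.getreader("utf-8")(sys.stdin.buffer)` or `io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8")`; on Python 3.7+ calling `sys.stdin.reconfigure(encoding="utf-8")` before anything has been read is another option.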

BTW, this is how ANTLR's FileStream opens a file
given a filename (instead of a stream):

    fp = codecs.open(fileName, 'rb', encoding)
    try:
        data = fp.read()
    finally:
        fp.close()
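The FileStream snippet above can be wrapped as a reusable function. This is a sketch that runs on Python 3, assuming a temporary UTF-8 file as input; `read_decoded` and `read_decoded_builtin` are hypothetical names. It compares the `codecs.open` approach with the built-in `open(..., encoding=...)`, which is the usual choice in Python 3:

```python
import codecs
import os
import tempfile


def read_decoded(file_name, encoding="utf-8"):
    """Read a whole file as text, mirroring the FileStream snippet above."""
    fp = codecs.open(file_name, "rb", encoding)
    try:
        return fp.read()
    finally:
        fp.close()


def read_decoded_builtin(file_name, encoding="utf-8"):
    """Same idea with the built-in open(); note text mode also translates newlines."""
    with open(file_name, encoding=encoding) as fp:
        return fp.read()


# Write a small UTF-8 file to read back (plain "\n" newlines, so both readers agree)
with tempfile.NamedTemporaryFile("wb", suffix=".tex", delete=False) as tmp:
    tmp.write("Schrödinger\n".encode("utf-8"))
    path = tmp.name

try:
    via_codecs = read_decoded(path)
    via_open = read_decoded_builtin(path)
finally:
    os.remove(path)

print(via_codecs == via_open)   # True
```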

BTW #2: For strings, I found a_string.encode(encoding) useful.
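`encode` turns text into bytes; the opposite direction, which is where the original error came from, is `bytes.decode(encoding)`. A short Python 3 round trip, with an arbitrary sample word chosen for illustration:

```python
text = "Ålesund"
data = text.encode("utf-8")       # str -> bytes
print(data)                       # b'\xc3\x85lesund'

back = data.decode("utf-8")       # bytes -> str
print(back == text)               # True

# When the target codec cannot represent a character, an error handler helps
print(text.encode("ascii", errors="replace"))  # b'?lesund'
```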

新雨望断虹 2024-09-01 18:08:29

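You don't get this error on input; you get it when you try to output the data you read. You should decode the data you read and pass unicode strings around, instead of dealing with byte strings all the time.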
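A minimal Python 3 sketch of the pattern this answer recommends: decode once where bytes come in, work on text, and encode once where bytes go out. In-memory byte streams stand in for stdin/stdout, `upper()` is placeholder processing, and `process` is a hypothetical name:

```python
import io


def process(raw_in, raw_out, encoding="utf-8"):
    """Decode at the input boundary, work on str, encode at the output boundary."""
    text = raw_in.read().decode(encoding)    # bytes -> str, once
    result = text.upper()                    # placeholder processing on str
    raw_out.write(result.encode(encoding))   # str -> bytes, once


src = io.BytesIO("façade".encode("utf-8"))
dst = io.BytesIO()
process(src, dst)
print(dst.getvalue().decode("utf-8"))   # FAÇADE

# Writing to an ASCII-only sink fails on output, with an *encode* error:
try:
    "façade".encode("ascii")
except UnicodeEncodeError as exc:
    print(type(exc).__name__)           # UnicodeEncodeError
```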

_蜘蛛 2024-09-01 18:08:29

Here is an excellent write-up on how Python handles encodings:

How to Use UTF-8 with Python

About the author

不…忘初心
