UTF-8 in Python logging, how?
I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:
```python
import logging

def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding="UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)

    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")

    print "printed unicode object: %s" % unicode_string

    # Explode
    root_logger.info(unicode_string)

if __name__ == "__main__":
    logging_test()
```
This explodes with UnicodeDecodeError on the logging.info() call.
At a lower level, Python's logging package is using the codecs package to open the log file, passing in the "UTF-8" argument as the encoding. That's all well and good, but it's trying to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this:
```python
file_handler.write(unicode_string.encode("UTF-8"))
```
When it should be doing this:
```python
file_handler.write(unicode_string)
```
Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.
Having code like:
Caused:
This happens because the format string is a byte string, while some of the format string arguments are unicode strings with non-ASCII characters:
Making the format string unicode fixes the issue:
So, in your logging configuration, make all format strings unicode:
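Here is a minimal sketch of such a configuration. It assumes a plain StreamHandler and a hypothetical logger name, and the format string is only an example. The `u` prefix is what makes the format string unicode on Python 2. The snippet also runs on Python 3, where the prefix is accepted but has no effect.

```python
import io
import logging

# A unicode format string. On Python 2 the u prefix makes this a unicode
# object instead of a byte string, which is the fix this answer describes;
# on Python 3 every str is already unicode and the prefix changes nothing.
UNICODE_FORMAT = u"%(levelname)s: %(message)s"


def make_handler(stream):
    """Return a StreamHandler whose Formatter uses the unicode format string."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(UNICODE_FORMAT))
    return handler


if __name__ == "__main__":
    buf = io.StringIO()
    log = logging.getLogger("unicode_format_demo")  # hypothetical logger name
    log.propagate = False  # keep the demo away from any root handlers
    log.setLevel(logging.INFO)
    log.addHandler(make_handler(buf))
    log.info(u"\u00f4")  # an o with a hat on it
    print(buf.getvalue(), end="")
```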
And patch the default logging formatter to use a unicode format string:
Check that you have the latest Python 2.6 release; some Unicode bugs were found and fixed after 2.6 came out. For example, on my Ubuntu Jaunty system I ran your script exactly as pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):
On a Windows box:
And the contents of the file:
This might also explain why Lennart Regebro couldn't reproduce it either.
I'm a little late, but I just came across this post, which let me set up logging in UTF-8 very easily.
Here is the link to the post,
or here is the code:
I had a similar problem running Django on Python 3: my logger died upon encountering some umlauts (äöüß) but was otherwise fine. I looked through a lot of results and found none that worked. I tried
which I got from the comment above.
It did not work. Looking at the current locale gave me some crazy ANSI thing, which turned out to mean basically just "ASCII". That sent me off in totally the wrong direction.
Changing the logging format strings to Unicode did not help.
Setting a magic encoding comment at the beginning of the script did not help.
Setting the charset on the sender's message (the text came from an HTTP request) did not help.
What DID work was setting the encoding on the file handler to UTF-8 in settings.py. Because I had nothing set, the default became None, which apparently ends up being ASCII (or, as I like to think of it, ASS-KEY).
Try this:
For what it's worth, I expected to have to use codecs.open to open the file with UTF-8 encoding, but either that's the default or something else is going on here, since it works as is.
If I understood your problem correctly, the same issue should arise on your system when you do just:
I guess automatic encoding to the locale encoding on Unix will not work until you have enabled the locale-aware if branch in the setencoding function in your site module via locale. This file usually resides in /usr/lib/python2.x; it's worth inspecting anyway. AFAIK, locale-aware setencoding is disabled by default (that's true for my Python 2.6 installation). The choices are:
site.py is needed)
See also "The Illusive setdefaultencoding" by Ian Bicking and related links.
If you use Python 3.7 or later, set the environment variable PYTHONUTF8 to 1 before running your Python script.
For example, if you use Linux:
PowerShell:
Windows command line:
Then run your Python script.
In Python 3.10, I managed to log Unicode characters (Greek letters in my case) by adding encoding='utf-8'. Small example:
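Here is a minimal sketch of that approach. It assumes the `encoding='utf-8'` argument is passed to logging.FileHandler; logging.basicConfig has also accepted `encoding=` since Python 3.9. The logger name and format string are illustrative.

```python
import logging
import os
import tempfile


def log_greek(path):
    """Log a few Greek letters to *path* through a UTF-8 FileHandler."""
    logger = logging.getLogger("greek_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo out of the root logger's handlers
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info("αβγ")
    finally:
        logger.removeHandler(handler)
        handler.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "greek.log")
        log_greek(path)
        with open(path, encoding="utf-8") as f:
            print(f.read(), end="")
```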
On Python 3.11.8, this works for me:
https://gist.github.com/jtatum/5311955