UTF-8 in Python logging, how?

Posted on 2024-08-07 11:43:50

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:

import logging

def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)

    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")

    print "printed unicode object: %s" % unicode_string

    # Explode
    root_logger.info(unicode_string)

if __name__ == "__main__":
    logging_test()

This explodes with UnicodeDecodeError on the logging.info() call.

At a lower level, Python's logging package is using the codecs package to open the log file, passing in the "UTF-8" argument as the encoding. That's all well and good, but it's trying to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this:

file_handler.write(unicode_string.encode("UTF-8"))

When it should be doing this:

file_handler.write(unicode_string)

Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.
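For readers on Python 3, the toy example can be approximated as below. This is only a sketch under assumptions: it targets Python 3 rather than the Python 2.6 installation in the question, the logger name `utf8_demo` and the helper `log_unicode_to_file` are made up for illustration, and a temporary directory stands in for `/home/ted/`. It logs one character through a `FileHandler` opened with `encoding="utf-8"` and reads the raw bytes back.

```python
import logging
import os
import tempfile


def log_unicode_to_file(path, text):
    """Log `text` via a UTF-8 FileHandler and return the raw bytes written."""
    logger = logging.getLogger("utf8_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo isolated from the root logger
    handler = logging.FileHandler(path, "w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        # Detach and close so the file is flushed before it is read back.
        logger.removeHandler(handler)
        handler.close()
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "\u00f4" is the "o with a hat on it" from the question.
        raw = log_unicode_to_file(os.path.join(tmp, "logfile.txt"), "\u00f4")
        print(raw)
```

The bytes should contain the two-byte UTF-8 sequence for the character followed by the platform's line ending.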

Comments (9)

他是夢罘是命 2024-08-14 11:43:50

Having code like:

raise Exception(u'щ')

Caused:

  File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
    s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

This happens because the format string is a byte string, while some of the format string arguments are unicode strings with non-ASCII characters:

>>> "%(message)s" % {'message': Exception(u'\u0449')}
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)

Making the format string unicode fixes the issue:

>>> u"%(message)s" % {'message': Exception(u'\u0449')}
u'\u0449'

So, in your logging configuration, make all format strings unicode:

'formatters': {
    'simple': {
        'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
        'datefmt': '%Y-%m-%d %H:%M:%S',
    },
 ...

And patch the default logging formatter to use a unicode format string:

logging._defaultFormatter = logging.Formatter(u"%(message)s")
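
For comparison, on Python 3 every `str` is already Unicode, so the byte-string versus unicode mismatch described above should not arise. The sketch below assumes Python 3; the helper `format_message`, the logger name, and the path are placeholders introduced only for illustration. It formats a record whose message is an exception carrying a non-ASCII character, using a plain `"%(message)s"` format string.

```python
import logging


def format_message(msg):
    """Format a log record carrying `msg` with a plain "%(message)s" format."""
    record = logging.LogRecord(
        name="demo",         # hypothetical logger name
        level=logging.INFO,
        pathname="demo.py",  # placeholder, only stored as record metadata
        lineno=0,
        msg=msg,
        args=None,
        exc_info=None,
    )
    return logging.Formatter("%(message)s").format(record)


if __name__ == "__main__":
    # The same Cyrillic character as in the traceback above.
    print(format_message(Exception("\u0449")))
```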

扎心 2024-08-14 11:43:50

Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):

vinay@eta-jaunty:~/projects/scratch$ python --version
Python 2.6.2
vinay@eta-jaunty:~/projects/scratch$ python utest.py 
printed unicode object: ô
vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt 
ô
vinay@eta-jaunty:~/projects/scratch$ 

On a Windows box:

C:\temp>python --version
Python 2.6.2

C:\temp>python utest.py
printed unicode object: ô

And the contents of the file:

[screenshot of the file contents; image not preserved]

This might also explain why Lennart Regebro couldn't reproduce it either.
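
The file contents can also be checked independently of any terminal or editor encoding by dumping the raw bytes. The following is a small sketch, assuming Python 3; `hex_bytes` is a hypothetical helper name, and `logfile.txt` is the file name used in the session above.

```python
import os
import sys


def hex_bytes(path):
    """Return the file's raw bytes as space-separated two-digit hex values."""
    with open(path, "rb") as f:
        raw = f.read()
    return " ".join("{:02x}".format(b) for b in raw)


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if os.path.exists("logfile.txt"):
        # For a UTF-8 encoded "ô" the dump should start with "c3 b4".
        print(hex_bytes("logfile.txt"))
```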

橘和柠 2024-08-14 11:43:50

I'm a little late, but I just came across a post that enabled me to set up logging in UTF-8 very easily.

Here is the code from that post:

root_logger= logging.getLogger()
root_logger.setLevel(logging.DEBUG) # or whatever
handler = logging.FileHandler('test.log', 'w', 'utf-8') # or whatever
formatter = logging.Formatter('%(name)s %(message)s') # or whatever
handler.setFormatter(formatter) # Pass handler as a parameter, not assign
root_logger.addHandler(handler)

梨涡少年 2024-08-14 11:43:50

I had a similar problem running Django on Python 3: my logger died upon encountering umlauts (äöüß) but was otherwise fine. I looked through a lot of results and found none that worked. I tried

import locale
if locale.getpreferredencoding().upper() != 'UTF-8':
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

which I got from the comment above.
It did not work. Looking at the current locale gave me some crazy ANSI thing, which turned out to mean basically just "ASCII". That sent me in totally the wrong direction.
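
To see what such a locale name actually maps to, the codec registry can be asked for its canonical name. This is a sketch only: it assumes the "ANSI thing" was the label `ANSI_X3.4-1968`, which the answer does not state, and `canonical_codec_name` is a hypothetical helper.

```python
import codecs
import locale


def canonical_codec_name(encoding_name):
    """Return Python's canonical codec name for an encoding label."""
    return codecs.lookup(encoding_name).name


if __name__ == "__main__":
    # What the current locale prefers, and which codec Python resolves it to.
    preferred = locale.getpreferredencoding(False)
    print(preferred, "->", canonical_codec_name(preferred))
    # Assumed example label; the answer only says "some crazy ANSI thing".
    print("ANSI_X3.4-1968", "->", canonical_codec_name("ANSI_X3.4-1968"))
```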

Changing the logging format strings to Unicode did not help.
Setting a magic encoding comment at the beginning of the script did not help.
Setting the charset on the sender's message (the text came from an HTTP request) did not help.

What DID work was setting the encoding on the file handler to UTF-8 in settings.py. Because I had nothing set, the encoding defaulted to None, in which case the file is opened with the locale's preferred encoding, and that apparently ends up being ASCII here (or as I like to think of it: ASS-KEY).

    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'logging.handlers.TimedRotatingFileHandler',
            'encoding': 'UTF-8', # <-- That was missing.
            ....
        },
    },
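
The same idea can be tried outside Django with `logging.config.dictConfig`. The sketch below is an assumption-laden stand-in, not the answer's actual settings: it uses a plain `logging.FileHandler` instead of `TimedRotatingFileHandler`, and the logger name `umlaut_demo`, the formatter name, and the helper functions are invented for the example.

```python
import logging
import logging.config
import os
import tempfile


def build_config(log_path):
    """Return a dictConfig-style dict whose file handler writes UTF-8."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"plain": {"format": "%(message)s"}},
        "handlers": {
            "file": {
                "level": "DEBUG",
                "class": "logging.FileHandler",  # simpler stand-in handler
                "filename": log_path,
                "encoding": "utf-8",  # the setting the answer found missing
                "formatter": "plain",
            },
        },
        "loggers": {
            "umlaut_demo": {  # hypothetical logger name
                "handlers": ["file"],
                "level": "DEBUG",
                "propagate": False,
            },
        },
    }


def log_umlauts(log_path, text="äöüß"):
    """Apply the config, log `text`, and return what was written."""
    logging.config.dictConfig(build_config(log_path))
    logger = logging.getLogger("umlaut_demo")
    logger.debug(text)
    # Detach and close handlers so the file is flushed and released.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    with open(log_path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(log_umlauts(os.path.join(tmp, "django_like.log")))
```

Note that `dictConfig` replaces handlers that were configured earlier in the process, so this is best run as a standalone script.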

网名女生简单气质 2024-08-14 11:43:50

Try this:

import logging

def logging_test():
    log = open("./logfile.txt", "w")
    handler = logging.StreamHandler(log)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)

    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")

    print "printed unicode object: %s" % unicode_string

    # Encode to UTF-8 bytes before logging, so no implicit conversion happens
    root_logger.info(unicode_string.encode("utf8", "replace"))


if __name__ == "__main__":
    logging_test()

For what it's worth, I was expecting to have to use codecs.open to open the file with UTF-8 encoding, but either that's the default or something else is going on here, since it works as is.

转身以后 2024-08-14 11:43:50

If I understood your problem correctly, the same issue should arise on your system when you do just:

str(u'ô')

I guess automatic encoding to the locale encoding on Unix will not work until you enable the locale-aware if branch (which uses the locale module) in the setencoding function of your site module. This file usually resides in /usr/lib/python2.x and is worth inspecting anyway. AFAIK, locale-aware setencoding is disabled by default (this is true for my Python 2.6 installation).

The choices are:

  • Let the system figure out the right way to encode Unicode strings to bytes or do it in your code (some configuration in site-specific site.py is needed)
  • Encode Unicode strings in your code and output just bytes

See also The Illusive setdefaultencoding by Ian Bicking and related links.
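
The second choice, encoding explicitly in your own code and emitting only bytes, might look like the sketch below. It assumes Python 3, where `str(u'ô')` no longer triggers an implicit ASCII encode and `sys.getdefaultencoding()` reports UTF-8; `encode_for_output` is a hypothetical helper name.

```python
import sys


def encode_for_output(text, encoding="utf-8"):
    """Turn a Unicode string into bytes using an explicitly chosen encoding."""
    return text.encode(encoding)


if __name__ == "__main__":
    print(sys.getdefaultencoding())
    # Write bytes directly, bypassing any locale-dependent text layer.
    sys.stdout.buffer.write(encode_for_output("\u00f4\n"))
```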

旧时浪漫 2024-08-14 11:43:50

If you use Python 3.7 or later, set the environment variable PYTHONUTF8 to 1 before running your Python script.

For example, if you use Linux:

export PYTHONUTF8=1

Powershell:

$env:PYTHONUTF8 = "1"

Windows Command Prompt:

set PYTHONUTF8=1

Then execute your python script.
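
Whether the variable took effect can be checked from Python through `sys.flags.utf8_mode`. The sketch below assumes Python 3.7 or later; the helper `utf8_mode_in_child` is made up for illustration. It starts a child interpreter with PYTHONUTF8 set and reports the flag the child sees.

```python
import os
import subprocess
import sys


def utf8_mode_in_child(enabled=True):
    """Run a child interpreter with PYTHONUTF8 set and return its utf8_mode flag."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1" if enabled else "0"
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("utf8_mode with PYTHONUTF8=1:", utf8_mode_in_child(True))
    print("utf8_mode with PYTHONUTF8=0:", utf8_mode_in_child(False))
```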

花开浅夏 2024-08-14 11:43:50

In Python 3.10, I managed to log Unicode characters (Greek letters in my case) by adding encoding='utf-8'.

Small example:

import logging
import sys

if __name__ == "__main__":
    logging.basicConfig(filename="log.log", filemode="w", level=logging.DEBUG, encoding="utf-8")
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter(" %(levelname)s - %(message)s")  # %(asctime)s - %(name)s -
    handler.setFormatter(formatter)
    root.addHandler(handler)
    logging.debug("Γεια σου μαρία")

梦里兽 2024-08-14 11:43:50

Python 3.11.8, this works for me.
https://gist.github.com/jtatum/5311955

import logging

# Add a file handler with utf-8 encoding
handler = logging.FileHandler('output.log', 'w',
                              encoding = 'utf-8')
root_logger = logging.getLogger()
root_logger.addHandler(handler)