How can I force PyYAML to load strings as unicode objects?

Posted on 2024-09-02 08:54:29

The PyYAML package loads unmarked strings as either unicode or str objects, depending on their content.

I would like to use unicode objects throughout my program (and, unfortunately, can't switch to Python 3 just yet).

Is there an easy way to force PyYAML to always load strings as unicode objects? I do not want to clutter my YAML with !!python/unicode tags.

# Encoding: UTF-8

import yaml

menu = u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
"""

print yaml.load(menu)

Output: ['spam', 'eggs', 'bacon', u'cr\xe8me br\xfbl\xe9e', 'spam']

I would like: [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam']

Comments (2)

毅然前行 2024-09-09 08:54:29

Here's a version that overrides PyYAML's string handling so that it always outputs unicode. In practice, it probably gives the same result as the other response I posted, only shorter (i.e. if you use custom handlers, you still need to make sure that strings in custom classes are converted to unicode, or pass unicode strings yourself):

# -*- coding: utf-8 -*-
import yaml
from yaml import Loader, SafeLoader

def construct_yaml_str(self, node):
    # Override the default string handling function 
    # to always return unicode objects
    return self.construct_scalar(node)
Loader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)
SafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)

print yaml.load(u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
""")

(The above gives [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam'])

I haven't tested it with LibYAML (the C-based parser), as I couldn't compile it, so I'll leave the other answer as it is.
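
A variation on the approach above, offered as a sketch rather than the answer's own method: the constructor can be registered on a subclass of SafeLoader, so the override applies only where that loader is passed in and the stock loaders keep their default behaviour. The class name UnicodeSafeLoader is hypothetical and not part of PyYAML. The snippet is written to run on both Python 2 and Python 3 (where every string is already unicode).

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import yaml


class UnicodeSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass; the name is not part of PyYAML."""


def construct_yaml_str(loader, node):
    # construct_scalar returns the scalar's text as-is, skipping the
    # ASCII-to-str conversion that the default Python 2 constructor applies
    return loader.construct_scalar(node)


# Registered on the subclass only, so yaml.SafeLoader keeps its defaults
UnicodeSafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)

menu = u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
"""

items = yaml.load(menu, Loader=UnicodeSafeLoader)
print(items)
```

Passing the loader explicitly keeps the change local to the calls that need it, at the cost of having to remember the Loader argument at each call site.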

小镇女孩 2024-09-09 08:54:29

Here's a function you could use to replace str with unicode objects in the decoded output of PyYAML:

def make_str_unicode(obj):
    t = type(obj)

    if t in (list, tuple):
        # Rebuild the list or tuple, converting each item recursively
        return t(make_str_unicode(item) for item in obj)

    elif t == dict:
        # Build a new dict instead of assigning into the old one:
        # 'a' == u'a' in Python 2, so obj[u'a'] = ... would keep the str key
        return dict((make_str_unicode(k), make_str_unicode(v))
                    for k, v in obj.items())

    elif t == str:
        # Convert str to unicode (plain strings that PyYAML loads
        # as str are ASCII, so the default codec is enough)
        return unicode(obj)

    return obj

print make_str_unicode({'blah': ['the', 'quick', u'brown', 124]})