Is there any YAML library in Python that supports dumping long strings as block literals or folded blocks?

Posted on 2024-11-16 22:10:27 · 200 words · 4 views · 0 comments

I'd like to be able to dump a dictionary containing long strings that I'd like to have in the block style for readability. For example:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

PyYAML supports the loading of documents with this style but I can't seem to find a way to dump documents this way. Am I missing something?

Comments (3)

她说她爱他 2024-11-23 22:10:27
import yaml

class folded_unicode(unicode): pass
class literal_unicode(unicode): pass

def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(folded_unicode, folded_unicode_representer)
yaml.add_representer(literal_unicode, literal_unicode_representer)

data = {
    'literal':literal_unicode(
        u'by hjw              ___\n'
         '   __              /.-.\\\n'
         '  /  )_____________\\\\  Y\n'
         ' /_ /=== == === === =\\ _\\_\n'
         '( /)=== == === === == Y   \\\n'
         ' `-------------------(  o  )\n'
         '                      \\___/\n'),
    'folded': folded_unicode(
        u'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n')}

print yaml.dump(data)

The result:

folded: >
  It removes all ordinary curses from all equipped items. Heavy or permanent curses
  are unaffected.
literal: |
  by hjw              ___
     __              /.-.\
    /  )_____________\\  Y
   /_ /=== == === === =\ _\_
  ( /)=== == === === == Y   \
   `-------------------(  o  )
                        \___/

For completeness, one should also have str implementations, but I'm going to be lazy :-)
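The code above targets Python 2 (it relies on the unicode type and the print statement). A minimal sketch of the same idea for Python 3, where str is already Unicode, might look like this. The class and function names (FoldedStr, LiteralStr, represent_folded, represent_literal) and the sample strings are illustrative assumptions, not part of the original answer:

```python
import yaml


class FoldedStr(str):
    """Marker type: ask PyYAML to dump this string as a folded block (>)."""


class LiteralStr(str):
    """Marker type: ask PyYAML to dump this string as a literal block (|)."""


def represent_folded(dumper, data):
    # Same tag as a plain string; only the requested scalar style differs.
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')


def represent_literal(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')


# Register the representers on the default Dumper used by yaml.dump().
yaml.add_representer(FoldedStr, represent_folded)
yaml.add_representer(LiteralStr, represent_literal)

data = {
    'literal': LiteralStr('first line\nsecond line\n'),
    'folded': FoldedStr(
        'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n'),
}

dumped = yaml.dump(data)
print(dumped)
```

Because the marker classes subclass str, the values still behave like ordinary strings everywhere else in the program; only the dumper treats them specially.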

七度光 2024-11-23 22:10:27

pyyaml does support dumping literal or folded blocks.

Using Representer.add_representer

defining types:

class folded_str(str): pass

class literal_str(str): pass

class folded_unicode(unicode): pass

class literal_unicode(unicode): pass

Then you can define the representers for those types.
Please note that while Gary's solution works great for unicode, you may need some more work to get strings to work right (see implementation of represent_str).

def change_style(style, representer):
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer

import yaml
from yaml.representer import SafeRepresenter

# represent_str does handle some corner cases, so use that
# instead of calling represent_scalar directly
represent_folded_str = change_style('>', SafeRepresenter.represent_str)
represent_literal_str = change_style('|', SafeRepresenter.represent_str)
represent_folded_unicode = change_style('>', SafeRepresenter.represent_unicode)
represent_literal_unicode = change_style('|', SafeRepresenter.represent_unicode)

Then you can add those representers to the default dumper:

yaml.add_representer(folded_str, represent_folded_str)
yaml.add_representer(literal_str, represent_literal_str)
yaml.add_representer(folded_unicode, represent_folded_unicode)
yaml.add_representer(literal_unicode, represent_literal_unicode)

... and test it:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
}

print yaml.dump(data)

result:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal

Using default_style

If you want all of your strings to follow one style, you can also use the default_style keyword argument, e.g.:

>>> data = { 'foo': 'line1\nline2\nline3' }
>>> print yaml.dump(data, default_style='|')
"foo": |-
  line1
  line2
  line3

or for folded literals:

>>> print yaml.dump(data, default_style='>')
"foo": >-
  line1

  line2

  line3

or for double-quoted literals:

>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\nline3"
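The interactive sessions above use the Python 2 print statement. As a rough Python 3 sketch of the same experiment, the loop below dumps the same data with each default_style and loads it back to check that nothing is lost. The variable name outputs is only an illustrative choice:

```python
import yaml

data = {'foo': 'line1\nline2\nline3'}

# Dump the same mapping once per style and keep the text for inspection.
outputs = {}
for style in ('|', '>', '"'):
    outputs[style] = yaml.dump(data, default_style=style)
    print(outputs[style])

# Whatever style the emitter ends up using, loading should return the input.
for text in outputs.values():
    assert yaml.safe_load(text) == data
```

Note that default_style applies to every scalar, keys included, which is why the key appears as "foo" in double quotes: a block scalar cannot be used as a simple mapping key.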

Caveats:

Here is an example of something you may not expect:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
    'non-printable': literal_unicode('this has a \t tab in it'),
    'leading': literal_unicode('   with leading white spaces'),
    'trailing': literal_unicode('with trailing white spaces  '),
}
print yaml.dump(data)

results in:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal
leading: |2-
     with leading white spaces
non-printable: "this has a \t tab in it"
trailing: "with trailing white spaces  "

1) non-printable characters

See the YAML spec for escaped characters (Section 5.7):

Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the “\” character has no special meaning and non-printable characters are not available.

If you want to preserve non-printable characters (e.g. TAB), you need to use double-quoted scalars. If you are able to dump a scalar with literal style, and there is a non-printable character (e.g. TAB) in there, your YAML dumper is non-compliant.

E.g. pyyaml detects the non-printable character \t and uses the double-quoted style even though a default style is specified:

>>> data = { 'foo': 'line1\nline2\n\tline3' }
>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='>')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='|')
"foo": "line1\nline2\n\tline3"

2) leading and trailing white spaces

Another bit of useful information in the spec is:

All leading and trailing white space characters are excluded from the content

This means that if your string does have leading or trailing white space, these would not be preserved in scalar styles other than double-quoted. As a consequence, pyyaml tries to detect what is in your scalar and may force the double-quoted style.
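To see this fallback without marker classes, here is a hedged Python 3 sketch that registers one representer for all str values on a private dumper subclass and requests the literal style only for multi-line text. BlockDumper and represent_str are hypothetical names chosen for this example; the behaviour described in the comments is what PyYAML's emitter is expected to do, not something the representer enforces:

```python
import yaml


class BlockDumper(yaml.SafeDumper):
    """Private dumper subclass, so the global default Dumper stays untouched."""


def represent_str(dumper, data):
    # Request literal style only for multi-line strings; the emitter may
    # still override the request if the content cannot be written that way.
    style = '|' if '\n' in data else None
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=style)


BlockDumper.add_representer(str, represent_str)


def dump(value):
    return yaml.dump(value, Dumper=BlockDumper)


clean = dump({'text': 'this is a\nblock literal\n'})
trailing = dump({'text': 'trailing space \nsecond line\n'})
tabbed = dump({'text': 'a\ttab\nsecond line\n'})

for text in (clean, trailing, tabbed):
    print(text)
```

Only the first value should come out as a literal block; the other two are expected to fall back to double quotes because of the trailing space and the tab, in line with the caveats above.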

枫以 2024-11-23 22:10:27

This can be done relatively easily. The only "hurdle" is how to
indicate which of the spaces in the string that is to be
represented as a folded scalar should become a fold. A literal scalar
has explicit newlines that carry that information, but newlines cannot
serve that purpose for folded scalars: a folded scalar can itself contain
explicit newlines (e.g. when a line has leading whitespace), and it also
needs a newline at the end in order not to be represented with a
stripping chomping indicator (>-):

import sys
import ruamel.yaml

folded = ruamel.yaml.scalarstring.FoldedScalarString
literal = ruamel.yaml.scalarstring.LiteralScalarString

yaml = ruamel.yaml.YAML()

data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=folded('this is a folded block\n'),
)

data['bar'].fold_pos = [data['bar'].index(' folded')]

yaml.dump(data, sys.stdout)

which gives:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

The fold_pos attribute expects a reversible iterable of the positions
of the spaces at which to fold.
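If you would rather derive the fold positions from a maximum line width than list them by hand, a small helper can compute them. This is only a sketch using the standard library: fold_positions and its width parameter are hypothetical names, and the greedy wrapping rule is an assumption rather than anything ruamel.yaml prescribes:

```python
def fold_positions(text, width):
    """Return indices of single spaces where text can be folded so that
    each resulting line is at most `width` characters long (greedy)."""
    positions = []
    line_start = 0      # index where the current output line begins
    last_space = None   # most recent foldable space on the current line
    for i, ch in enumerate(text):
        if ch == '\n':
            # An explicit newline starts a fresh line; nothing to fold here.
            line_start, last_space = i + 1, None
            continue
        if i - line_start >= width and last_space is not None:
            # The line has grown too long: fold at the last usable space.
            positions.append(last_space)
            line_start, last_space = last_space + 1, None
        if (ch == ' ' and 0 < i < len(text) - 1
                and not text[i - 1].isspace()
                and not text[i + 1].isspace()):
            # Only a lone space between two non-space characters qualifies.
            last_space = i
    return positions


sample = 'this is a folded block\n'
print(fold_positions(sample, 12))
```

For the sample string and a width of 12 this yields the same single position as index(' folded') in the example above, so the returned list could be assigned to the fold_pos attribute of a FoldedScalarString.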

If you never have pipe characters ('|') in your strings you
could have done something like:

import re

s = 'this is a|folded block\n'
sf = folded(s.replace('|', ' '))  # need to have a space!
sf.fold_pos = [x.start() for x in re.finditer(r'\|', s)]  # | is special in re, needs escaping (raw string avoids an invalid-escape warning)


data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=sf,  # need to have a space
)

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which also gives exactly the output you expect.
