Is there a YAML library in Python that supports dumping long strings as block literals or folded blocks?

Posted on 2024-11-16 22:10:27

I'd like to be able to dump a dictionary containing long strings that I'd like to have in the block style for readability. For example:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

PyYAML supports the loading of documents with this style but I can't seem to find a way to dump documents this way. Am I missing something?

她说她爱他 2024-11-23 22:10:27

import yaml

class folded_unicode(unicode): pass
class literal_unicode(unicode): pass

def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(folded_unicode, folded_unicode_representer)
yaml.add_representer(literal_unicode, literal_unicode_representer)

data = {
    'literal':literal_unicode(
        u'by hjw              ___\n'
         '   __              /.-.\\\n'
         '  /  )_____________\\\\  Y\n'
         ' /_ /=== == === === =\\ _\\_\n'
         '( /)=== == === === == Y   \\\n'
         ' `-------------------(  o  )\n'
         '                      \\___/\n'),
    'folded': folded_unicode(
        u'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n')}

print yaml.dump(data)

The result:

folded: >
  It removes all ordinary curses from all equipped items. Heavy or permanent curses
  are unaffected.
literal: |
  by hjw              ___
     __              /.-.\
    /  )_____________\\  Y
   /_ /=== == === === =\ _\_
  ( /)=== == === === == Y   \
   `-------------------(  o  )
                        \___/

For completeness, one should also have str implementations, but I'm going to be lazy :-)
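
The code above is written for Python 2 (`unicode`, the `print` statement). As a minimal sketch of the same idea under Python 3, where `str` is the only text type, something like the following should work. The class names `FoldedStr` and `LiteralStr` and the sample strings are made up for illustration; only `yaml.add_representer`, `represent_scalar` and `yaml.dump` come from PyYAML.

```python
import yaml


class FoldedStr(str):
    """Marker type: dump this string as a folded block scalar (>)."""


class LiteralStr(str):
    """Marker type: dump this string as a literal block scalar (|)."""


def represent_folded(dumper, data):
    # Same tag as an ordinary string, only the presentation style differs.
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')


def represent_literal(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')


# Register the representers on PyYAML's default dumper.
yaml.add_representer(FoldedStr, represent_folded)
yaml.add_representer(LiteralStr, represent_literal)

data = {
    'literal': LiteralStr('first line\nsecond line\n'),
    'folded': FoldedStr('a short folded paragraph\n'),
}

output = yaml.dump(data)
print(output)
```

Because the marker classes subclass `str`, the values still behave like ordinary strings everywhere else in the program.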


七度光 2024-11-23 22:10:27

PyYAML does support dumping literal or folded blocks.

Using `Representer.add_representer`

Defining types:

class folded_str(str): pass

class literal_str(str): pass

class folded_unicode(unicode): pass

class literal_unicode(unicode): pass

Then you can define the representers for those types.
Please note that while Gary's solution works great for unicode, you may need some more work to get strings to work right (see implementation of represent_str).

def change_style(style, representer):
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer

import yaml
from yaml.representer import SafeRepresenter

# represent_str does handle some corner cases, so use that
# instead of calling represent_scalar directly
represent_folded_str = change_style('>', SafeRepresenter.represent_str)
represent_literal_str = change_style('|', SafeRepresenter.represent_str)
represent_folded_unicode = change_style('>', SafeRepresenter.represent_unicode)
represent_literal_unicode = change_style('|', SafeRepresenter.represent_unicode)

Then you can add those representers to the default dumper:

yaml.add_representer(folded_str, represent_folded_str)
yaml.add_representer(literal_str, represent_literal_str)
yaml.add_representer(folded_unicode, represent_folded_unicode)
yaml.add_representer(literal_unicode, represent_literal_unicode)

... and test it:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
}

print yaml.dump(data)

result:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal
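
Wrapping every value in a marker type can get tedious. As a rough sketch for Python 3 (where there is no separate `unicode` type, so only `str` needs handling), one could instead register a representer for plain `str` and choose the literal style whenever the string contains a newline. `BlockStyleDumper` and `represent_multiline_str` are hypothetical names, not part of PyYAML; registering on a subclass is just one way to keep the default dumper unchanged.

```python
import yaml


class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass, so the global default dumper stays untouched."""


def represent_multiline_str(dumper, data):
    # Use the literal block style only when the string spans several lines;
    # style=None lets PyYAML pick its usual style for everything else.
    style = '|' if '\n' in data else None
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=style)


# add_representer is a classmethod, so this only affects BlockStyleDumper.
BlockStyleDumper.add_representer(str, represent_multiline_str)

data = {
    'foo': 'this is a\nblock literal\n',
    'bar': 'a short plain string',
}

output = yaml.dump(data, Dumper=BlockStyleDumper)
print(output)
```

The caveats about non-printable characters and trailing white space described in this answer still apply: PyYAML may fall back to double quotes even when `style='|'` is requested.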

Using `default_style`

If you are interested in having all your strings follow a default style, you can also use the default_style keyword argument, e.g.:

>>> data = { 'foo': 'line1\nline2\nline3' }
>>> print yaml.dump(data, default_style='|')
"foo": |-
  line1
  line2
  line3

or for folded literals:

>>> print yaml.dump(data, default_style='>')
"foo": >-
  line1

  line2

  line3

or for double-quoted literals:

>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\nline3"

Caveats:

Here is an example of something you may not expect:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
    'non-printable': literal_unicode('this has a \t tab in it'),
    'leading': literal_unicode('   with leading white spaces'),
    'trailing': literal_unicode('with trailing white spaces  '),
}
print yaml.dump(data)

results in:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal
leading: |2-
     with leading white spaces
non-printable: "this has a \t tab in it"
trailing: "with trailing white spaces  "

1) non-printable characters

See the YAML spec for escaped characters (Section 5.7):

Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the “\” character has no special meaning and non-printable characters are not available.

If you want to preserve non-printable characters (e.g. TAB), you need to use double-quoted scalars. If you are able to dump a scalar with literal style, and there is a non-printable character (e.g. TAB) in there, your YAML dumper is non-compliant.

E.g. pyyaml detects the non-printable character \t and uses the double-quoted style even though a default style is specified:

>>> data = { 'foo': 'line1\nline2\n\tline3' }
>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='>')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='|')
"foo": "line1\nline2\n\tline3"

2) leading and trailing white spaces

Another bit of useful information in the spec is:

All leading and trailing white space characters are excluded from the content

This means that if your string does have leading or trailing white space, these would not be preserved in scalar styles other than double-quoted. As a consequence, pyyaml tries to detect what is in your scalar and may force the double-quoted style.
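
If losing trailing white space is acceptable for your data, one possible workaround is to strip it before dumping, so that the block style can be used. The sketch below is written for Python 3; `LiteralStr` and `strip_trailing_spaces` are hypothetical helpers, and note that the stripping changes the data. A tab in the middle of a line would still force the double-quoted style.

```python
import yaml


class LiteralStr(str):
    """Marker type for strings that should be dumped in literal block style."""


def represent_literal(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')


yaml.add_representer(LiteralStr, represent_literal)


def strip_trailing_spaces(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return '\n'.join(line.rstrip(' \t') for line in text.split('\n'))


raw = 'first line  \nsecond line\n'

# Trailing spaces before a line break: PyYAML falls back to double quotes.
quoted = yaml.dump({'key': LiteralStr(raw)})

# After stripping, the requested literal block style is honoured.
block = yaml.dump({'key': LiteralStr(strip_trailing_spaces(raw))})

print(quoted)
print(block)
```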


枫以 2024-11-23 22:10:27

This can be done relatively easily. The only "hurdle" is how to
indicate which of the spaces in the string that is to be represented
as a folded scalar should become a fold. A literal scalar has explicit
newlines carrying that information, but this cannot be used for folded
scalars, as they can contain explicit newlines themselves (e.g. when
there is leading whitespace). The string also needs a newline at the end
in order not to be represented with a stripping chomping indicator (>-):

import sys
import ruamel.yaml

folded = ruamel.yaml.scalarstring.FoldedScalarString
literal = ruamel.yaml.scalarstring.LiteralScalarString

yaml = ruamel.yaml.YAML()

data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=folded('this is a folded block\n'),
)

data['bar'].fold_pos = [data['bar'].index(' folded')]

yaml.dump(data, sys.stdout)

which gives:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

The fold_pos attribute expects a reversible iterable of the positions
of the spaces at which to fold.

If you never have pipe characters ('|') in your strings you
could have done something like:

import re

s = 'this is a|folded block\n'
sf = folded(s.replace('|', ' '))  # need to have a space!
sf.fold_pos = [x.start() for x in re.finditer(r'\|', s)]  # | is special in re, needs escaping (raw string avoids an invalid-escape warning)


data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=sf,  # need to have a space
)

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which also gives exactly the output you expect
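
If you would rather not mark the fold positions by hand, they can be computed. The following is only a sketch in plain Python: `fold_positions` is a hypothetical helper (not part of ruamel.yaml) that greedily wraps at a given width and returns the indexes of the spaces to fold at. It assumes words are separated by single spaces.

```python
def fold_positions(text, width=40):
    """Return indexes of the spaces at which `text` could be folded.

    Greedy wrapping: remember the last space seen, and fold there as soon as
    the current line would grow beyond `width` characters.
    """
    positions = []
    line_start = 0      # index of the first character of the current line
    last_space = None   # index of the most recent space on the current line
    for index, char in enumerate(text):
        if char == '\n':
            # An explicit newline starts a new line without a fold.
            line_start = index + 1
            last_space = None
            continue
        if char == ' ':
            last_space = index
        if index - line_start + 1 > width and last_space is not None:
            positions.append(last_space)
            line_start = last_space + 1
            last_space = None
    return positions


s = 'this is a folded block\n'
print(fold_positions(s, width=15))  # [9]
```

The resulting list is a plain (and therefore reversible) list of indexes, so it should be usable as the value of `fold_pos` on a folded scalar string as described above, e.g. `sf.fold_pos = fold_positions(sf, width=15)`.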
