Loading a document as a raw string in YAML with PyYAML

Posted on 2024-11-25 18:44:22

I want to parse yaml documents like the following:

meta-info-1: val1
meta-info-2: val2

---

Plain text/markdown content!
jhaha

If I load this with PyYAML's load_all, I get the following:

>>> list(yaml.safe_load_all(open('index.yml')))
[{'meta-info-1': 'val1', 'meta-info-2': 'val2'}, 'Plain text/markdown content! jhaha']

What I want is for the YAML file to contain two documents, with the second one interpreted as a single string: more specifically, any large body of text with Markdown formatting. I don't want it to be parsed as YAML syntax.

In the above example, PyYAML returns the second document as a single string. But if the second document has a : character in place of the !, for instance, I get a syntax error. This is because PyYAML parses the content of that document as YAML.
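The failure is easy to reproduce in isolation. This is only a minimal sketch, assuming a PyYAML version that provides `yaml.safe_load`; the two sample strings are made up for illustration and stand in for the second document:

```python
import yaml

ok_body = "Plain text/markdown content!\njhaha\n"
bad_body = "Plain text/markdown content:\njhaha\n"

# With '!' the whole body is a single multi-line plain scalar; the line
# break inside it is folded into a space.
parsed_ok = yaml.safe_load(ok_body)

# With ':' the first line looks like a mapping key, and the following line
# is not a valid mapping entry, so PyYAML raises a yaml.YAMLError subclass.
try:
    yaml.safe_load(bad_body)
    failed = False
except yaml.YAMLError as exc:
    failed = True
    error_name = type(exc).__name__  # kept for inspection
```

Note that the error is a `yaml.YAMLError`, not Python's built-in `SyntaxError`.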

Is there a way I can tell PyYAML that the second document is just a raw string and should not be parsed?

Edit: There are a few excellent answers here. While using quotes or the literal syntax solves the problem, I'd like users to be able to write the plain text without any extra cruft: just the three -'s (or .'s), followed by a large body of plain text, which might itself include quotes. So I'd like to know whether I can tell PyYAML to parse only the first document and give me the second one raw.

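Edit 2: Adapting agf's idea, but without the try/except, since the second document could happen to be valid YAML syntax:

import yaml

# Split at the first separator only; the body may contain '---' itself.
config_content, body_content = open(filename).read().split('\n---', 1)
config = yaml.safe_load(config_content)  # parse only the metadata
body = body_content  # keep the second document as a raw string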

Thanks agf.
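The split in Edit 2 can be made a little stricter by only treating a line that consists of `---` as the separator, and by splitting once so that a later `---` in the Markdown body is left alone. The following is a rough sketch using only the standard library; `split_front_matter` and the sample text are hypothetical names introduced here, and it assumes the layout from the question (metadata first, no leading `---`):

```python
import re

# Hypothetical helper, not part of PyYAML: split text at the first line that
# consists only of '---' and return (header_text, body_text).
_SEPARATOR = re.compile(r"^---[ \t]*\r?\n", re.MULTILINE)


def split_front_matter(text):
    match = _SEPARATOR.search(text)
    if match is None:
        # No separator line: treat everything as the header.
        return text, ""
    return text[:match.start()], text[match.end():]


sample = (
    "meta-info-1: val1\n"
    "meta-info-2: val2\n"
    "\n"
    "---\n"
    "\n"
    "Title: a body with a colon\n"
    "---\n"
    "and a Markdown rule above\n"
)
header_text, body_text = split_front_matter(sample)
```

The header can then be handed to `yaml.safe_load(header_text)`, while `body_text` is never passed to the YAML parser at all.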

巴黎盛开的樱花 2024-12-02 18:44:22

If you won't have control over the format of the original document, you can do:

import yaml

with open(filename) as f:
    raw = f.read()
docs = []
for raw_doc in raw.split('\n---'):
    try:
        docs.append(yaml.safe_load(raw_doc))
    except yaml.YAMLError:
        # Not valid YAML: keep the document as a raw string.
        docs.append(raw_doc)

From the PyYAML docs,

Double-quoted is the most powerful style and the only style that can express any scalar value. Double-quoted scalars allow escaping. Using escaping sequences \x** and \u****, you may express any ASCII or Unicode character.

So it sounds like there is no way to get the parser to accept an arbitrary scalar unless it is double-quoted.
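To illustrate the double-quoted style, here is a small sketch, assuming PyYAML's `safe_dump` with its `default_style` argument; the sample text is invented. It shows text that would not parse as a plain scalar surviving a round trip once it is emitted double-quoted:

```python
import yaml

# Made-up body containing characters that are significant in YAML.
body = 'Key: value with "quotes"\n--- not a separator\n# not a comment\n'

# Ask the emitter to write the string as a double-quoted scalar; quotes and
# line breaks are written as escape sequences.
quoted = yaml.safe_dump(body, default_style='"')

# Loading the quoted form gives back the original text unchanged.
restored = yaml.safe_load(quoted)
```

This only helps when the file is generated by a program; it does not remove the need for quoting when users write the text by hand.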
