Validating a YAML document in Python

Posted on 2024-09-10 07:12:15

One of the benefits of XML is being able to validate a document against an XSD. YAML doesn't have this feature, so how can I validate that the YAML document I open is in the format expected by my application?

Comments (11)

顾冷 2024-09-17 07:12:15

Given that JSON and YAML are pretty similar beasts, you could make use of JSON-Schema to validate a sizable subset of YAML. Here's a code snippet (you'll need PyYAML and jsonschema installed):

from jsonschema import validate
import yaml

schema = """
type: object
properties:
  testing:
    type: array
    items:
      enum:
        - this
        - is
        - a
        - test
"""

good_instance = """
testing: ['this', 'is', 'a', 'test']
"""

validate(yaml.safe_load(good_instance), yaml.safe_load(schema))  # passes

# Now let's try a bad instance...

bad_instance = """
testing: ['this', 'is', 'a', 'bad', 'test']
"""

validate(yaml.safe_load(bad_instance), yaml.safe_load(schema))

# Fails with:
# ValidationError: 'bad' is not one of ['this', 'is', 'a', 'test']
#
# Failed validating 'enum' in schema['properties']['testing']['items']:
#     {'enum': ['this', 'is', 'a', 'test']}
#
# On instance['testing'][3]:
#     'bad'

One caveat: if your schema spans multiple files and you use "$ref" to reference the other files, then I think those other files need to be JSON. There are probably ways around that. In my own project, I'm experimenting with specifying the schema in JSON files while the instances are YAML.
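
As a rough sketch of that last setup (schema kept as JSON, instance written in YAML), something like the following should work. The schema text and the helper name find_errors are made up for illustration; Draft7Validator and iter_errors come from jsonschema and let you collect every violation instead of stopping at the first one.

```python
import json

import yaml
from jsonschema import Draft7Validator

# Schema kept as JSON text; it could equally be read from a .json file.
SCHEMA_JSON = """
{
  "type": "object",
  "properties": {
    "testing": {
      "type": "array",
      "items": {"enum": ["this", "is", "a", "test"]}
    }
  },
  "required": ["testing"]
}
"""


def find_errors(yaml_text):
    """Return 'path: message' strings for every violation (empty list if valid)."""
    instance = yaml.safe_load(yaml_text)
    validator = Draft7Validator(json.loads(SCHEMA_JSON))
    # Sort by location so the report order is stable.
    errors = sorted(
        validator.iter_errors(instance),
        key=lambda e: [str(p) for p in e.absolute_path],
    )
    return [
        "/".join(str(p) for p in e.absolute_path) + ": " + e.message
        for e in errors
    ]
```

Calling find_errors("testing: ['this', 'bad', 'worse']") should report one entry per offending list item, each prefixed with its position in the document.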

↙厌世 2024-09-17 07:12:15

I find Cerberus very reliable, well documented, and straightforward to use.

Here is a basic implementation example:

my_yaml.yaml:

name: 'my_name'
date: 2017-10-01
metrics:
    percentage:
        value: 87
        trend: stable

Defining the validation schema in schema.py:

{
    'name': {
        'required': True,
        'type': 'string'
    },
    'date': {
        'required': True,
        'type': 'date'
    },
    'metrics': {
        'required': True,
        'type': 'dict',
        'schema': {
            'percentage': {
                'required': True,
                'type': 'dict',
                'schema': {
                    'value': {
                        'required': True,
                        'type': 'number',
                        'min': 0,
                        'max': 100
                    },
                    'trend': {
                        'type': 'string',
                        'nullable': True,
                        # The inline flag (?i) must come first; Python 3.11+ rejects it after '^'.
                        'regex': '(?i)^(down|equal|up)$'
                    }
                }
            }
        }
    }
}

Using PyYAML to load the YAML document:

import ast
import yaml
from cerberus import Validator

def load_doc():
    with open('./my_yaml.yaml', 'r') as stream:
        # safe_load builds only plain Python objects; parse errors surface as yaml.YAMLError
        return yaml.safe_load(stream)

# Now, validating the YAML file is straightforward.
# schema.py holds a single dict literal, so ast.literal_eval can read it without executing code.
with open('./schema.py', 'r') as f:
    schema = ast.literal_eval(f.read())

v = Validator(schema)
doc = load_doc()
print(v.validate(doc))  # True if the document matches the schema
print(v.errors)         # validation errors keyed by field, empty on success

Keep in mind that Cerberus is format-agnostic: it validates plain Python data structures, so it works just as well with data loaded from formats other than YAML, such as JSON or XML.

灼疼热情 2024-09-17 07:12:15

You can load the YAML document as a dict and use the schema library to check it:

import datetime

import yaml
from schema import Schema, And, Optional, SchemaError

schema = Schema(
    {
        'created': And(datetime.datetime),
        'author': And(str),
        'email': And(str),
        'description': And(str),
        Optional('tags'): And(str, lambda s: len(s) >= 0),
        'setup': And(list),
        'steps': And(list, lambda steps: all('=>' in s for s in steps),
                     error='Steps should be array of string '
                           'and contain "=>" to separate '
                           'actions and expectations'),
        'teardown': And(list)
    }
)

filepath = 'document.yaml'  # placeholder path; point this at your own file

with open(filepath) as f:
    data = yaml.safe_load(f)
    try:
        schema.validate(data)
    except SchemaError as e:
        print(e)

机场等船 2024-09-17 07:12:15

Pydantic has not been mentioned.

From their example:

from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str = 'John Doe'  # annotated, since Pydantic v2 rejects un-annotated fields
    signup_ts: Optional[datetime] = None
    friends: List[int] = []


# Parse your YAML into a dictionary, then validate against your model.
external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
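
The "parse your YAML into a dictionary" step could look roughly like this. It is only a sketch: the trimmed-down model and the helper load_user are made up for illustration, and it assumes the top level of the YAML document is a mapping.

```python
from typing import List

import yaml
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str = 'John Doe'
    friends: List[int] = []


def load_user(yaml_text):
    """Return (user, []) on success or (None, [names of failing fields])."""
    data = yaml.safe_load(yaml_text)  # assumed to be a mapping
    try:
        return User(**data), []
    except ValidationError as exc:
        # Each error carries a 'loc' tuple pointing at the offending field.
        return None, [".".join(str(p) for p in err["loc"]) for err in exc.errors()]
```

For a document such as "id: not-a-number" this should return None together with ['id'], while a well-formed document gives back a populated User.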
我的黑色迷你裙 2024-09-17 07:12:15

Try Rx; it has a Python implementation and works on both JSON and YAML.

From the Rx site:


"When adding an API to your web service, you have to choose how to encode the data you send across the line. XML is one common choice for this, but it can grow arcane and cumbersome pretty quickly. Lots of webservice authors want to avoid thinking about XML, and instead choose formats that provide a few simple data types that correspond to common data structures in modern programming languages. In other words, JSON and YAML.

Unfortunately, while these formats make it easy to pass around complex data structures, they lack a system for validation. XML has XML Schemas and RELAX NG, but these are complicated and sometimes confusing standards. They're not very portable to the kind of data structure provided by JSON, and if you wanted to avoid XML as a data encoding, writing more XML to validate the first XML is probably even less appealing.

Rx is meant to provide a system for data validation that matches up with JSON-style data structures and is as easy to work with as JSON itself."

幽梦紫曦~ 2024-09-17 07:12:15

Yes - having support for validation is vital for lots of important use cases. See e.g. YAML and the importance of Schema Validation « Stuart Gunter

As already mentioned, there is Rx, available for various languages, and Kwalify for Ruby and Java.

See also the PyYAML discussion: YAMLSchemaDiscussion.

A related effort is JSON Schema, which even had some IETF standardization activity: draft-zyp-json-schema-03 - A JSON Media Type for Describing the Structure and Meaning of JSON Documents

疑心病 2024-09-17 07:12:15

I worked on a similar project where I needed to validate the elements of a YAML document.

At first I thought PyYAML tags were the best and simplest way, but I later decided to go with PyKwalify, which actually defines a schema for YAML.

PyYAML tags:

YAML supports tags, which let us enforce basic type checks by prefixing a value with its data type, e.g. for an integer: !!int "123"

More on PyYAML tags: http://pyyaml.org/wiki/PyYAMLDocumentation#Tags
This works, but if you are going to expose the files to end users, the tags might cause confusion.
So I did some research into defining a schema for YAML, with these goals:

  • Validate the YAML against its corresponding schema for basic data type checks.
  • Allow custom validations, such as IP addresses or random strings, to be added to the schema.
  • Keep the schema separate from the YAML data, so that the data stays simple and readable.
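
To give an idea of what a custom check like the IP address one amounts to, here is a small standard-library sketch. It is not PyKwalify's own extension mechanism: the function check_server and the field names name and ip are made up, and the dict stands for what a YAML loader would return for one entry.

```python
import ipaddress


def check_server(entry):
    """Return a list of problems found in one parsed mapping (empty if none)."""
    problems = []
    name = entry.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name: expected a non-empty string")
    try:
        # Accepts both IPv4 and IPv6; raises ValueError for anything else.
        ipaddress.ip_address(str(entry.get("ip")))
    except ValueError:
        problems.append("ip: %r is not a valid IP address" % (entry.get("ip"),))
    return problems
```

Keeping checks like this outside the data file is what lets the YAML itself stay free of tags and other type annotations.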

PyKwalify:

There is a package called PyKwalify that serves this purpose: https://pypi.python.org/pypi/pykwalify

This package best fits my requirements.
I tried it with a small example in my local setup, and it works. Here's the sample schema file.

#sample schema

type: map
mapping:
    Emp:
        type:    map
        mapping:
            name:
                type:      str
                required:  yes
            email:
                type:      str
            age:
                type:      int
            birth:
                type:     str

Valid YAML file for this schema

---
Emp:
    name:   "abc"
    email:  "[email protected]"
    age:    yy
    birth:  "xx/xx/xxxx"
                                                                

Thanks

深陷 2024-09-17 07:12:15

These look good. The YAML parser can handle the syntax errors, and one of these libraries can validate the data structures.
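
A rough sketch of that two-stage idea, using only PyYAML: stage one catches syntax errors from the parser, stage two checks the shape of the parsed data. The helper check_document and its required_keys argument are made up for illustration; in practice the second stage would be one of the schema libraries mentioned above.

```python
import yaml


def check_document(text, required_keys):
    """Stage 1: parse (syntax). Stage 2: check the structure of the result."""
    try:
        data = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return ["syntax error: %s" % exc]
    if not isinstance(data, dict):
        return ["structure error: top level must be a mapping"]
    return [
        "structure error: missing key %r" % key
        for key in required_keys
        if key not in data
    ]
```

Separating the stages keeps the two kinds of failure apart in the report: a malformed file never reaches the structural checks.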

别闹i 2024-09-17 07:12:15

You can use Python's yaml library to display the message, character, line, and file of an error in the file you load.

#!/usr/bin/env python

import yaml

with open("example.yaml", 'r') as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

The error message can be accessed via exc.problem

Access exc.problem_mark to get a <yaml.error.Mark> object.

This object allows you to access attributes

  • name
  • column
  • line

Hence you can create your own pointer to the issue:

pm = exc.problem_mark
# pm.line and pm.column are zero-based, so add 1 for a human-readable position
print("Your file {} has an issue on line {} at position {}".format(pm.name, pm.line + 1, pm.column + 1))
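
Not every yaml.YAMLError carries a problem_mark, so a slightly more defensive version might be useful. This is only a sketch: the helper describe_yaml_error is a made-up name, and it relies on nothing beyond the problem, problem_mark, name, line and column attributes described above.

```python
def describe_yaml_error(exc):
    """Build a one-line description of a YAML error, with position if known."""
    mark = getattr(exc, "problem_mark", None)
    problem = getattr(exc, "problem", None) or str(exc)
    if mark is None:
        # Some errors carry no position information at all.
        return problem
    # Mark.line and Mark.column are zero-based; report them one-based.
    return "%s: line %d, column %d: %s" % (
        mark.name, mark.line + 1, mark.column + 1, problem)


# Typical use:
#     try:
#         yaml.safe_load(stream)
#     except yaml.YAMLError as exc:
#         print(describe_yaml_error(exc))
```

Using getattr keeps the helper from raising AttributeError when the exception has no mark attached.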
空城仅有旧梦在 2024-09-17 07:12:15

I wrapped some existing JSON-related Python libraries so that they can be used with YAML as well.

The resulting Python library mainly wraps:

  • jsonschema - a validator for JSON files against JSON Schema files, wrapped to also support validating YAML files against JSON Schema files written in YAML.

  • jsonpath-ng - an implementation of JSONPath for Python, wrapped to support JSONPath selection directly on YAML files.

It is available on GitHub:

https://github.com/yaccob/ytools

It can be installed using pip:

pip install ytools

Validation example (from https://github.com/yaccob/ytools#validation):

import ytools
ytools.validate("test/sampleschema.yaml", ["test/sampledata.yaml"])

What you don't get out of the box yet is validation against external schemas that are themselves in YAML format.

ytools does not provide anything that didn't exist before; it just makes some existing solutions more flexible and more convenient to apply.

も星光 2024-09-17 07:12:15

I'm not aware of a python solution. But there is a ruby schema validator for YAML called kwalify. You should be able to access it using subprocess if you don't come across a python library.
