How to parse a YAML file from a Linux shell script?
I wish to provide a structured configuration file which is as easy as possible for a non-technical user to edit (unfortunately it has to be a file) and so I wanted to use YAML. I can't find any way of parsing this from a Unix shell script however.
I used to convert yaml to json using python and do my processing in jq.
Another option is to convert the YAML to JSON, then use jq to interact with the JSON representation either to extract information from it or edit it.
I wrote a simple bash script that contains this glue - see Y2J project on GitHub
If you need a single value, you could use a tool that converts your YAML document to JSON and feed the result to jq, for example yq.
Content of sample.yaml:
Example:
I know this is very specific, but I think my answer could be helpful for certain users.
If you have node and npm installed on your machine, you can use js-yaml.
First install:
then in your bash script
Also, if you are using jq, you can do something like this, because js-yaml converts a YAML file to a JSON string literal. You can then use the string with any JSON parser on your Unix system.
A quick way to do the thing now (previous ones haven't worked for me):
Example asd.yaml:
parsing root:
parsing key3:
If you have Python 2 and PyYAML, you can use this parser I wrote called parse_yaml.py. Among the neater things it does: it lets you choose a prefix (in case you have more than one file with similar variables), and it lets you pick a single value from a YAML file.
For example if you have these yaml files:
staging.yaml:
prod.yaml:
You can load both without conflict.
And even cherry pick the values you want.
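The answer doesn't show parse_yaml.py itself. What follows is only a hypothetical sketch of the prefix and cherry-pick ideas. It assumes the YAML has already been loaded into Python dicts and lists (for example with PyYAML's yaml.safe_load, which is not shown so the block stays stdlib-only). The names flatten, to_shell and pick are made up for this sketch. Values go through shlex.quote, so the printed lines are safe to eval.

```python
import shlex


def flatten(data, path=()):
    """Yield (path, value) for every scalar leaf of nested dicts/lists."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from flatten(value, path + (str(key),))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            yield from flatten(value, path + (str(index),))
    else:
        yield path, data


def to_shell(data, prefix="", pick=None):
    """Render leaves as NAME=value lines, shell-quoted for eval.

    prefix keeps variables from two files apart (STAGING_..., PROD_...);
    pick, if given, is a dotted path such as "db.host" selecting one leaf.
    """
    lines = []
    for path, value in flatten(data):
        if pick is not None and ".".join(path) != pick:
            continue
        lines.append(f"{prefix}{'_'.join(path)}={shlex.quote(str(value))}")
    return lines


# Stand-ins for what yaml.safe_load() would return for the two files.
staging = {"db": {"host": "staging.example.com", "port": 5432}}
prod = {"db": {"host": "prod.example.com", "port": 5432}}

print("\n".join(to_shell(staging, prefix="STAGING_")))
print("\n".join(to_shell(prod, prefix="PROD_", pick="db.host")))
```

Because each file gets its own prefix, loading both in one shell doesn't cause name clashes. That is the same property the answer describes for staging.yaml and prod.yaml.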
You could use an equivalent of yq that is written in golang:
returns:
Whenever you need a solution for "How to work with YAML/JSON/compatible data from a shell script" which works on just about every OS with Python (*nix, OSX, Windows), consider yamlpath, which provides several command-line tools for reading, writing, searching, and merging YAML, EYAML, JSON, and compatible files. Since just about every OS either comes with Python pre-installed or it is trivial to install, this makes yamlpath highly portable. Even more interesting: this project defines an intuitive path language with very powerful, command-line-friendly syntax that enables accessing one or more nodes.
To your specific question and after installing yamlpath using Python's native package manager or your OS's package manager (yamlpath is available via RPM to some OSes):
You didn't specify that the data was a simple Scalar value though, so let's up the ante. What if the result you want is an Array? Even more challenging, what if it's an Array-of-Hashes and you only want one property of each result? Suppose further that your data is actually spread out across multiple YAML files and you need all the results in a single query. That's a much more interesting question to demonstrate with. So, suppose you have these two YAML files:
File: data1.yaml
File: data2.yaml
How would you report only the sku of every item in inventory after applying the changes from data2.yaml to data1.yaml, all from a shell script? Try this:
You get exactly what you need from only a few lines of code:
As you can see, yamlpath turns very complex problems into trivial solutions. Note that the entire query was handled as a stream; no YAML files were changed by the query and there were no temp files.
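The yamlpath commands and the two data files aren't reproduced above. The following is only a stdlib Python sketch of the underlying idea: merge the second document into the first, then pull one property out of every hash in an array. Several things here are assumptions, not yamlpath's documented behavior: the merge policy (mappings merged recursively, lists concatenated, other values overwritten), the inventory layout, and the sample data.

```python
def merge(base, changes):
    """Return base with changes applied, without modifying either input.

    Policy assumed for this sketch: mappings merge key by key (recursively),
    lists are concatenated, and any other value from changes wins.
    """
    if isinstance(base, dict) and isinstance(changes, dict):
        merged = dict(base)
        for key, value in changes.items():
            merged[key] = merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(changes, list):
        return base + changes
    return changes


def skus(document):
    """Collect the sku of every item hash in document["inventory"]."""
    return [item["sku"] for item in document.get("inventory", []) if "sku" in item]


# Stand-ins for the parsed contents of data1.yaml and data2.yaml.
data1 = {"inventory": [{"sku": "A1", "qty": 3}, {"sku": "B2", "qty": 0}],
         "store": {"name": "Main"}}
data2 = {"inventory": [{"sku": "C3", "qty": 7}], "store": {"open": True}}

merged = merge(data1, data2)
print(skus(merged))  # ['A1', 'B2', 'C3']
```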
I realize this is "yet another tool to solve the same question" but after reading the other answers here, yamlpath appears more portable and robust than most alternatives. It also fully understands YAML/JSON/compatible files and it does not need to convert YAML to JSON to perform requested operations. As such, comments within the original YAML file are preserved whenever you need to change data in the source YAML file. Like some alternatives, yamlpath is also portable across OSes. More importantly, yamlpath defines a query language that is extremely powerful, enabling very specialized/filtered data queries. It can even operate against results from disparate parts of the file in a single query.
If you want to get or set many values in the data at once -- including complex data like hashes/arrays/maps/lists -- yamlpath can do that. Want a value but don't know precisely where it is in the document? yamlpath can find it and give you the exact path(s). Need to merge multiple data files together, including from STDIN? yamlpath does that, too. Further, yamlpath fully comprehends YAML anchors and their aliases, always giving or changing exactly the data you expect whether it is a concrete or referenced value.
Disclaimer: I wrote and maintain yamlpath, which is based on ruamel.yaml, which is in turn based on PyYAML. As such, yamlpath is fully standards-compliant.
Complex parsing is easiest with a library such as Python's PyYAML or YAML::Perl.
If you want to parse all the YAML values into bash values, try this script. This will handle comments as well. See example usage below:
test.yml
Script where YAML values are needed:
Access variables with bash:
If you know what tags you are interested in and the yaml structure you expect then it is not that hard to write a simple YAML parser in Bash.
In the following example the parser reads a structured YAML file into environment variables, an array and an associative array.
Note: The complexity of this parser is tied to the structure of the YAML file. You will need a separate subroutine for each structured component of the YAML file. Highly structured YAML files might require a more sophisticated approach, e.g. a generic recursive descent parser.
The xmas.yaml file:
The parser uses mapfile to read the file into memory as an array, then cycles through each tag and creates environment variables. pear-tree:, turtle-doves: and french-hens: end up as simple environment variables, calling-birds: becomes an array, and the xmas-fifth-day: structure is represented as an associative array. However, you could encode these as environment variables if you are not using Bash 4.0 or later.
This produces the following output
In RHEL, this command produces output if STDIN is valid YAML, or dies if it is not.
You can also consider using Grunt (The JavaScript Task Runner). It can be easily integrated with the shell. It supports reading YAML (grunt.file.readYAML) and JSON (grunt.file.readJSON) files.
This can be achieved by creating a task in Gruntfile.js (or Gruntfile.coffee), e.g.:
then from the shell simply run grunt foo (check grunt --help for available tasks).
Furthermore, you can implement exec:foo tasks (grunt-exec) with input variables passed from your task (foo: { cmd: 'echo bar <%= foo %>' }) in order to print the output in whatever format you want, then pipe it into another command.
There is also a tool similar to Grunt called gulp, with the additional plugin gulp-yaml.
Install via:
npm install --save-dev gulp-yaml
Sample usage:
For more options for dealing with the YAML format, check the YAML site for available projects, libraries and other resources that can help you parse that format.
Other tools:
Jshon: parse, read and create JSON
Inspired by Torsten's answer:
You can gently slap yourself if you use this (as I did) ;).
I know my answer is specific, but if one already has PHP and Symfony installed, it can be very handy to use Symfony's YAML parser.
For instance:
Here I simply used var_dump to output the parsed array, but of course you can do much more... :)
Here is a bash-only parser that leverages sed and awk to parse simple yaml files:
It understands files such as:
Which, when parsed using:
will output:
It also understands YAML files generated by Ruby, which may include Ruby symbols, like:
and will output the same as in the previous example.
Typical use within a script is:
parse_yaml accepts a prefix argument so that imported settings all have a common prefix (which will reduce the risk of namespace collisions).
yields:
Note that previous settings in a file can be referred to by later settings:
Another nice usage is to first parse a defaults file and then the user settings. This works because the later settings override the earlier ones:
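To make the sed/awk pipeline's logic easier to follow, here is a rough Python sketch of the same idea. It remembers the key seen at each indent level (indent length divided by two). For every line that carries a value, it joins that key chain with underscores behind the prefix. This is not the answer's code. It assumes 2-space indentation, keys made of word characters (optionally with a leading Ruby-style colon), and values that need no shell escaping. The name parse_yaml is simply reused for illustration.

```python
import re

# One YAML line: leading spaces, an optional Ruby-style ":" before the key,
# the key, a colon, then a value that may be wrapped in double quotes.
LINE_RE = re.compile(r'^( *):?(\w+) *: *"?(.*?)"? *$')


def parse_yaml(text, prefix=""):
    """Flatten simple, 2-space-indented YAML into NAME="value" lines.

    Like the awk stage: remember the key seen at each indent level and
    join that chain with "_" whenever a line carries a value.
    """
    chain = {}  # indent level -> most recent key at that level
    out = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # comments, list items and blank lines are skipped
        indent = len(match.group(1)) // 2
        key, value = match.group(2), match.group(3)
        chain[indent] = key
        for level in [lvl for lvl in chain if lvl > indent]:
            del chain[level]  # drop keys left over from deeper blocks
        if value:
            parents = "".join(chain.get(i, "") + "_" for i in range(indent))
            out.append(f'{prefix}{parents}{key}="{value}"')
    return out


sample = """\
global:
  debug: yes
  debugging:
    header: "debugging started"
output:
  file: "yes"
"""
print("\n".join(parse_yaml(sample, prefix="CONF_")))
```

Each printed line has the same NAME="value" shape the bash function produces, so the output would be consumed with eval in the same way.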
I've written shyaml in Python for YAML query needs from the shell command line.
Overview:
Example's YAML file (with complex features):
Basic query:
More complex looping query on complex values:
A few key points:
- \0-padded output is available for solid multiline entry manipulation.
- Dotted notation selects sub-values (subvalue.maintainer is a valid key).
- Sequences can be indexed (subvalue.things.-1 is the last element of the subvalue.things sequence).
More samples and documentation are available on the shyaml GitHub page or the shyaml PyPI page.
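The shyaml code isn't reproduced here. As a rough illustration of two of the key points, here is a stdlib-only Python sketch. One function resolves a dotted key, including negative list indexes. The other builds NUL-terminated output. The names get_value and nul_terminated and the sample document are hypothetical, and the data is assumed to be already loaded.

```python
def get_value(data, dotted_key):
    """Follow a dotted key like "subvalue.things.-1" through nested data.

    Segments that hit a list are read as (possibly negative) integer indexes;
    all other segments are mapping keys.
    """
    node = data
    for part in dotted_key.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def nul_terminated(values):
    """Join values so each one ends with NUL, safe for multi-line strings."""
    return "".join(f"{value}\0" for value in values)


# Stand-in for an already-loaded YAML document.
doc = {
    "name": "MyName",
    "subvalue": {
        "maintainer": "RPG",
        "things": ["first", "second", "last"],
        "description": "multi\nline",
    },
}

print(get_value(doc, "subvalue.maintainer"))  # RPG
print(get_value(doc, "subvalue.things.-1"))   # last
```

NUL-terminated values can be read in bash with while IFS= read -r -d '' value; do ...; done, which keeps multi-line values intact.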
(https://github.com/mikefarah/yq#readme)
As an example (stolen straight from the documentation), given a sample.yaml file of:
then
will output
Given that Python3 and PyYAML are quite easy dependencies to meet nowadays, the following may help:
My use case may or may not be quite the same as what this original post was asking, but it's definitely similar.
I need to pull in some YAML as bash variables. The YAML will never be more than one level deep.
YAML looks like so:
Output like-a dis:
I achieved the output with this line:
- s/:[^:\/\/]/="/g finds : and replaces it with =", while ignoring :// (for URLs)
- s/$/"/g appends " to the end of each line
- s/ *=/=/g removes all spaces before =
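For comparison, the same key: value to key="value" rewrite can be sketched in Python. Instead of sed's global regexes, this version splits each line on its first colon only. That way :// in URLs and colons inside values such as 12:30 stay untouched. It also escapes \, ", $ and ` so the result is safe inside double quotes. Like the original, it assumes a single-level YAML file. The function names are illustrative.

```python
def shell_quote_dq(value):
    """Wrap value in double quotes, escaping what stays special inside them."""
    for ch in ("\\", '"', "$", "`"):  # backslash first, so added ones aren't doubled
        value = value.replace(ch, "\\" + ch)
    return f'"{value}"'


def yaml_to_shell(text):
    """Turn one-level 'key: value' YAML lines into key="value" assignments."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments
        key, sep, value = stripped.partition(":")  # split on the FIRST colon only
        if sep:
            out.append(f"{key.strip()}={shell_quote_dq(value.strip())}")
    return out


config = """\
name: my-app
url: http://example.com:8080/path
# a comment
time: 12:30
"""
print("\n".join(yaml_to_shell(config)))
```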
Here is an extended version of Stefan Farestam's answer:
This version supports the - notation and the short notation for dictionaries and lists. The following input:
produces this output:
As you can see, the - items automatically get numbered in order to obtain different variable names for each item. In bash there are no multidimensional arrays, so this is one way to work around that. Multiple levels are supported.
To work around the problem with trailing white spaces mentioned by @briceburg, one should enclose the values in single or double quotes. However, there are still some limitations: expansion of the dictionaries and lists can produce wrong results when values contain commas. Also, more complex structures like values spanning multiple lines (like ssh-keys) are not (yet) supported.
A few words about the code: the first sed command expands the short form of dictionaries { key: value, ...} to the regular form and converts them to a simpler YAML style. The second sed call does the same for the short notation of lists and converts [ entry, ... ] to an itemized list with the - notation. The third sed call is the original one that handled normal dictionaries, now with the addition of handling lists with - and indentation. The awk part introduces an index for each indentation level and increases it when the variable name is empty (i.e. when processing a list). The current values of the counters are used instead of the empty vname. When going up one level, the counters are zeroed.
Edit: I have created a GitHub repository for this.
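As a rough illustration of the counter idea described above, here is a Python sketch. It is not the answer's actual sed/awk code. Each indent level keeps a counter that is incremented whenever a line has no key (a "- item" entry). The counter value stands in for the missing name, and counters of deeper levels are discarded when the parser moves back up. The sketch makes these assumptions:
- 2-space indentation
- list items indented under their parent key
- scalar items only
- numbering starts at 1, because the counter is bumped before use

```python
def parse_with_lists(text, prefix=""):
    """Flatten indented YAML, numbering "- item" entries per indent level."""
    names = {}     # indent level -> key (or item number) at that level
    counters = {}  # indent level -> last item number handed out there
    out = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        level = (len(raw) - len(raw.lstrip(" "))) // 2
        # Moving back up: forget names and reset counters of deeper levels.
        for table in (names, counters):
            for deeper in [lvl for lvl in table if lvl > level]:
                del table[deeper]
        if stripped.startswith("- "):
            # Empty key: bump this level's counter and use it as the name.
            counters[level] = counters.get(level, 0) + 1
            names[level] = str(counters[level])
            value = stripped[2:].strip()
        else:
            key, _, value = stripped.partition(":")
            names[level] = key.strip()
            value = value.strip()
        if value:
            chain = "_".join(names[i] for i in sorted(names))
            out.append(f'{prefix}{chain}="{value}"')
    return out


doc = """\
fruits:
  - apple
  - banana
veggies:
  - carrot
owner:
  name: Bob
"""
print("\n".join(parse_with_lists(doc)))
```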
It's possible to pass a small script to some interpreters, like Python. An easy way to do so using Ruby and its YAML library is the following:
, where data is a hash (or array) with the values from the YAML.
As a bonus, it'll parse Jekyll's front matter just fine.
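The Ruby snippet itself isn't shown, so here is a hedged illustration of what "front matter" means in this context. A Jekyll page starts with a YAML block enclosed by --- lines, followed by the page body. This Python sketch only splits that block off (the function name is hypothetical). The returned YAML text would then go to a YAML loader such as PyYAML's yaml.safe_load.

```python
def split_front_matter(text):
    """Split a Jekyll-style page into (front_matter, body).

    Front matter is everything between an opening '---' line and the next
    '---' line; a page without that block comes back as ("", text).
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].rstrip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # no closing '---': treat the whole page as body


page = "---\ntitle: Hello\nlayout: post\n---\nBody text\n"
front, body = split_front_matter(page)
print(front, end="")  # the YAML part, ready for a YAML loader
```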
Moving my answer from How to convert a json response into yaml in bash, since this seems to be the authoritative post on dealing with YAML text parsing from command line.
I would like to add details about the yq YAML implementation. There are two implementations of this YAML parser lying around, both named yq. Because of that, it is hard to tell which one is in use without looking at the implementations' DSL. The two available implementations are:
- kislyuk/yq, a wrapper around jq, written in Python using the PyYAML library for YAML parsing
- mikefarah/yq
Both are available for installation via standard package managers on almost all major distributions.
Both versions have some pros and cons over the other, but here are a few valid points to highlight (adopted from their repo instructions):
kislyuk/yq
- Since its DSL is adopted from jq, parsing and manipulation become quite straightforward for users familiar with the latter.
- Because jq doesn't preserve comments, the comments are lost during the round-trip conversion.
- It also provides xq, which transcodes XML to JSON using xmltodict and pipes it to jq. You can then apply the same DSL to perform CRUD operations on the objects and round-trip the output back to XML.
- Supports in-place edit mode with the -i flag (similar to sed -i).
mikefarah/yq
- Supports in-place edit mode with the -i flag (similar to sed -i).
- Supports coloring the output YAML with the -C flag (not applicable for JSON output) and indentation of the sub-elements (default at 2 spaces).
My take on the following YAML (referenced in another answer as well) with both versions:
Various actions to be performed with both implementations (some frequently used operations):
- the value of root_key2
- coffee
- orange_juice
- deleting all items under the property food
Using kislyuk/yq
Which is pretty straightforward. All you need is to transcode the jq JSON output back into YAML with the -y flag.
Using mikefarah/yq
As of today, Dec 21st 2020, yq v4 is in beta. It supports much more powerful path expressions and a DSL similar to jq's. Read the transition notes - Upgrading from V3
I just wrote a parser that I called Yay! (Yaml ain't Yamlesque!) which parses Yamlesque, a small subset of YAML. So, if you're looking for a 100% compliant YAML parser for Bash then this isn't it. However, to quote the OP, if you want a structured configuration file which is as easy as possible for a non-technical user to edit that is YAML-like, this may be of interest.
It's inspired by the earlier answer but writes associative arrays (yes, it requires Bash 4.x) instead of basic variables. It does so in a way that allows the data to be parsed without prior knowledge of the keys so that data-driven code can be written.
As well as the key/value array elements, each array has a keys array containing a list of key names, a children array containing names of child arrays, and a parent key that refers to its parent.
This is an example of Yamlesque:
Here is an example showing how to use it:
which outputs:
And here is the parser:
There is some documentation in the linked source file and below is a short explanation of what the code does.
The yay_parse function first locates the input file or exits with an exit status of 1. Next, it determines the dataset prefix, either explicitly specified or derived from the file name.
It writes valid bash commands to its standard output that, if executed, define arrays representing the contents of the input data file. The first of these defines the top-level array:
Note that array declarations are associative (-A), which is a feature of Bash version 4. Declarations are also global (-g), so they can be executed in a function but be available to the global scope, like the yay helper:
The input data is initially processed with sed. It drops lines that don't match the Yamlesque format specification before delimiting the valid Yamlesque fields with an ASCII File Separator character and removing any double-quotes surrounding the value field.
The two expressions are similar; they differ only because the first one picks out quoted values whereas the second one picks out unquoted ones.
The File Separator (decimal 28/hex 1C/octal 034) is used because, as a non-printable character, it is unlikely to be in the input data.
The result is piped into awk, which processes its input one line at a time. It uses the FS character to assign each field to a variable:
All lines have an indent (possibly zero) and a key, but they don't all have a value. It computes an indent level for the line by dividing the length of the first field, which contains the leading whitespace, by two. The top-level items without any indent are at indent level zero.
Next, it works out what prefix to use for the current item. This is what gets added to a key name to make an array name. There's a root_prefix for the top-level array, which is defined as the data set name and an underscore:
The parent_key is the key at the indent level above the current line's indent level and represents the collection that the current line is part of. The collection's key/value pairs will be stored in an array with its name defined as the concatenation of the prefix and parent_key.
For the top level (indent level zero), the data set prefix is used as the parent key, so it has no prefix (it's set to ""). All other arrays are prefixed with the root prefix.
Next, the current key is inserted into an (awk-internal) array containing the keys. This array persists throughout the whole awk session and therefore contains keys inserted by prior lines. The key is inserted into the array using its indent as the array index.
Because this array contains keys from previous lines, any keys with an indent level greater than the current line's indent level are removed:
This leaves the keys array containing the key-chain from the root at indent level 0 to the current line. It removes stale keys that remain when the prior line was indented deeper than the current line.
The final section outputs the bash commands: an input line without a value starts a new indent level (a collection in YAML parlance) and an input line with a value adds a key to the current collection.
The collection's name is the concatenation of the current line's prefix and parent_key.
When a key has a value, a key with that value is assigned to the current collection like this:
The first statement outputs the command to assign the value to an associative array element named after the key, and the second one outputs the command to add the key to the collection's space-delimited keys list:
When a key doesn't have a value, a new collection is started like this:
The first statement outputs the command to add the new collection to the current collection's space-delimited children list, and the second one outputs the command to declare a new associative array for the new collection:
All of the output from yay_parse can be parsed as bash commands by the bash eval or source built-in commands.
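To make the awk walkthrough concrete, here is a Python sketch of the same key-chain bookkeeping. It emits bash commands of the kind described: a keys list, a children list and a parent entry per associative array. The exact command text, the function name yay_sketch and the sample input are assumptions for illustration, not output copied from Yay. Like the described scheme, child arrays are named after their immediate key only, so the same key under two different parents would collide. Values are assumed to contain no double quotes or $.

```python
def yay_sketch(lines, dataset):
    """Emit bash commands building one associative array per collection."""
    root_prefix = dataset + "_"
    keys = {}  # indent level -> key: the chain from the root to this line
    out = [f"declare -g -A {dataset};"]
    for line in lines:
        if line.lstrip().startswith("#"):
            continue
        key_part, sep, value = line.partition(":")
        if not sep or not key_part.strip():
            continue  # not a "key:" or "key: value" line
        indent = (len(key_part) - len(key_part.lstrip(" "))) // 2
        key = key_part.strip()
        value = value.strip().strip('"')
        keys[indent] = key
        for deeper in [lvl for lvl in keys if lvl > indent]:
            del keys[deeper]  # stale keys from a previously deeper line
        # Top level goes into the dataset array; everything else into an
        # array named root_prefix + parent key (assumes 2-space indentation).
        if indent == 0:
            collection = dataset
        else:
            collection = root_prefix + keys.get(indent - 1, "")
        if value:
            out.append(f'{collection}[{key}]="{value}";')
            out.append(f'{collection}[keys]+=" {key}";')
        else:
            child = root_prefix + key
            out.append(f'{collection}[children]+=" {child}";')
            out.append(f'declare -g -A {child}=([parent]="{collection}");')
    return out


sample = ["name: demo", "db:", "  host: localhost", "  port: 5432"]
print("\n".join(yay_sketch(sample, "cfg")))
```

Feeding these lines to eval (Bash 4.2 or later for declare -g) would leave "${cfg_db[host]}" expanding to localhost and "${cfg_db[parent]}" to cfg.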
Hard to say, because it depends on what you want the parser to extract from your YAML document. For simple cases, you might be able to use grep, cut, awk, etc. For more complex parsing you would need to use a full-blown parsing library such as Python's PyYAML or YAML::Perl.