shell脚本中的关联数组
我们需要一个模拟关联数组或类似映射的数据结构的脚本来进行 shell 脚本编写。 有人可以让我们知道它是如何完成的吗?
We require a script that simulates associative arrays or map-like data structure for shell scripting. Can anyone let's know how it is done?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(17)
几年前,我为 bash 编写了脚本库,它支持关联数组以及其他功能(日志记录、配置文件、对命令行参数的扩展支持、生成帮助、单元测试等)。 该库包含关联数组的包装器,并自动切换到适当的模型(bash4 的内部模型和以前版本的模拟模型)。 它被称为 shell-framework,托管在 origo.ethz.ch,但今天该资源已关闭。 如果有人还需要的话我可以分享给你。
several years ago I wrote script library for bash which supported associative arrays among other features (logging, configuration files, extended support for command line argument, generate help, unit testing, etc). The library contains a wrapper for associative arrays and automatically switches to appropriate model (internal for bash4 and emulate for previous versions). It was called shell-framework and hosted at origo.ethz.ch but today the resource is closed. If someone still needs it I can share it with you.
回复较晚,但请考虑以这种方式解决问题,使用 bash 内置 read,如后面的 ufw 防火墙脚本的代码片段所示。 此方法的优点是可以根据需要使用尽可能多的分隔字段集(不仅仅是 2 个)。 我们使用了 | 分隔符,因为端口范围说明符可能需要冒号,即 6001:6010。
Late reply, but consider addressing the problem in this way, using the bash builtin read as illustrated within the code snippet from a ufw firewall script that follows. This approach has the advantage of using as many delimited field sets (not just 2) as are desired. We have used the | delimiter because port range specifiers may require a colon, ie 6001:6010.
如果可移植性不是您主要关心的问题,另一种选择是使用 shell 中内置的关联数组。 这应该可以在 bash 4.0(现在在大多数主要发行版上可用,但不能在 OS X 上使用,除非您自己安装)、ksh 和 zsh 中工作:
根据 shell,您可能需要执行 typeset -A newmap< /code> 而不是
declare -A newmap
,或者在某些情况下可能根本没有必要。Another option, if portability is not your main concern, is to use associative arrays that are built in to the shell. This should work in bash 4.0 (available now on most major distros, though not on OS X unless you install it yourself), ksh, and zsh:
Depending on the shell, you may need to do a
typeset -A newmap
instead ofdeclare -A newmap
, or in some it may not be necessary at all.另一种非 bash 4 方式。
您也可以在其中添加一个 if 语句进行搜索。 如果[[ $var =~ /blah/ ]]。
管他呢。
Another non-bash 4 way.
You could throw an if statement for searching in there as well. if [[ $var =~ /blah/ ]].
or whatever.
我认为您需要退一步思考一下映射或关联数组到底是什么。 它只是一种存储给定键的值并快速有效地取回该值的方法。 您可能还希望能够迭代键以检索每个键值对,或删除键及其关联值。
现在,考虑一下您在 shell 脚本编写中一直使用的数据结构,甚至只是在 shell 中而不编写脚本,它也具有这些属性。 难住了吗? 这是文件系统。
实际上,在 shell 编程中,您所需要的关联数组只是一个临时目录。
mktemp -d
是关联数组构造函数:如果您不喜欢使用
echo
和cat
,您可以随时编写一些小包装器; 这些是根据 Irfan 的模型建模的,尽管它们只是输出值而不是设置任意变量,例如$value
:edit:这种方法实际上比线性方法快得多使用提问者建议的 sed 进行搜索,并且更健壮(它允许键和值包含 -、=、空格、qnd“:SP:”)。 它使用文件系统这一事实并不会使它变慢;相反,它会变得更慢。 实际上,除非您调用
sync
,否则永远不能保证这些文件会被写入磁盘; 对于像这样寿命较短的临时文件,其中许多文件很可能永远不会写入磁盘。我使用以下驱动程序对 Irfan 的代码、Jerry 对 Irfan 的代码的修改以及我的代码进行了一些基准测试:
结果:
I think that you need to step back and think about what a map, or associative array, really is. All it is is a way to store a value for a given key, and get that value back quickly and efficiently. You may also want to be able to iterate over the keys to retrieve every key value pair, or delete keys and their associated values.
Now, think about a data structure you use all the time in shell scripting, and even just in the shell without writing a script, that has these properties. Stumped? It's the filesystem.
Really, all you need to have an associative array in shell programming is a temp directory.
mktemp -d
is your associative array constructor:If you don't feel like using
echo
andcat
, you can always write some little wrappers; these ones are modelled off of Irfan's, though they just output the value rather than setting arbitrary variables like$value
:edit: This approach is actually quite a bit faster than the linear search using sed suggested by the questioner, as well as more robust (it allows keys and values to contain -, =, space, qnd ":SP:"). The fact that it uses the filesystem does not make it slow; these files are actually never guaranteed to be written to the disk unless you call
sync
; for temporary files like this with a short lifetime, it's not unlikely that many of them will never be written to disk.I did a few benchmarks of Irfan's code, Jerry's modification of Irfan's code, and my code, using the following driver program:
The results:
要添加到 Irfan 的答案,这里是
get()
的更短、更快的版本,因为它不需要迭代地图内容:To add to Irfan's answer, here is a shorter and faster version of
get()
since it requires no iteration over the map contents:另一种非 bash-4(即 bash 3,Mac 兼容)方式:
打印:
带有
case
的函数就像一个关联数组。 不幸的是,它不能使用 return ,因此它必须回显其输出,但这不是问题,除非您是一个避免分叉子 shell 的纯粹主义者。Yet another non-bash-4 (i.e., bash 3, Mac-compatible) way:
Prints:
The function with the
case
acts like an associative array. Unfortunately it cannot usereturn
, so it has toecho
its output, but this is not a problem, unless you are a purist that shuns forking subshells.例子:
Example:
Bash4 本身就支持这一点。 不要使用
grep
或eval
,它们是最丑陋的 hack。有关示例代码的详细答案,请参阅:
https://stackoverflow.com/questions/3467959
Bash4 supports this natively. Do not use
grep
oreval
, they are the ugliest of hacks.For a verbose, detailed answer with example code see:
https://stackoverflow.com/questions/3467959
对于 Bash 3,有一种特殊情况有一个很好且简单的解决方案:
如果您不想处理大量变量,或者键只是无效的变量标识符,那么您的数组是可以保证的要少于 256 个项目,您可以滥用函数返回值。 该解决方案不需要任何子 shell,因为该值可以随时作为变量使用,也不需要任何迭代,因此性能极高。 而且它的可读性非常好,几乎就像 Bash 4 版本一样。
这是最基本的版本:
请记住,在
case
中使用单引号,否则可能会发生通配。 从一开始就对静态/冻结哈希非常有用,但是可以从 hash_keys=() 数组编写索引生成器。请注意,它默认为第一个元素,因此您可能需要保留第零个元素:
警告:长度现在不正确。
或者,如果您想保持从零开始的索引,您可以保留另一个索引值并防止不存在的键,但它的可读性较差:
或者,为了保持长度正确,请将索引偏移一:
For Bash 3, there is a particular case that has a nice and simple solution:
If you don't want to handle a lot of variables, or keys are simply invalid variable identifiers, and your array is guaranteed to have less than 256 items, you can abuse function return values. This solution does not require any subshell as the value is readily available as a variable, nor any iteration so that performance screams. Also it's very readable, almost like the Bash 4 version.
Here's the most basic version:
Remember, use single quotes in
case
, else it's subject to globbing. Really useful for static/frozen hashes from the start, but one could write an index generator from ahash_keys=()
array.Watch out, it defaults to the first one, so you may want to set aside zeroth element:
Caveat: the length is now incorrect.
Alternatively, if you want to keep zero-based indexing, you can reserve another index value and guard against a non-existent key, but it's less readable:
Or, to keep the length correct, offset index by one:
现在回答这个问题。
以下脚本模拟 shell 脚本中的关联数组。 它简单且非常容易理解。
Map 只是一个永无止境的字符串,其中 keyValuePair 保存为
--name=Irfan --designation=SSE --company=My:SP:Own:SP:Company
空格被替换为 ':SP:' 值
编辑:刚刚添加了另一种方法来获取所有键。
Now answering this question.
Following scripts simulates associative arrays in shell scripts. Its simple and very easy to understand.
Map is nothing but a never ending string that has keyValuePair saved as
--name=Irfan --designation=SSE --company=My:SP:Own:SP:Company
spaces are replaced with ':SP:' for values
edit: Just added another method to fetch all keys.
您可以使用动态变量名称,并让变量名称像哈希映射的键一样工作。
例如,如果您有一个包含两列的输入文件,名称、信用,如下例所示,并且您想要对每个用户的收入求和:
下面的命令将使用动态变量作为键,以以下形式对所有内容求和: map_${person}:
读取结果:
输出将是:
为了详细说明这些技术,我正在 GitHub 上开发一个与 HashMap 对象 一样工作的函数,shell_map。
为了创建“HashMap 实例”,shell_map 函数能够以不同的名称创建自身的副本。 每个新函数副本都将具有不同的 $FUNCNAME 变量。 然后使用 $FUNCNAME 为每个 Map 实例创建一个命名空间。
映射键是全局变量,格式为 $FUNCNAME_DATA_$KEY,其中 $KEY 是添加到映射的键。 这些变量是动态的变量。
下面我将提供它的简化版本,以便您可以用作示例。
用法:
You can use dynamic variable names and let the variables names work like the keys of a hashmap.
For example, if you have an input file with two columns, name, credit, as the example bellow, and you want to sum the income of each user:
The command bellow will sum everything, using dynamic variables as keys, in the form of map_${person}:
To read the results:
The output will be:
Elaborating on these techniques, I'm developing on GitHub a function that works just like a HashMap Object, shell_map.
In order to create "HashMap instances" the shell_map function is able create copies of itself under different names. Each new function copy will have a different $FUNCNAME variable. $FUNCNAME then is used to create a namespace for each Map instance.
The map keys are global variables, in the form $FUNCNAME_DATA_$KEY, where $KEY is the key added to the Map. These variables are dynamic variables.
Bellow I'll put a simplified version of it so you can use as example.
Usage:
正如已经提到的,我发现最好的执行方法是将键/值写入文件,然后使用 grep/awk 检索它们。 这听起来像是各种不必要的 IO,但磁盘缓存启动并使其极其高效——比尝试使用上述方法之一将它们存储在内存中要快得多(如基准测试所示)。
这是我喜欢的一种快速、干净的方法:
如果您想强制每个键使用单值,您还可以在 hput() 中执行一些 grep/sed 操作。
I've found it true, as already mentioned, that the best performing method is to write out key/vals to a file, and then use grep/awk to retrieve them. It sounds like all sorts of unnecessary IO, but disk cache kicks in and makes it extremely efficient -- much faster than trying to store them in memory using one of the above methods (as the benchmarks show).
Here's a quick, clean method I like:
If you wanted to enforce single-value per key, you could also do a little grep/sed action in hput().
如果 jq 可用,则添加另一个选项:
Adding another option, if jq is available:
遗憾的是我之前没有看到这个问题 - 我已经编写了库 shell-framework ,其中其中包含地图(关联数组)。 它的最新版本可以在此处找到。
例子:
What a pity I did not see the question before - I've wrote library shell-framework which contains among others the maps(Associative arrays). The last version of it can be found here.
Example:
Shell没有内置的映射之类的数据结构,我使用原始字符串来描述项目,如下所示:
当提取项目及其属性时:
这似乎比其他人的答案不聪明,但对于新的shell新手来说很容易理解。
Shell have no built-in map like data structure, I use raw string to describe items like that:
when extract items and its attributes:
This seems like not clever than other people's answer, but easy to understand for new people to shell.
我修改了 Vadim 的解决方案,如下所示:
更改是对 map_get 进行更改,以防止它在请求不存在的密钥时返回错误,尽管副作用是它也会默默地忽略丢失的地图,但它适合我的用例更好,因为我只想检查密钥以便跳过循环中的项目。
I modified Vadim's solution with the following:
The change is to map_get in order to prevent it from returning errors if you request a key that doesn't exist, though the side-effect is that it will also silently ignore missing maps, but it suited my use-case better since I just wanted to check for a key in order to skip items in a loop.