Pre-populate associative array keys in awk?
I've written a munin plugin that uses Slurm's sacct to monitor job states on an HPC cluster. I've written it in sh + awk (rather than my usual tool of choice, Perl).
The script works, but it took me ages to figure out how to pre-populate the associative array of possible states (some or most may not be present in the sacct output, and I want them to default to zero). Google wasn't much help, and the best I could come up with was to use split on a string to produce a temporary array, which I then iterated over.
I came up with this:
BEGIN {
    num = split("cancelled completed completing failed nodefail pending running suspended timeout",statenames," ");
    for (i=1;i<=num;i++) {
        states[statenames[i]] = 0
    }
}
This works, but it seems clumsy compared to how I'd do it in Perl, like this:
foreach (qw(cancelled completed completing failed nodefail pending running suspended timeout)) {
    $states{$_} = 0;
}
or this:
%states = map {$_ => 0} qw(cancelled completed completing failed nodefail pending running suspended timeout);
My question is: is there a way of doing this in awk that is similar to either of the Perl versions?
[ edited ]
To clarify, here's a sample of the sacct output I'm piping into awk. Note that the only states in this output are RUNNING, COMPLETED, and CANCELLED; the others don't appear (because they haven't occurred today), but I want them in my script's output anyway (in a form usable by munin, as "statename.value 0").
# sacct -X -P -o 'state' -n
RUNNING
RUNNING
RUNNING
RUNNING
COMPLETED
RUNNING
COMPLETED
RUNNING
COMPLETED
COMPLETED
CANCELLED by 1000
COMPLETED
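For reference, here is a minimal sketch of how the question's BEGIN block might sit inside a complete counting script. It assumes the state is always the first whitespace-separated word of each sacct line, so that "CANCELLED by 1000" is counted as cancelled after tolower(); the wrapper name count_states is hypothetical. With the twelve sample lines above it should report running 6, completed 5 and cancelled 1, with every other state at 0, in whatever order awk's for (s in states) happens to produce.

```shell
#!/bin/sh
# Hypothetical wrapper around the counting step of the plugin.
# Reads sacct state lines on stdin, prints "state.value N" per state.
count_states() {
    awk '
    BEGIN {
        # The question BEGIN block: seed every known state with 0.
        num = split("cancelled completed completing failed nodefail pending running suspended timeout",statenames," ");
        for (i=1;i<=num;i++) {
            states[statenames[i]] = 0
        }
    }
    # Count non-empty lines by their first word, lowercased,
    # so "CANCELLED by 1000" becomes "cancelled".
    NF { states[tolower($1)]++ }
    END {
        # for-in order is unspecified, so lines may come out in any order.
        for (s in states)
            print s ".value", states[s]
    }'
}

# Usage, as in the question:
#   sacct -X -P -o 'state' -n | count_states
```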
[ edited again ]
And here's sample output from my munin plugin:
# ./slurm-sacct
suspended.value 0
pending.value 0
nodefail.value 0
failed.value 0
running.value 6
completing.value 0
completed.value 5
timeout.value 0
cancelled.value 1
The script runs and does what I want; I just wanted to know if there was a better way to initialise the associative array.
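awk has no list literal comparable to Perl's qw(), so some split() step can't be avoided, but it can be hidden in a user-defined function so that the call site becomes a single line, much like the map version. A minimal sketch, assuming a hypothetical helper init_keys (wrapped in an equally hypothetical shell function init_demo just to make it runnable). It relies on awk passing arrays to functions by reference, and calls split("", states) first as a portable way to make states an array before it is passed in.

```shell
#!/bin/sh
# Hypothetical demo wrapper around a hypothetical awk helper, init_keys.
init_demo() {
    awk '
    # Fill arr with one key per word of list, each set to 0.
    # tmp, n and i are extra parameters, the usual awk idiom for locals.
    function init_keys(arr, list,    tmp, n, i) {
        n = split(list, tmp)
        for (i = 1; i <= n; i++)
            arr[tmp[i]] = 0
        return n
    }
    BEGIN {
        split("", states)   # portable way to make states an (empty) array
        init_keys(states, "cancelled completed completing failed nodefail pending running suspended timeout")
        for (s in states)
            print s, states[s]
    }'
}
```

The BEGIN block then reads almost like the Perl loop; the split() is still there, just moved into one reusable place.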
Answers (5)
You probably don't need to do it at all. Variables in awk are dynamic, which means they're automatically initialized when they are first used (either assigned to or accessed), and this applies to array elements as well.
An uninitialized variable acts as 0 in a numeric context and as the empty string in a string context. (This is standard POSIX awk behavior, not just a gawk quirk.) So if you're doing something like counting the number of jobs in each state, the entire program is as simple as something like
Each time the expression states[$1]++ is executed, it checks whether states[$1] exists and, if it doesn't, initializes it to 0 before incrementing it.
EDIT: From your comment I'm guessing you want to print a line for each possible state, regardless of whether there are any jobs in that state. In that case, you need to list all the possible state names yourself, and awk has no shortcut notation for doing so like Perl's. As far as I know, what you've already found is about as clean as it gets. (Awk wasn't really designed with that usage in mind.)
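A quick way to see the auto-initialization at work (count_seen is a hypothetical wrapper name). States that never appear in the input simply never become array elements, which is exactly the limitation the EDIT above addresses.

```shell
#!/bin/sh
# Hypothetical wrapper: count input lines per first word, with no setup.
count_seen() {
    awk '
    # states[$1] springs into existence on first use with value 0 (numeric
    # context), and ++ then bumps it to 1.
    { states[$1]++ }
    END { for (s in states) print s, states[s] }'
}
```

Feeding it RUNNING, RUNNING, COMPLETED prints RUNNING 2 and COMPLETED 1 (in either order) and nothing at all for TIMEOUT.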
I'd suggest the following:
Perhaps Craig can use instead of :
this:
In my case, if there is no timeout state in the awk input, the first print gives:
while the second gives:
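The effect described here follows from how awk treats an array element that was never assigned: print uses its string value, which is empty, while a numeric format such as printf "%d", or adding 0, yields 0. A small sketch of that difference, using a hypothetical wrapper show_unset; it assumes the two prints being compared differ in exactly this way.

```shell
#!/bin/sh
# Hypothetical wrapper: prints an unset element three ways.
show_unset() {
    awk 'BEGIN {
        # Never assigned, so its string value is "" ...
        print "timeout.value " states["timeout"]
        # ... while its numeric value is 0.
        printf "timeout.value %d\n", states["timeout"]
        print "timeout.value " (states["timeout"] + 0)
    }'
}
```

The first line comes out as "timeout.value " with nothing after the space, which munin would not accept as a value; the other two print timeout.value 0.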
I think a more natural approach in awk would be to have a separate file of keys. Consider a file keys.txt with one key per line. You could then do something like this:
With five keys in keys.txt, this produces:
Although the keys are shown in order here, that's just incidental and shouldn't be relied upon.
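One common way to implement the keys-file idea is the NR == FNR two-file idiom: while awk is reading the first file, NR equals FNR, so every key gets seeded with 0, and the counting rule only sees the second input. This is a sketch under assumptions, not necessarily the code meant above: keys.txt comes first on the command line, sacct output arrives on stdin via -, and count_with_keys is a hypothetical name. Note that the idiom misbehaves if keys.txt is empty, because NR == FNR then stays true for the sacct input as well.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 is the keys file (e.g. keys.txt), stdin is sacct output.
count_with_keys() {
    awk '
    # First file only (NR == FNR): seed each non-blank key with 0.
    NR == FNR { if (NF) count[$1] = 0; next }
    # Remaining input: count states by lowercased first word.
    NF        { count[tolower($1)]++ }
    END       { for (k in count) print k ".value", count[k] }
    ' "$1" -
}

# Usage:
#   sacct -X -P -o 'state' -n | count_with_keys keys.txt
```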
For the specific example, you could also skip the associative array altogether. Instead, you could minimally process the lines with awk and use sort | uniq -c to tabulate the counts. The presence of all keys could be ensured by using join against a file of keys.
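A sketch of that pipeline, assuming the keys file is already sorted in the C locale (join requires both inputs sorted on the join field) and using a hypothetical wrapper name count_with_join. Here join -a 1 keeps keys that have no count, -e 0 fills their missing count with 0, and -o 0,2.2 outputs the key followed by the count; states that appear in the input but not in the keys file are dropped.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 is a keys file sorted with LC_ALL=C sort,
# stdin is sacct output. Prints "state.value N" in key order.
# Steps: keep the lowercased state word, sort and count ("   2 running"),
# swap to "running 2", join against the keys, then format for munin.
count_with_join() {
    awk 'NF { print tolower($1) }' |
        LC_ALL=C sort | uniq -c |
        awk '{ print $2, $1 }' |
        LC_ALL=C join -a 1 -e 0 -o 0,2.2 "$1" - |
        awk '{ print $1 ".value", $2 }'
}
```

Because join walks both sorted inputs in step, the output follows the order of the keys file rather than the unspecified order of an awk for-in loop.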
awk is somewhat clumsier (I would say "less terse") than Perl.
You could write this (similar to @Michael's answer):
One tweak to @DavidZaslavsky's answer might be to print the states in the order you specified them on the split() line. That would be:
I also converted the input to lowercase so it matches your hard-coded values, and got rid of the unnecessary third argument to split() and the null statement after it (the trailing semicolon).
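A minimal sketch of that ordered variant, assuming the state is the first word of each input line; count_ordered is a hypothetical wrapper name. The END block walks the names[] array by index, because the traversal order of for (k in count) is unspecified.

```shell
#!/bin/sh
# Hypothetical wrapper: like the question script, but prints in split() order.
count_ordered() {
    awk '
    BEGIN {
        n = split("cancelled completed completing failed nodefail pending running suspended timeout", names)
        for (i = 1; i <= n; i++)
            count[names[i]] = 0
    }
    # Lowercase so "RUNNING" matches "running".
    NF { count[tolower($1)]++ }
    END {
        # Walk names[] by index; for (k in count) order is unspecified.
        for (i = 1; i <= n; i++)
            print names[i] ".value", count[names[i]]
    }'
}
```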
In case you want to account for finding state names in your input that weren't in your hard-coded set, you could tweak it to:
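A hypothetical sketch of such a tweak (count_ordered_plus is an illustrative name): the known states are printed first in their split() order, then any state found in the input that is not in the hard-coded list, such as PREEMPTED, is appended. Adding 0 makes a known state that never occurred print as 0 rather than an empty string.

```shell
#!/bin/sh
# Hypothetical wrapper: known states in split() order, then any extra ones.
count_ordered_plus() {
    awk '
    BEGIN {
        n = split("cancelled completed completing failed nodefail pending running suspended timeout", names)
        for (i = 1; i <= n; i++)
            known[names[i]] = 1
    }
    NF { count[tolower($1)]++ }
    END {
        # Known states first; +0 turns a never-seen state into 0.
        for (i = 1; i <= n; i++)
            print names[i] ".value", count[names[i]] + 0
        # Then anything in the input that is not in the hard-coded list.
        for (s in count)
            if (!(s in known))
                print s ".value", count[s]
    }'
}
```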