Using parallel AWK - has anyone heard of it?
Is there such a thing? Can anyone explain it? I have been using AWK for simple tasks such as printing columns and merging large data files, but not for calculations. I was wondering whether AWK can be run in parallel, using all the nodes and CPUs in my computer or on the network. If so, how? And what is the primary aim of parallel AWK?
Thank you for your input.
After posting the question, I found out that Parallel AWK does exist. You can find more about it at http://www.parallel-awk.org/
Comments (1)
The problem with a parallel awk implementation is that awk's semantics explicitly assume that input is processed in order. For example, a script that prints NR in front of every input line gives you output akin to cat -n. The difficulty with processing this in parallel is that NR is the total number of lines processed so far across all input files, not just the number of lines read from the current file (that is FNR), so each worker would need to know how many lines came before its share of the input.

Also, there are more complicated tricks involving commands like getline, which cannot be parallelized (for example, a script can be short-circuited to emulate the GNU nextfile extension).
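The one-liner from the answer did not survive on this page, so the following is only a sketch of the NR problem, not the original example. The file names and contents are made up for illustration, and running one background awk process per file is just an assumed stand-in for "parallel awk"; it is not how the Parallel AWK project works.

```shell
# Create two small sample files (hypothetical names) in a temp directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/file1.txt"
printf 'c\nd\n' > "$tmpdir/file2.txt"

# Sequential run: NR keeps counting across files, FNR restarts for each file.
sequential=$(awk '{ print NR, FNR, $0 }' "$tmpdir/file1.txt" "$tmpdir/file2.txt")

# Naive "parallel" run: one awk process per file, started in the background.
# Each process keeps its own NR, so the global line numbering is lost.
awk '{ print NR, FNR, $0 }' "$tmpdir/file1.txt" > "$tmpdir/out1.txt" &
awk '{ print NR, FNR, $0 }' "$tmpdir/file2.txt" > "$tmpdir/out2.txt" &
wait
split_run=$(cat "$tmpdir/out1.txt" "$tmpdir/out2.txt")

printf 'sequential:\n%s\n\nper-file processes:\n%s\n' "$sequential" "$split_run"
rm -rf "$tmpdir"
```

In the sequential run the first column (NR) goes 1, 2, 3, 4, while in the per-file run it goes 1, 2, 1, 2. A script that only uses FNR, or only looks at fields within a line, gives the same result either way, which is why simple column-printing jobs split across processes fairly safely while NR-dependent ones do not.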