How to use multiple arguments for awk with a shebang (i.e. #!)?
I'd like to execute a gawk script with --re-interval
using a shebang. The "naive" approach of
#!/usr/bin/gawk --re-interval -f
... awk script goes here
does not work, since gawk is called with the first argument "--re-interval -f"
(not split at the whitespace), which it does not understand. Is there a workaround for that?
Of course you can either not call gawk directly but wrap it in a shell script that splits the first argument, or make a shell script that then calls gawk and put the awk script into another file, but I was wondering if there was some way to do this within one file.
The behaviour of shebang lines differs from system to system - at least in Cygwin it does not split the arguments at whitespace. I just care about how to do it on a system that behaves like that; the script is not meant to be portable.
Comments (10)
The shebang line has never been specified as part of POSIX, SUS, LSB or any other specification. AFAIK, it hasn't even been properly documented.
There is a rough consensus about what it does: take everything between the ! and the \n and exec it. The assumption is that everything between the ! and the \n is a full absolute path to the interpreter. There is no consensus about what happens if it contains whitespace.
Thankfully, 1. and 4. seem to have died out, but 3. is pretty widespread, so you simply cannot rely on being able to pass more than one argument.
And since the location of commands is also not specified in POSIX or SUS, you generally use up that single argument by passing the executable's name to env so that it can determine the executable's location; e.g.:
#!/usr/bin/env gawk
[Obviously, this still assumes a particular path for env, but there are only very few systems where it lives in /bin, so this is generally safe. The location of env is a lot more standardized than the location of gawk or, even worse, something like python or ruby or spidermonkey.]
Which means that you cannot actually use any arguments at all.
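To see which splitting behaviour a given system has, a small probe like the one below may help. It is only a sketch: the names showargs and probe are made up for illustration, and it assumes the kernel accepts a script as the interpreter named in a shebang (Linux does).

```shell
# Sketch: observe how this system splits the text after the interpreter path.
# The file names (showargs, probe) are hypothetical.
dir=$(mktemp -d) || exit 1

# A tiny "interpreter" that prints every argument it receives in brackets.
cat > "$dir/showargs" <<'EOF'
#!/bin/sh
for arg in "$@"; do
    printf '[%s]\n' "$arg"
done
EOF

# A script whose shebang names that interpreter followed by two options.
printf '#!%s --first --second\n' "$dir/showargs" > "$dir/probe"
chmod +x "$dir/showargs" "$dir/probe"

probe_output=$("$dir/probe")
printf '%s\n' "$probe_output"
rm -rf "$dir"
```

If the two options come out inside a single pair of brackets, the system passes everything after the interpreter path as one argument, which is the behaviour described in the question. The last line printed should always be the path of the probe script itself.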
Although not exactly portable, starting with coreutils 8.30 and according to its documentation you will be able to use:
So given:
you will get:
and in case you are curious, showargs is:
Original answer here.
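Applied to the gawk case from the question, the env -S approach might look like the sketch below. It assumes an env that supports -S (GNU coreutils 8.30 or later) at /usr/bin/env and gawk on the PATH; the file name intervals.awk and the one-line awk program are only illustrative.

```shell
# Sketch only: needs env with -S support and gawk on PATH.
dir=$(mktemp -d) || exit 1

# env receives "-S gawk --re-interval -f" as one argument and splits it itself.
cat > "$dir/intervals.awk" <<'EOF'
#!/usr/bin/env -S gawk --re-interval -f
# Print lines that contain two consecutive "a" characters.
/a{2}/ { print }
EOF
chmod +x "$dir/intervals.awk"

env_s_output=$(printf 'a\naa\nbaab\n' | "$dir/intervals.awk")
printf '%s\n' "$env_s_output"
rm -rf "$dir"
```

With a suitable env and gawk this should print the lines aa and baab.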
This seems to work for me with (g)awk.
Note the #! runs /bin/sh, so this script is first interpreted as a shell script.
At first, I simply tried "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@", but awk treated that as a command and printed out every line of input unconditionally. That is why I put in the arbitrary_long_name==0 - it's supposed to fail all the time. You could replace it with some gibberish string. Basically, I was looking for a false condition in awk that would not adversely affect the shell script.
In the shell script, the arbitrary_long_name==0 defines a variable called arbitrary_long_name and sets it equal to =0.
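Put together, the trick described above could look roughly like this. It is a sketch under a few assumptions: the file name polyglot.awk and the awk program are made up, and gawk is looked up on the PATH rather than at the /usr/bin/gawk path quoted in the answer.

```shell
# Sketch only: one file that is valid both as a shell script and as an awk program.
dir=$(mktemp -d) || exit 1

cat > "$dir/polyglot.awk" <<'EOF'
#!/bin/sh
arbitrary_long_name==0 "exec" "gawk" "--re-interval" "-f" "$0" "$@"

# sh never reads this far, because exec has replaced it with gawk.
# gawk reads line 2 as a pattern that compares an unset variable
# with a non-empty string, so it never matches.
/a{2}/ { print }
EOF
chmod +x "$dir/polyglot.awk"

polyglot_output=$(printf 'a\naa\nbaab\n' | "$dir/polyglot.awk")
printf '%s\n' "$polyglot_output"
rm -rf "$dir"
```

With gawk installed this should print aa and baab. Any number of options can be added to the exec line, since the shell does the word splitting rather than the kernel.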
I came across the same issue, with no apparent solution because of the way whitespace is handled in a shebang (at least on Linux).
However, you can pass several options in a shebang, as long as they are short options and they can be concatenated (the GNU way).
For example, you cannot have
but you can have
Obviously, that only works when the options have short equivalents and take no arguments.
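For the gawk case in the question, this might be done with -r, which gawk documents as the short form of --re-interval, bundled with -f. The sketch below assumes gawk lives at /usr/bin/gawk and that the option bundle -rf is parsed as -r followed by -f; the file name short.awk is hypothetical.

```shell
# Sketch only: bundle the short options so the kernel passes a single argument.
dir=$(mktemp -d) || exit 1

cat > "$dir/short.awk" <<'EOF'
#!/usr/bin/gawk -rf
# -r is the short form of --re-interval; -f must come last because it
# takes the script path as its argument.
/a{2}/ { print }
EOF
chmod +x "$dir/short.awk"

short_output=$(printf 'a\naa\nbaab\n' | "$dir/short.awk")
printf '%s\n' "$short_output"
rm -rf "$dir"
```

The option that takes an argument (-f here) has to be the last one in the bundle, so this only helps when at most one of the options needs a value.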
Under Cygwin and Linux everything after the path of the shebang gets passed to the program as one argument.
It's possible to hack around this by using another awk script inside the shebang:
This will execute {system("/usr/bin/gawk --re-interval -f " FILENAME); exit} in awk.
And this will execute /usr/bin/gawk --re-interval -f path/to/your/script.awk in your system's shell.
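A sketch of how such a file might look is below. It assumes gawk is installed at /usr/bin/gawk and that the kernel hands the whole rest of the shebang line to gawk as a single argument (as on Linux and Cygwin); the file name nested.awk and the awk program are illustrative.

```shell
# Sketch only: the outer gawk gets its program text from the shebang line.
dir=$(mktemp -d) || exit 1

cat > "$dir/nested.awk" <<'EOF'
#!/usr/bin/gawk {system("/usr/bin/gawk --re-interval -f " FILENAME); exit}
# The outer gawk reads this file as input, so FILENAME is this file's path.
# The inner gawk reads it as a program and treats line 1 as a comment.
/a{2}/ { print }
EOF
chmod +x "$dir/nested.awk"

nested_output=$(printf 'a\naa\nbaab\n' | "$dir/nested.awk")
printf '%s\n' "$nested_output"
rm -rf "$dir"
```

Two caveats seem likely with this approach: extra command-line arguments are not forwarded to the inner gawk, and a script path containing spaces would be split by the shell that system() starts.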
The above shell shebang trick is more portable than /usr/bin/env.
In the gawk manual (http://www.gnu.org/manual/gawk/gawk.html), the end of section 1.14 notes that you should only use a single argument when running gawk from a shebang line. It says that the OS will treat everything after the path to gawk as a single argument. Perhaps there is another way to specify the --re-interval option? Perhaps your script can reference your shell in the shebang line, run gawk as a command, and include the text of your script as a "here document".
Why not use bash and gawk itself, to skip past the shebang, read the script, and pass it as a file to a second instance of gawk [--with-whatever-number-of-params-you-need]?
(The same could naturally also be accomplished with e.g. sed or tail, but I think there's some kind of beauty in depending only on bash and gawk itself ;)
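One way this idea could be written down is sketched below. It assumes bash at /bin/bash (for process substitution) and gawk on the PATH; the file name skip.awk, the three-line header, and the awk program are illustrative choices.

```shell
# Sketch only: bash runs the first three lines, gawk gets everything after them.
dir=$(mktemp -d) || exit 1

cat > "$dir/skip.awk" <<'EOF'
#!/bin/bash
gawk --re-interval -f <(gawk 'NR>3' "$0") "$@"
exit
/a{2}/ { print }
EOF
chmod +x "$dir/skip.awk"

skip_output=$(printf 'a\naa\nbaab\n' | "$dir/skip.awk")
printf '%s\n' "$skip_output"
rm -rf "$dir"
```

The inner gawk prints the file from line 4 onwards, the outer gawk reads that as its program through the process substitution, and the exit on line 3 keeps bash from ever trying to parse the awk code.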
Just for fun: there is the following quite weird solution that reroutes stdin and the program through file descriptors 3 and 4. You could also create a temporary file for the script.
One thing is annoying about this: the shell does variable expansion on the script, so you have to quote every $ (as done in the second line of the script) and probably more than that.
For a portable solution, use awk rather than gawk, invoke the standard Bourne shell (/bin/sh) with your shebang, and invoke awk directly, passing the program on the command line as a here document rather than via stdin:
Note: no -f argument to awk. That leaves stdin available for awk to read input from. Assuming you have gawk installed and on your PATH, that achieves everything I think you were trying to do with your original example (assuming you wanted the file content to be the awk script and not the input, which I think your shebang approach would have treated it as).
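A minimal sketch of that layout follows. It takes one particular reading of the description: the here document is captured with command substitution so that the program reaches awk as a command-line argument and stdin stays free for input. The file name number-lines and the awk program are made up; swapping awk for gawk --re-interval would be the gawk variant.

```shell
# Sketch only: /bin/sh wrapper with the awk program embedded in the same file.
dir=$(mktemp -d) || exit 1

cat > "$dir/number-lines" <<'SCRIPT'
#!/bin/sh
# The quoted here document keeps the shell from expanding $0 in the awk program.
exec awk "$(cat <<'EOF'
{ print NR ": " $0 }
EOF
)" "$@"
SCRIPT
chmod +x "$dir/number-lines"

portable_output=$(printf 'alpha\nbeta\n' | "$dir/number-lines")
printf '%s\n' "$portable_output"
rm -rf "$dir"
```

Because the here document delimiter is quoted, nothing in the awk program needs escaping, and any file names given to the wrapper are forwarded to awk through "$@".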