How to use multiple arguments for awk with a shebang (i.e. #!)?

Posted 2024-10-05 03:50:29

I'd like to execute a gawk script with --re-interval using a shebang. The "naive" approach of

#!/usr/bin/gawk --re-interval -f
... awk script goes here

does not work, since gawk is called with the first argument "--re-interval -f" (not split at the whitespace), which it does not understand. Is there a workaround for that?

Of course you can either not call gawk directly but wrap it into a shell script that splits the first argument, or make a shell script that then calls gawk and put the script into another file, but I was wondering if there was some way to do this within one file.

The behaviour of shebang lines differs from system to system: at least on Cygwin, the arguments are not split at whitespace. I only care about how to do it on a system that behaves like that; the script is not meant to be portable.
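To see exactly what the interpreter receives, here is a minimal sketch. The temporary directory and the showargs helper are hypothetical stand-ins, not part of the question. It installs a script with the naive shebang pointing at an argument-printing "interpreter". On Linux, which behaves like Cygwin here, the first line printed should be [--re-interval -f], i.e. both options fused into one argument, followed by the script's own path. It assumes the kernel accepts a script as a shebang interpreter, as Linux does.

```shell
#!/bin/sh
# Hypothetical demo: print each argument a shebang interpreter receives.
dir=$(mktemp -d)

# Hypothetical "showargs" stands in for gawk: it prints every argument in brackets.
cat > "$dir/showargs" <<'EOF'
#!/bin/sh
for arg in "$@"; do printf '[%s]\n' "$arg"; done
EOF

# A script with the naive shebang from the question, pointing at showargs.
printf '#!%s --re-interval -f\n' "$dir/showargs" > "$dir/naive"
chmod +x "$dir/showargs" "$dir/naive"

out=$("$dir/naive")
printf '%s\n' "$out"
first=$(printf '%s\n' "$out" | head -n 1)
rm -r "$dir"
```

On a system that splits at every whitespace, the first two lines would instead be [--re-interval] and [-f].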

Comments (10)

烟沫凡尘 2024-10-12 03:50:29

The shebang line has never been specified as part of POSIX, SUS, LSB or any other specification. AFAIK, it hasn't even been properly documented.

There is a rough consensus about what it does: take everything between the ! and the \n and exec it. The assumption is that everything between the ! and the \n is a full absolute path to the interpreter. There is no consensus about what happens if it contains whitespace.

  1. Some operating systems simply treat the entire thing as the path. After all, in most operating systems, whitespace or dashes are legal in a path.
  2. Some operating systems split at whitespace and treat the first part as the path to the interpreter and the rest as individual arguments.
  3. Some operating systems split at the first whitespace and treat the front part as the path to the interpreter and the rest as a single argument (which is what you are seeing).
  4. Some don't support shebang lines at all.

Thankfully, 1. and 4. seem to have died out, but 3. is pretty widespread, so you simply cannot rely on being able to pass more than one argument.

And since the location of commands is also not specified in POSIX or SUS, you generally use up that single argument by passing the executable's name to env so that it can determine the executable's location; e.g.:

#!/usr/bin/env gawk

[Obviously, this still assumes a particular path for env, but there are only very few systems where it lives in /bin, so this is generally safe. The location of env is a lot more standardized than the location of gawk or even worse something like python or ruby or spidermonkey.]

This means that, in practice, you cannot pass any further arguments at all.
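A quick way to confirm the point about env on a type-3 system such as Linux (a sketch; the file names are made up): with #!/usr/bin/env awk -f, env receives "awk -f" as a single word, searches PATH for a program literally named "awk -f", and fails. env reports a command it cannot find with exit status 127, which is what the sketch expects.

```shell
#!/bin/sh
# Sketch: env receives "awk -f" as ONE word and looks for a program by that name.
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/prog.awk" <<'EOF'
#!/usr/bin/env awk -f
{ print "never reached" }
EOF
chmod +x "$dir/prog.awk"

status=0
"$dir/prog.awk" </dev/null 2>/dev/null || status=$?
echo "exit status: $status"
rm -r "$dir"
```

On a type-2 system that splits at every whitespace, the same script would simply run awk -f on itself.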

第几種人 2024-10-12 03:50:29

Although not exactly portable, starting with coreutils 8.30 (and according to its documentation), env's -S (--split-string) option splits the rest of the shebang line into separate arguments itself, so you can use:

#!/usr/bin/env -S command arg1 arg2 ...

So given:

$ cat test.sh
#!/usr/bin/env -S showargs here 'is another' long arg -e "this and that " too

you will get:

$ ./test.sh
$0 is '/usr/local/bin/showargs'
$1 is 'here'
$2 is 'is another'
$3 is 'long'
$4 is 'arg'
$5 is '-e'
$6 is 'this and that '
$7 is 'too'
$8 is './test.sh'

and in case you are curious showargs is:

#!/usr/bin/env sh
echo "\$0 is '$0'"

i=1
for arg in "$@"; do
    echo "\$$i is '$arg'"
    i=$((i+1))
done

Original answer here.
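Applied to the question, the shebang would be #!/usr/bin/env -S gawk --re-interval -f. The sketch below (file names hypothetical) uses the POSIX awk options -v and -f instead, so it runs even where gawk isn't installed. It assumes /usr/bin/env is GNU coreutils 8.30 or newer, or another env that implements -S.

```shell
#!/bin/sh
# Sketch assuming an env that supports -S (GNU coreutils >= 8.30).
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/greet.awk" <<'EOF'
#!/usr/bin/env -S awk -v greeting=hello -f
{ print greeting ", " $1 }
EOF
chmod +x "$dir/greet.awk"

# The kernel passes "-S awk -v greeting=hello -f" as one argument;
# env splits it into four before running awk on the script.
out=$(echo world | "$dir/greet.awk")
echo "$out"
rm -r "$dir"
```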

π浅易 2024-10-12 03:50:29

This seems to work for me with (g)awk.

#!/bin/sh
arbitrary_long_name==0 "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@"


# The real awk program starts here
{ print $0 }

Note the #! runs /bin/sh, so this script is first interpreted as a shell script.

At first, I simply tried "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@", but awk treated that line as a pattern with no action: the string literals concatenate to a non-empty string, which is always true, so awk printed every line of input. That is why I put in arbitrary_long_name==0. In awk, concatenation binds tighter than ==, so the line compares an unset variable against that non-empty string, which is always false. You could replace the name with any gibberish string. Basically, I was looking for a condition that is false in awk but does not adversely affect the shell script.

In the shell script, arbitrary_long_name==0 is a variable assignment prefix: it sets a variable called arbitrary_long_name to the string =0 for the exec'd command, which is harmless.
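Here is the polyglot above as a self-contained sketch. To keep it runnable without gawk, it swaps "--re-interval" for the POSIX option "-v" "greeting=hello" (an illustrative assumption). For the question you would keep "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@" exactly as shown.

```shell
#!/bin/sh
# Sketch: the sh/awk polyglot, with -v greeting=hello standing in for --re-interval.
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/poly.awk" <<'EOF'
#!/bin/sh
arbitrary_long_name==0 "exec" "awk" "-v" "greeting=hello" "-f" "$0" "$@"
# sh: the line above replaces the shell with awk, so nothing below runs as sh.
# awk: the line above is a pattern that is always false, so it never fires.
{ print greeting ", " $1 }
EOF
chmod +x "$dir/poly.awk"

out=$(echo world | "$dir/poly.awk")
echo "$out"
rm -r "$dir"
```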

陪你搞怪i 2024-10-12 03:50:29

I came across the same issue, with no apparent solution because of the way whitespace is handled in a shebang (at least on Linux).

However, you can pass several options in a shebang, as long as they are short options and they can be concatenated (the GNU way).

For example, you cannot have

#!/usr/bin/foo -i -f

but you can have

#!/usr/bin/foo -if

Obviously, that only works when the options have short equivalents and take no arguments.
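The same clustering can be checked with /bin/sh, whose -e and -u are both short options (a sketch; the file names are hypothetical). The kernel hands sh the single argument -eu, which sh parses as two options, so the script stops at the first unset variable. For the question, gawk documents -r as the short form of --re-interval, so #!/usr/bin/gawk -rf should presumably work. An option that takes an argument can go last in the cluster, because the kernel appends the script path, which then becomes -f's argument. I have not verified that on Cygwin. Also, since gawk 4.0 interval expressions are enabled by default, so the option may not be needed at all.

```shell
#!/bin/sh
# Sketch: two short sh options clustered into the single shebang argument.
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/strict.sh" <<'EOF'
#!/bin/sh -eu
echo before
echo "value: $undefined_variable"   # -u makes this expansion a fatal error
echo after
EOF
chmod +x "$dir/strict.sh"

status=0
out=$("$dir/strict.sh" 2>/dev/null) || status=$?
printf 'output: %s\nstatus: %s\n' "$out" "$status"
rm -r "$dir"
```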

小糖芽 2024-10-12 03:50:29

Under Cygwin and Linux, everything after the interpreter path in the shebang is passed to the program as one argument.

It's possible to hack around this by using another awk script inside the shebang:

#!/usr/bin/gawk {system("/usr/bin/gawk --re-interval -f " FILENAME); exit}

This will execute {system("/usr/bin/gawk --re-interval -f " FILENAME); exit} in awk, which in turn runs /usr/bin/gawk --re-interval -f path/to/your/script.awk in your system's shell.
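A self-contained version of this trick as a sketch. Paths and names are hypothetical, /usr/bin/awk is assumed to exist, and -v greeting=hello again stands in for --re-interval. The outer awk reads the script file itself as input, so FILENAME is the script path on the first record; system() then starts the real awk, which inherits stdin.

```shell
#!/bin/sh
# Sketch: an awk one-liner in the shebang re-invokes awk with the real options.
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/greet.awk" <<'EOF'
#!/usr/bin/awk {system("awk -v greeting=hello -f " FILENAME); exit}
{ print greeting ", " $1 }
EOF
chmod +x "$dir/greet.awk"

# The inner awk treats the shebang line as a comment and reads "world" from stdin.
out=$(echo world | "$dir/greet.awk")
echo "$out"
rm -r "$dir"
```

Caveats: FILENAME is spliced into a shell command unquoted, so this breaks on paths containing spaces; the plain exit discards the inner awk's exit status; and arguments given to the script reach only the outer awk (as extra input files), not the inner one.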

满栀 2024-10-12 03:50:29
#!/bin/sh
''':'
exec YourProg -some_options "$0" "$@"
'''

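The above shell shebang trick is more portable than /usr/bin/env. It does rely on YourProg's language treating the ''':' ... ''' lines as a harmless string literal, as Python's triple-quoted strings do; awk has no such syntax and would reject them.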
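The placeholders filled in with Python as YourProg, as a sketch. This is an assumption: the trick needs a language in which the ''':' ... ''' lines are a no-op, which Python's triple-quoted strings provide. -B and -u are real python3 options standing in for -some_options. The file names are hypothetical, and python3 must be on PATH. sh sees ''':' as the null command : and then execs python3; Python sees a string literal and skips it.

```shell
#!/bin/sh
# Sketch: the ''':' polyglot with python3 as the target program.
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/poly.py" <<'EOF'
#!/bin/sh
''':'
exec python3 -B -u "$0" "$@"
'''
import sys
print("args:", sys.argv[1:])
EOF
chmod +x "$dir/poly.py"

out=$("$dir/poly.py" one two)
echo "$out"
rm -r "$dir"
```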

清音悠歌 2024-10-12 03:50:29

In the gawk manual (http://www.gnu.org/manual/gawk/gawk.html), the end of section 1.14 notes that you should only use a single argument when running gawk from a shebang line, because the OS treats everything after the path to gawk as a single argument. Perhaps there is another way to specify the --re-interval option? Perhaps your script can reference your shell in the shebang line, run gawk as a command, and include the text of your script as a "here document".
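One way to realize that suggestion, as a sketch (names are hypothetical, and -v greeting=hello stands in for --re-interval): read the here document into a shell variable and pass it to awk as the program argument rather than on stdin, so awk can still read data from stdin. The quoted delimiter 'EOF' stops the shell from expanding $1 inside the program.

```shell
#!/bin/sh
# Sketch: awk program kept in a here document, passed as a command-line argument.
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/greet.sh" <<'OUTER'
#!/bin/sh
prog=$(cat <<'EOF'
{ print greeting ", " $1 }
EOF
)
exec awk -v greeting=hello "$prog" "$@"
OUTER
chmod +x "$dir/greet.sh"

out=$(echo world | "$dir/greet.sh")
echo "$out"
rm -r "$dir"
```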

┾廆蒐ゝ 2024-10-12 03:50:29

Why not use bash and gawk itself, to skip past shebang, read the script, and pass it as a file to a second instance of gawk [--with-whatever-number-of-params-you-need]?

#!/bin/bash
gawk --re-interval -f <(gawk 'NR>3' "$0") "$@"
exit
{
  print "Program body goes here"
  print $1
}

(The same could of course be done with e.g. sed or tail, but I think there's some kind of beauty in depending only on bash and gawk itself ;)
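A variant of this answer as a sketch. It uses a hypothetical marker line #__AWK_PROGRAM__, so the program is found by content rather than by a hard-coded NR>3, and it forwards "$@" so the script can also take input files. It needs bash for <( ... ), and awk with -v greeting=hello stands in for gawk --re-interval.

```shell
#!/bin/sh
# Sketch: locate the awk program by a marker line instead of a line count.
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/greet" <<'OUTER'
#!/bin/bash
awk -v greeting=hello -f <(awk 'found; /^#__AWK_PROGRAM__$/ { found = 1 }' "$0") "$@"
exit
#__AWK_PROGRAM__
{ print greeting ", " $1 }
OUTER
chmod +x "$dir/greet"

# The extractor prints only the lines after the marker; bash never reads past exit.
out=$(echo world | "$dir/greet")
echo "$out"
rm -r "$dir"
```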

与往事干杯 2024-10-12 03:50:29

Just for fun: there is the following quite weird solution that reroutes stdin and the program through file descriptors 3 and 4. You could also create a temporary file for the script.

#!/bin/bash
exec 3>&0
exec <<-EOF 4>&0
BEGIN {print "HALLO"}
{print \$1}
EOF
gawk --re-interval -f <(cat 0>&4) 0>&3

One thing is annoying about this: the shell does variable expansion on the script, so you have to quote every $ (as done in the second line of the script) and probably more than that.
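The $ escaping can be avoided by quoting the here-document delimiter ('EOF'), which turns off all expansion inside it. If the document is attached to file descriptor 4 directly with exec 4<<'EOF', stdin never has to be saved and restored either. Here is a sketch under those assumptions: bash is required for <( ... ), the file names are hypothetical, and plain awk replaces gawk --re-interval.

```shell
#!/bin/sh
# Sketch: program on fd 4 via a quoted here document; stdin is left untouched.
dir=$(mktemp -d)   # hypothetical scratch directory
cat > "$dir/greet" <<'OUTER'
#!/bin/bash
exec 4<<'EOF'
BEGIN { print "HALLO" }
{ print $1 }
EOF
awk -f <(cat <&4) "$@"
OUTER
chmod +x "$dir/greet"

out=$(echo world | "$dir/greet")
printf '%s\n' "$out"
rm -r "$dir"
```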

旧瑾黎汐 2024-10-12 03:50:29

For a portable solution, use awk rather than gawk, invoke the standard Bourne shell (/bin/sh) with your shebang, and invoke awk directly, passing the program on the command line as a here document rather than via stdin:

#!/bin/sh
gawk --re-interval "$(cat <<'EOF'
PROGRAM HERE
EOF
)" "$@"

Note: no -f argument to awk. That leaves stdin available for awk to read input from. Assuming you have gawk installed and on your PATH, that achieves everything I think you were trying to do with your original example (assuming you wanted the file content to be the awk script and not the input, which I think your shebang approach would have treated it as).
