Using subshells for parameter substitution with diff

Posted 2024-12-17 19:04:34

I'm writing a shell script, and in an effort to make it shorter and easier to read, I'm trying to use nested subshells to pass parameters to diff.

Here's what I have:

if
  diff -iy '$(sort '$(awk 'BEGIN { FS = "|" } ; {print $1}' new-participants-by-state.csv)' '$(awk 'BEGIN { FS = "|" } ; {print $1}' current-participants-by-state.csv)')' > /dev/null;
then  
  echo There is no difference between the files. > ./participants-by-state-results.txt;
else  
  diff -iy '$(sort '$(awk 'BEGIN { FS = "|" } ; {print $1}' new-participants-by-state.csv)' '$(awk 'BEGIN { FS = "|" } ; {print $1}' current-participants-by-state.csv)')' > ./participants-by-state-results.txt;
fi

When I run the script, I keep getting diff: extra operand 'AL'

I'd appreciate any insight into why this is failing. I think I'm pretty close. Thanks!

贪恋 2024-12-24 19:04:34

Your code is unreadable because the lines are so long:

if diff -iy '$(sort '$(awk 'BEGIN { FS = "|" } ; {print $1}' new-participants-by-state.csv)' \
       '$(awk 'BEGIN { FS = "|" } ; {print $1}' current-participants-by-state.csv)')' \
       > /dev/null;
then  
    echo There is no difference between the files. > ./participants-by-state-results.txt;
else  
   diff -iy '$(sort '$(awk 'BEGIN { FS = "|" } ; {print $1}' new-participants-by-state.csv)' \
      '$(awk 'BEGIN { FS = "|" } ; {print $1}' current-participants-by-state.csv)')' \
      > ./participants-by-state-results.txt;
fi

Repeating whole commands like that is also fairly nasty. You also have major problems with your use of single quotes; you only have one sort in each set of commands, apparently operating on the combined outputs of two identical awk commands (whereas you probably need two separate sorts, one for the output of each awk command); you're not using the -F option to awk when you could; you are repeating the gargantuan file names all over the place; and finally, it appears that you are probably wanting to use process substitution, but not actually doing so.

Let's take a step back and formulate the question clearly.

Given two files (new-participants-by-state.csv and current-participants-by-state.csv) find the first pipe-separated field on each line of each file, sort the lists of those fields, and compare the results of the two sorted lists.
If there are no differences, write a message into the output file participants-by-state-results.txt; otherwise, list the differences in the output file.

So, we could use:

oldfile='current-participants-by-state.csv'
newfile='new-participants-by-state.csv'
outfile='participants-by-state-results.txt'

tmpfile=${TMPDIR:-/tmp}/xx.$$

awk -F'|' '{print $1}' $oldfile | sort > $tmpfile.1
awk -F'|' '{print $1}' $newfile | sort > $tmpfile.2

if diff -iy $tmpfile.1 $tmpfile.2 > $outfile
then echo "There is no difference between the files" > $outfile
fi

rm -f $tmpfile.?

If this was going to be the final script, we'd want to put trap handling in place so that the temporary files are not left around unless the script is killed dead with SIGKILL.
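A rough sketch of what that trap handling might look like. The helper name compare_first_fields is made up for illustration, and it assumes mktemp -d is available, which is swapped in here for the ${TMPDIR:-/tmp}/xx.$$ naming used above:

```shell
# compare_first_fields OLD NEW   (hypothetical helper, not from the original script)
# The function body is a subshell, so the EXIT trap fires when the function finishes.
compare_first_fields() (
    tmpdir=$(mktemp -d) || exit 2      # assumption: mktemp -d exists on this system
    trap 'rm -rf "$tmpdir"' EXIT       # clean up on any exit from the subshell
    trap 'exit 2' HUP INT TERM         # a catchable signal becomes an exit, which runs the EXIT trap

    awk -F'|' '{print $1}' "$1" | sort > "$tmpdir/old"
    awk -F'|' '{print $1}' "$2" | sort > "$tmpdir/new"

    diff -iy "$tmpdir/old" "$tmpdir/new"
    exit $?                            # hand diff's status back to the caller
)

# Usage with the file names from above:
#   compare_first_fields current-participants-by-state.csv \
#       new-participants-by-state.csv > participants-by-state-results.txt
```

SIGKILL cannot be trapped, so that is the one case where the scratch directory would still be left behind.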

However, we can now use process substitution to avoid the temporary files:

oldfile='current-participants-by-state.csv'
newfile='new-participants-by-state.csv'
outfile='participants-by-state-results.txt'

if diff -iy <(awk -F'|' '{print $1}' $oldfile | sort) \
            <(awk -F'|' '{print $1}' $newfile | sort) > $outfile
then echo "There is no difference between the files" > $outfile
fi

Note how the code carefully preserves symmetries where there are symmetries. Note the use of shortish variable names to avoid the repetition of long file names. Note that the diff command is run just once, not twice - throwing away results which are needed later is not very sensible.

You could compress the output I/O redirection even more using:

{
if diff -iy <(awk -F'|' '{print $1}' $oldfile | sort) \
            <(awk -F'|' '{print $1}' $newfile | sort)
then echo "There is no difference between the files"
fi
} > $outfile

That sends the standard output of the enclosed commands to the file.

Of course, CSV might not be the appropriate nomenclature if the files are pipe-separated rather than comma-separated, but that's another matter altogether.

I'm also assuming that the status from diff -iy works as suggested by the original script; I've not validated that usage of the diff command.
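One way to check that assumption is to run diff -iy against small throwaway files and look at the exit status. The sample data below is invented for the test; as far as I know, diff exits with 0 when it finds no differences, 1 when it finds some, and a value greater than 1 on trouble such as a missing file:

```shell
# Throwaway sample data, made up for this check
dir=$(mktemp -d)
printf 'AK\nAL\n' > "$dir/a"
printf 'ak\nal\n' > "$dir/b"    # differs from a only in letter case
printf 'AK\nAZ\n' > "$dir/c"    # second line really differs

same_status=0
diff -iy "$dir/a" "$dir/b" > /dev/null || same_status=$?

diff_status=0
diff -iy "$dir/a" "$dir/c" > /dev/null || diff_status=$?

error_status=0
diff -iy "$dir/a" "$dir/missing" > /dev/null 2>&1 || error_status=$?

echo "same=$same_status different=$diff_status error=$error_status"
rm -rf "$dir"
```

If the first two values come out as 0 and 1, the if/then logic in the scripts above behaves as intended.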


三生殊途 2024-12-24 19:04:34

There are several problems here.

First, you're putting various arguments in single-quotes, which prevents any interpretation being done on them (for example, $(....) doesn't do anything special inside single-quotes). You're probably thinking of double-quotes, but those aren't what you want either.
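A tiny illustration of the quoting difference (the variable names are just for the demo):

```shell
single='$(echo hello)'    # single quotes: stored as literal text, nothing is run
double="$(echo hello)"    # double quotes: the command runs and its output is stored

echo "$single"            # prints $(echo hello)
echo "$double"            # prints hello
```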

Which brings us to the second problem: diff and sort expect to be given filenames as arguments, and they operate on the data in those files; you're trying to pass the data directly as arguments, which doesn't work. (I suspect that's the origin of the error you're getting: diff expects exactly two filenames, you're passing more than two participant names, and AL happened to be third on the list and hence the one that diff complained about.) The usual way to do this is to use intermediate files (and multiple lines in the script), but bash actually has a way of doing this without either of those: process substitution. Essentially, it runs one command with its output (or input, but we need output in this case) connected to a named pipe or a /dev/fd entry, depending on the system; then it passes that name as an argument to another command. For example, diff <(command1) <(command2) will give you the differences between the outputs of command1 and command2. Note that since this is not a POSIX sh feature (bash, ksh and zsh have it), you must start the script with #!/bin/bash, not #!/bin/sh.
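To make the difference concrete, here is a small sketch with made-up state codes. It needs bash (not plain sh), and the exact name that process substitution produces varies by system:

```shell
# Requires bash: process substitution is not POSIX sh syntax.

# Unquoted command substitution splits the output into separate words,
# so a command would see one operand per state code.
set -- $(printf 'AK\nAL\nAZ\n')
arg_count=$#                       # 3 operands, one more than diff accepts

# Process substitution expands to a single file name instead.
path_seen=$(echo <(true))          # something like /dev/fd/63; the name varies

# diff therefore gets exactly two file names, one per command.
diff_status=0
result=$(diff <(printf 'AK\nAL\n') <(printf 'AK\nAZ\n')) || diff_status=$?

echo "operands: $arg_count"
echo "name passed: $path_seen"
echo "$result"
```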

Third, there's a missing close-parenthesis that makes it a little hard to tell what's supposed to happen. Are both files supposed to be sorted before the comparison, or only the new-participants file?

Fourth, since the final comparison ignores case (-i), you'd better use a case-insensitive sort (-f) as well.
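A quick sketch of why that matters, using invented two-line lists that differ only in case. LC_ALL=C is set here purely to make the ordering predictable; your locale may already sort without regard to case:

```shell
dir=$(mktemp -d)

# Plain sort in the C locale puts uppercase before lowercase, so the
# two lists end up in different orders.
printf 'AK\nAL\n' | LC_ALL=C sort > "$dir/a.plain"      # AK, AL
printf 'ak\nAL\n' | LC_ALL=C sort > "$dir/b.plain"      # AL, ak

# sort -f folds case while comparing, so both lists line up.
printf 'AK\nAL\n' | LC_ALL=C sort -f > "$dir/a.folded"  # AK, AL
printf 'ak\nAL\n' | LC_ALL=C sort -f > "$dir/b.folded"  # ak, AL

plain_status=0
diff -i "$dir/a.plain" "$dir/b.plain" > /dev/null || plain_status=$?

folded_status=0
diff -i "$dir/a.folded" "$dir/b.folded" > /dev/null || folded_status=$?

echo "plain sort: $plain_status, sort -f: $folded_status"
rm -rf "$dir"
```

With the plain sort, diff -i reports a difference that is only an ordering artifact; with sort -f it reports none.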

Finally, you're doing all of the processing twice if there are any differences. I'd recommend running the comparison once into a file, then if there were no differences just ignore/overwrite the (empty) file.

Oh, and just a stylistic thing: you don't need semicolons at the end of lines in bash. You only need semicolons if you're putting more than one command on the same line (and a few other cases like before then in an if statement).

Anyway, here's my rewrite:

#!/bin/bash
if
    diff -iy <(awk 'BEGIN { FS = "|" } ; {print $1}' new-participants-by-state.csv | sort -f) \
             <(awk 'BEGIN { FS = "|" } ; {print $1}' current-participants-by-state.csv | sort -f) \
             > ./participants-by-state-results.txt
then
    echo "There is no difference between the files." > ./participants-by-state-results.txt
fi
