Array intersection in Bash

Posted 2024-12-11 09:11:59

How do you compare two arrays in Bash to find all intersecting values?

Let's say:
array1 contains values 1 and 2
array2 contains values 2 and 3

I should get back 2 as a result.

My own answer:

for item1 in $array1; do
    for item2 in $array2; do
        if [[ $item1 = $item2 ]]; then
            result=$result" "$item1
        fi
    done
done

I'm looking for alternate solutions as well.

鲸落 2024-12-18 09:11:59

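Each element of list1 is used as a search pattern in list2, expressed as a single string (${list2[*]}). Because the pattern is quoted, Bash 3.2 and later match it as a literal substring rather than as a regular expression, and the framing blanks ensure that only whole elements match: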

list1=( 1 2 3 4   6 7 8 9 10 11 12)
list2=( 1 2 3   5 6   8 9    11 )

result=()
l2=" ${list2[*]} "                    # add framing blanks
for item in "${list1[@]}"; do
  if [[ $l2 =~ " $item " ]] ; then    # quoted, so matched as a literal substring
    result+=("$item")
  fi
done
echo "${result[@]}"

The result is

1 2 3 6 8 9 11
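A minimal sketch of why the framing blanks matter. Without them, searching for 1 would also succeed inside 11. The sample array and the variable names unframed_hit and framed_hit are hypothetical, and like the answer above this assumes no element contains a space.

```shell
#!/usr/bin/env bash
list2=(5 11 12)

unframed="${list2[*]}"        # "5 11 12"
framed=" ${list2[*]} "        # " 5 11 12 " (framing blanks added)

# The quoted right-hand side of =~ is matched literally, as a substring
if [[ $unframed =~ "1" ]]; then unframed_hit=yes; else unframed_hit=no; fi
if [[ $framed =~ " 1 " ]]; then framed_hit=yes; else framed_hit=no; fi

# "1" is found inside "11" without the blanks (a false positive),
# but " 1 " does not occur once the blanks are added
echo "unframed: $unframed_hit, framed: $framed_hit"   # unframed: yes, framed: no
```

The trick breaks down if an element itself contains a space, because the flattened string can no longer tell element boundaries apart.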

用心笑 2024-12-18 09:11:59

Building on @Raihan's comm answer, this version works on arrays rather than files (though process substitution does create file descriptors behind the scenes).
I know it's a bit of a cheat, but it seems like a good alternative.

A side effect is that the output array will be sorted lexicographically; I hope that's okay.
(I also don't know what kind of data you have, so I only tested with numbers; strings with special characters etc. may need additional work.)

result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort)  <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))

Testing:

$ array1=(1 17 33 99 109)
$ array2=(1 2 17 31 98 109)

$ result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort)  <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))

$ echo "${result[@]}"
1 109 17

P.S. I'm sure there is a way to output an array one value per line without the for loop; I just forget it (IFS?)
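On the P.S.: one way to print one element per line without a loop is printf '%s\n' "${array[@]}", because printf reuses its format string for every remaining argument. The sketch below also swaps result=($(...)) for mapfile -t (Bash 4+ assumed), which reads comm's output line by line and so avoids word splitting and filename expansion of the results.

```shell
#!/usr/bin/env bash
array1=(1 17 33 99 109)
array2=(1 2 17 31 98 109)

# printf '%s\n' emits each array element on its own line, no loop needed.
# mapfile -t (Bash 4+) stores each line of comm's output as one array
# element, stripping the trailing newline.
mapfile -t result < <(comm -12 <(printf '%s\n' "${array1[@]}" | sort) \
                               <(printf '%s\n' "${array2[@]}" | sort))

printf '%s\n' "${result[@]}"   # prints 1, 109 and 17 on separate lines
```

Elements that themselves contain newlines would still be split, since comm and sort work line by line.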

固执像三岁 2024-12-18 09:11:59

Your answer won't work, for two reasons:

$array1 expands only to the first element of array1. (This is documented Bash behavior: referencing an array variable without a subscript is equivalent to referencing it with subscript 0.)
After the first element is added to result, result contains a space, so the next run of result=$result" "$item1 will misbehave badly. (Instead of appending to result, it will run the command made up of the first two items, with the environment variable result set to the empty string.) Correction: it turns out I was wrong about this one, because word splitting doesn't take place in assignments. (See comments below.)

What you want is this:

result=()
for item1 in "${array1[@]}"; do
    for item2 in "${array2[@]}"; do
        if [[ $item1 == "$item2" ]]; then   # quoted: literal comparison, not a glob pattern
            result+=("$item1")
        fi
    done
done
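The nested loop compares every pair of elements, so its cost grows with the product of the two array sizes. A common alternative, not part of this answer, is to load one array into an associative array (Bash 4+ assumed) and then test each element of the other array in a single pass. The name seen and the sample arrays are hypothetical; the sketch also assumes no element is the empty string, which Bash rejects as an associative-array key.

```shell
#!/usr/bin/env bash
array1=(1 2 'a b' x)
array2=(2 3 'a b' y)

# Record every element of array2 as a key (Bash 4+ associative array)
declare -A seen=()
for item in "${array2[@]}"; do
  seen["$item"]=1
done

# Keep the elements of array1 that are keys of seen, in array1's order
result=()
for item in "${array1[@]}"; do
  if [[ -n ${seen["$item"]+set} ]]; then
    result+=("$item")
  fi
done

printf '%s\n' "${result[@]}"   # prints 2, then "a b"
```

Unlike the nested loop, a value that appears several times in array2 is kept only as often as it appears in array1.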

无边思念无边月 2024-12-18 09:11:59

If you were looking for the intersecting lines of two files (instead of arrays), you could use the comm command. Note that comm expects both files to be sorted.

$ comm -12 file1 file2
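Because comm expects sorted input, unsorted files can be sorted on the fly with process substitution. A small sketch with two hypothetical temporary files created by mktemp:

```shell
#!/usr/bin/env bash
# Two hypothetical input files, deliberately unsorted
file1=$(mktemp)
file2=$(mktemp)
printf '%s\n' 3 1 2 > "$file1"
printf '%s\n' 2 4 3 > "$file2"

# -1 and -2 suppress the lines unique to each file,
# leaving only the lines common to both
common=$(comm -12 <(sort "$file1") <(sort "$file2"))
echo "$common"    # prints 2 and 3 on separate lines

rm -f "$file1" "$file2"
```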

你怎么敢 2024-12-18 09:11:59

Now that I understand what you mean by "array", I think -- first of all -- that you should consider using actual Bash arrays. They're much more flexible, in that (for example) array elements can contain whitespace, and you can avoid the risk that * and ? will trigger filename expansion.

But if you prefer to use your existing approach of whitespace-delimited strings, then I agree with RHT's suggestion to use Perl:

result=$(perl -e 'my %array2 = map +($_ => 1), split " ", $ARGV[1];
                  print join " ", grep $array2{$_}, split " ", $ARGV[0]
                 ' "$array1" "$array2")

(The line-breaks are just for readability; you can get rid of them if you want.)

In the above Bash command, the embedded Perl program creates a hash named %array2 containing the elements of the second array, and then it prints any elements of the first array that exist in %array2.

This will behave slightly differently from your code in how it handles duplicate values in the second array; in your code, if array1 contains x twice and array2 contains x three times, then result will contain x six times, whereas in my code, result will contain x only twice. I don't know if that matters, since I don't know your exact requirements.
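To make the duplicate-handling difference concrete, here is a small pure-Bash sketch (not the Perl program itself) using hypothetical arrays. The first loop is the question's nested loop with proper "${array[@]}" expansion. The second performs one membership test per element of array1, which is what the Perl grep over a hash does; it swaps in the framed-string test in place of a hash and assumes no element contains a space.

```shell
#!/usr/bin/env bash
array1=(x x)
array2=(x x x)

# Nested loop: every matching pair appends a copy, so 2 * 3 = 6 copies of x
nested=()
for a in "${array1[@]}"; do
  for b in "${array2[@]}"; do
    if [[ $a == "$b" ]]; then nested+=("$a"); fi
  done
done

# One membership test per element of array1: each x is kept once, so 2 copies
l2=" ${array2[*]} "          # framing blanks; assumes no element contains a space
once=()
for a in "${array1[@]}"; do
  if [[ $l2 == *" $a "* ]]; then once+=("$a"); fi
done

echo "nested: ${#nested[@]}, once: ${#once[@]}"   # nested: 6, once: 2
```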