Match lines on the first field and combine the second fields

Posted 2024-12-18 03:07:48

I would like to combine the second fields of matching lines from two files, using awk, sed, or a similar tool.

File0:

string:data:moredata

File1:

string:random:moredata

If the first field (string) of a line in file0 has a matching entry in file1, then print

$random:$data

Selecting the fields seems trivial:

$ awk -F':' '{print $2}' filename

What I still need is to match the rows and print column $2 from both matching lines.

鲜肉鲜肉永远不皱 2024-12-25 03:07:48

How about this one:

awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2

Execution:

[jaypal~/Temp]$ cat file1
string:data:moredata

[jaypal~/Temp]$ cat file2
string:random:moredata

[jaypal~/Temp]$ awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2
random:data

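In this solution, we load every record of file1 into arrays indexed by column 1: x holds the whole line and y holds column 2. For each line of the second file, we check whether its column 1 is a key of x. If it is, the print statement runs and outputs column 2 of that line, a colon, and the stored column 2 from file1.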
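
The logic described above can be sketched as a small shell function around a commented awk program. This is a minimal sketch, not the exact command from the answer: the function name merge_awk and the array name data are made up for illustration, and the x array is dropped on the assumption that the stored column 2 is enough for both the lookup and the output.

```shell
# Hypothetical helper: $1 = lookup file, $2 = file to scan.
# Both files are assumed to be colon-separated, as in the examples.
merge_awk() {
  awk -F: '
    # First file (NR == FNR): remember field 2 under the field-1 key.
    NR == FNR { data[$1] = $2; next }
    # Second file: on a key match, print its field 2, then the stored field 2.
    $1 in data { print $2 ":" data[$1] }
  ' "$1" "$2"
}
```

Calling merge_awk file1 file2 on the sample files should give the same output as the one-liner. Lines of file2 whose key does not occur in file1 are skipped silently.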

Test with several lines in a different order:

[jaypal~/Temp]$ cat file1
string:data:moredata
man:woman:child

[jaypal~/Temp]$ cat file2
man:random:moredata
string:woman:child

[jaypal~/Temp]$ awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2
random:woman
woman:data

Just to add to the explanation, NR and FNR are awk's built-in variables. NR is the number of records read so far across all input files, so it keeps counting when awk moves on to the second file. FNR is the record number within the current file, so it starts again at 1 on the first line of each new file. The first action therefore runs only while NR==FNR, that is, while awk is still reading file1, and stores that file in the arrays; the trailing next skips the second rule for those lines. As soon as the condition becomes false, the second file has started and the second pattern-action statement takes over.
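
To watch the two counters side by side, a throwaway function like the following can help. It is only an illustration; the name show_counters is made up, and FILENAME is another awk built-in that holds the name of the file currently being read.

```shell
# Hypothetical helper: print file name, NR and FNR for every input line.
show_counters() {
  awk '{ print FILENAME, NR, FNR }' "$@"
}
```

With two files a.txt and b.txt of two lines each, show_counters a.txt b.txt prints "a.txt 1 1", "a.txt 2 2", "b.txt 3 1" and "b.txt 4 2": NR keeps counting while FNR restarts. One caveat of the NR==FNR idiom is that if the first file is empty, the condition is also true for every line of the second file.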


意犹 2024-12-25 03:07:48

This sed solution might work for you:

# cat file1
string0:data1:moredata
string2:data3:moredata
string4:data5:moredata
string6:data7:moredata
string8:data9:moredata
# cat file2
string0:random1:moredata
string2:random3:moredata
string4:random5:moredata
cat file1 - <<<"EOF" file2 | 
sed '1,/^EOF/{H;d};G;s/^\([^:]*:\)\([^:]*:\).*\1\([^:]*\).*/$\2$\3/p;d'
$random1:$data1
$random3:$data3
$random5:$data5

Explanation:

Concatenate the files with an EOF delimiter line between them (the <<< here-string needs bash or a similar shell). Slurp the first file into the hold space (HS). Append the HS to every line of the second file, so that each line carries the whole lookup table. Use grouping and backreferences to substitute the required output. BTW, were the $'s in $random:$data intended?

This solution could also be made more efficient by retaining only the necessary data in the lookup table and in each line of file2.
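
One way the trimming idea might look is sketched below. It is a variation under stated assumptions, not the command above: the function name merge_sed is made up, every data line is assumed to have at least three colon-separated fields, no data line is assumed to start with EOF, and, like the original, it relies on a sed in which . matches a newline embedded in the pattern space. The marker is written with echo instead of a here-string, and the key is matched right after a newline so that it cannot match the tail of a longer key.

```shell
# Hypothetical helper: $1 = lookup file (supplies "data"), $2 = file to scan.
merge_sed() {
  { cat "$1"; echo EOF; cat "$2"; } |
  sed -n '
    # Trim lines with at least three fields down to "key:field2:".
    # The EOF marker has no colon, so it is left alone.
    s/^\([^:]*:[^:]*:\).*/\1/
    # Lines up to the EOF marker are file1: collect them in the hold space.
    1,/^EOF$/{
      H
      d
    }
    # file2 line: append the lookup table, then look for "\nkey:" inside it.
    G
    s/^\([^:]*:\)\([^:]*\):.*\n\1\([^:]*\):.*/$\2:$\3/p
  '
}
```

Run as merge_sed file1 file2 on the sample files, this should print the same three $random:$data lines while holding only the first two fields of each file1 line in memory.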
