Match rows on the first field and merge the second field

Posted 2024-12-18 03:07:48

I would like to combine the second-field entries from two files using awk, sed, or a similar tool.

File0:

string:data:moredata

File1:

string:random:moredata

If the first field (string) of a line in file0 has a matching entry in file1, then print:

$random:$data

Selecting the fields seems trivial:

$ awk -F':' '{print $2}' filename

What I still need is to match rows on the first field and print the second field ($2) of each matching pair.

Comments (3)

鲜肉鲜肉永远不皱 2024-12-25 03:07:48

How about this one:

awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2

Execution:

[jaypal~/Temp]$ cat file1
string:data:moredata

[jaypal~/Temp]$ cat file2
string:random:moredata

[jaypal~/Temp]$ awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2
random:data

In this solution, while reading file1 we store each whole record in the array x and its second field in the array y, both indexed by the first field. For each line of file2 we then check whether its first field is a key in x. If it is, the print statement is executed.
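Since only the second field is ever printed, a single array is enough. The following is a minimal sketch of that variant, not the answer's exact command: the array name `second`, the temporary directory, and the sample rows `only0` and `only1` are made up for illustration, and the files are named file0 and file1 as in the question.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf 'string:data:moredata\nonly0:foo:bar\n' > "$tmpdir/file0"
printf 'string:random:moredata\nonly1:baz:qux\n' > "$tmpdir/file1"

# First file: remember field 2 under the key in field 1.
# Second file: print "field2-of-second:field2-of-first" for keys seen before.
merged=$(awk -F':' '
    NR == FNR    { second[$1] = $2; next }
    $1 in second { print $2 ":" second[$1] }
' "$tmpdir/file0" "$tmpdir/file1")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

With this sample data only the `string` key appears in both files, so the sketch prints `random:data`. One caveat of the NR==FNR idiom: if the first file is empty, the condition stays true for the second file, so nothing is matched.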

Test with the keys in a different order:

[jaypal~/Temp]$ cat file1
string:data:moredata
man:woman:child

[jaypal~/Temp]$ cat file2
man:random:moredata
string:woman:child

[jaypal~/Temp]$ awk -F":" 'NR==FNR {x[$1] = $0; y[$1] = $2; next} ($1 in x) {print $2":"y[$1]}' file1 file2
random:woman
woman:data

Just to add to the explanation, NR and FNR are awk's built-in variables. NR is the number of records read so far across all input files and is not reset when awk moves on to the next file. FNR is the record number within the current file and restarts at 1 when the second file begins. NR==FNR is therefore true only while the first file is being read, which is what lets us store file1 in the arrays. As soon as this condition becomes false, the second file has started and the next pattern-action statement begins to execute.
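To see the two counters side by side, a small sketch can print NR, FNR, and the key for every line of two inputs. The file names `first` and `second` and their contents are assumptions chosen only for this demonstration.

```shell
# Hypothetical inputs: two lines in the first file, one in the second.
tmpdir=$(mktemp -d)
printf 'a:1:x\nb:2:y\n' > "$tmpdir/first"
printf 'b:3:z\n' > "$tmpdir/second"

# NR keeps counting across files; FNR starts again at 1 for each file.
nr_fnr=$(awk -F':' '{ print NR, FNR, $1 }' "$tmpdir/first" "$tmpdir/second")

printf '%s\n' "$nr_fnr"
rm -rf "$tmpdir"
```

The output is `1 1 a`, `2 2 b`, `3 1 b`: NR and FNR agree for the first file and diverge on the first line of the second file.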

意犹 2024-12-25 03:07:48

This sed solution might work for you:

# cat file1
string0:data1:moredata
string2:data3:moredata
string4:data5:moredata
string6:data7:moredata
string8:data9:moredata
# cat file2
string0:random1:moredata
string2:random3:moredata
string4:random5:moredata
cat file1 - <<<"EOF" file2 | 
sed '1,/^EOF/{H;d};G;s/^\([^:]*:\)\([^:]*:\).*\1\([^:]*\).*/$\2$\3/p;d'
$random1:$data1
$random3:$data3
$random5:$data5

Explanation:

Concatenate the files with an EOF delimiter line between them. Slurp the first file into the hold space (HS). Append the HS to each line of the second file, so that every line carries the lookup table with it. Use grouping and backreferences to substitute the required output. By the way, were the $ signs in $random:$data intended?

This solution could also be made more efficient by keeping only the necessary fields in the lookup table and in each line of file2.

烟沫凡尘 2024-12-25 03:07:48

join - join lines of two files on a common field

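So do your awk thing, but print only the "key" field and the data field. Then run a join command similar to: join -t ':' -1 1 -2 1 file1 file2 > joined.dat. Note that join splits fields on blanks by default, so -t ':' is needed for colon-separated input, and both files must be sorted on the join field.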
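As a rough sketch of the join approach without a separate awk step, the `-o` option can select the output fields directly. The sample data reuses the two-line test files from the first answer, named file0 and file1 as in the question. The temporary directory and the `.sorted` file names are assumptions, and `LC_ALL=C` is set only so that sort and join agree on the ordering.

```shell
# Sample data in a temporary directory (file names as in the question).
tmpdir=$(mktemp -d)
printf 'string:data:moredata\nman:woman:child\n' > "$tmpdir/file0"
printf 'man:random:moredata\nstring:woman:child\n' > "$tmpdir/file1"

# join requires both inputs to be sorted on the join field.
LC_ALL=C sort -t ':' -k1,1 "$tmpdir/file0" > "$tmpdir/file0.sorted"
LC_ALL=C sort -t ':' -k1,1 "$tmpdir/file1" > "$tmpdir/file1.sorted"

# -t ':' sets the field separator; -o 2.2,1.2 prints field 2 of the
# second file followed by field 2 of the first file.
joined=$(LC_ALL=C join -t ':' -o 2.2,1.2 \
    "$tmpdir/file0.sorted" "$tmpdir/file1.sorted")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

For this data the sketch prints `random:woman` and `woman:data`. Unlike the awk version, the output follows the sorted key order rather than the line order of the second file.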
