unix:使用第二列合并 2 个文件

发布于 2024-10-16 21:57:51 字数 855 浏览 0 评论 0原文

我想根据第二列的内容合并两个文件。

文件 1:

"4742"  "209220_at"     2.60700394801826
"104"   "209396_s_at"   2.60651442103297
"749"   "202409_at"     2.59424724783704
"4168"  "209875_s_at"   2.58773204877464
"3973"  "1431_at"       2.52832098784342
"1826"  "207201_s_at"   2.41685345240968

文件 2:

"653"   "1431_at"       2.14595534191867
"1109"  "207201_s_at"   2.13777517447307
"353"   "212531_at"     2.12706340284672
"381"   "206535_at"     2.11456707231618
"1846"  "204534_at"     2.10919474441178

最后:

"3973"  "1431_at"       2.52832098784342 "653"   "1431_at"       2.14595534191867
"1826"  "207201_s_at"   2.41685345240968 "1109"  "207201_s_at"   2.13777517447307

我尝试了 commdiff 和一些晦涩的 awk 单行代码,但没有成功。 非常感谢任何帮助。 本

I'd like to merge two files according to the content of their 2nd columns.

File 1:

"4742"  "209220_at"     2.60700394801826
"104"   "209396_s_at"   2.60651442103297
"749"   "202409_at"     2.59424724783704
"4168"  "209875_s_at"   2.58773204877464
"3973"  "1431_at"       2.52832098784342
"1826"  "207201_s_at"   2.41685345240968

File2:

"653"   "1431_at"       2.14595534191867
"1109"  "207201_s_at"   2.13777517447307
"353"   "212531_at"     2.12706340284672
"381"   "206535_at"     2.11456707231618
"1846"  "204534_at"     2.10919474441178

To have in the end:

"3973"  "1431_at"       2.52832098784342 "653"   "1431_at"       2.14595534191867
"1826"  "207201_s_at"   2.41685345240968 "1109"  "207201_s_at"   2.13777517447307

I have tried comm, diff, some obscure awk one-liner without any success.
Any help much appreciated.
Ben

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

红颜悴 2024-10-23 21:57:51

您可以结合使用 sortjoin 命令来完成此操作。最简单的方法是,

join -j2 <(sort -k2 file1) <(sort -k2 file2)

但显示效果与您想要的略有不同。它只显示公共连接字段,然后显示每个文件中的其余字段

"1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
"207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307

如果您需要与显示的格式完全相同的格式,那么您需要告诉 join 以这种方式输出,

join -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2 file1) <(sort -k2 file2)

其中 -o 接受FILENUM.FIELDNUM 说明符列表。

请注意,我使用的 <() 语法不是 POSIX sh,因此如果您需要 POSIX sh 语法,您应该排序到临时文件。

You can do that with a combination of the sort and join commands. The straightforward approach is

join -j2 <(sort -k2 file1) <(sort -k2 file2)

but that displays slightly differently than you're looking for. It just shows the common join field and then the remaining fields from each file

"1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
"207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307

If you need the format exactly as you showed, then you would need to tell join to output in that manner

join -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2 file1) <(sort -k2 file2)

where -o accepts a list of FILENUM.FIELDNUM specifiers.

Note that the <() syntax I'm using isn't POSIX sh, so you should sort to a temporary file if you need POSIX sh syntax.

于我来说 2024-10-23 21:57:51
awk '
  # store the first file, indexed by col2
  NR==FNR {f1[$2] = $0; next}
  # output only if file1 contains file2's col2
  ($2 in f1) {print f1[$2], $0}
' file1 file2
awk '
  # store the first file, indexed by col2
  NR==FNR {f1[$2] = $0; next}
  # output only if file1 contains file2's col2
  ($2 in f1) {print f1[$2], $0}
' file1 file2
深白境迁sunset 2024-10-23 21:57:51

如果文件很小,请用脚本语言(Perl、Python 和 Ruby 都是不错的选择)编写一个程序,将第一个文件读入一个散列,其键是第二列,然后读取第二个文件并使用散列查找来查找可以加入什么(如果有的话)。

如果文件很大,则每个文件交换第一列和第二列,将它们传递给 unix 排序实用程序,然后用脚本语言合并(加上列重新排序)两个排序的文件。

If the files are small, write a program in a scripting language (Perl, Python and Ruby are all good choices) that reads the first into a hash whose keys are the second column, then read through the second file and use hash lookups to fin out what (if anything) can be joined.

If the files are large then for each file swap the first and second columns, pass them through the unix sort utility, and then in a scripting language merge (plus column reorder) the two sorted files.

谁人与我共长歌 2024-10-23 21:57:51
awk 'FNR==NR{a[$2]=$0} NR>FNR && ($2 in a){ print $0,a[$2] } ' file2 file1
awk 'FNR==NR{a[$2]=$0} NR>FNR && ($2 in a){ print $0,a[$2] } ' file2 file1
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文