Join in bash like in SAS

Posted 2024-12-26 04:14:01

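I want to join two files in bash on a common column, keeping all pairable and unpairable lines from both files. Unfortunately, with join I can only keep the unpairable lines of one file, e.g. join -1 1 -2 2 -a1 -t" ".
I also want to keep every pairing of entries that are duplicated (in the join column) in both files. For example, if file1 is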
x id1 ab
x id1 cd
x id1 df
x id2 cx
x id3 fv

and the second file is

id1 df cf
id1 ds dg
id2 cv df
id2 as ds
id3 cf cg

the resulting file should be:

x id1 ab df cf
x id1 ab ds dg
x id1 cd df cf
x id1 cd ds dg
x id1 df df cf
x id1 df ds dg
x id2 cx cv df
x id2 cx as ds
x id3 fv cf cg

So far I have been using SAS for this:

data x;
merge file1 file2;
by common_column;
run;

It works fine, but
1. Since I use Ubuntu most of the time, I have to switch to Windows to merge the data in SAS.
2. Most importantly, SAS can truncate data entries that are too long.

That's why I would prefer to join my files in bash, but I don't know the right command.
Can someone help me, or point me to the appropriate resources?

那小子欠揍 2025-01-02 04:14:01

According to join's man page, -a <filenum> retains all unpairable lines from file <filenum> (1 or 2). So, just add -a1 -a2 to your command line and you should be done. For example:

# cat a
1 blah
2 foo

# cat b
2 bar
3 baz

# join -1 1 -2 1 -t" " a b
2 foo bar

# join -1 1 -2 1 -t" " -a1 a b
1 blah
2 foo bar

# join -1 1 -2 1 -t" " -a2 a b
2 foo bar
3 baz

# join -1 1 -2 1 -t" " -a1 -a2 a b
1 blah
2 foo bar
3 baz
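One detail the examples above rely on without stating it: join expects both inputs to be sorted on the join field, and GNU join may print a diagnostic if it notices unsorted input. A minimal sketch, assuming GNU coreutils and the same toy files a and b written out in unsorted order, sorts both with the same C collation before doing the full outer join:

```shell
# Sketch (assumes GNU coreutils): full outer join of two space-separated
# files on field 1. join expects both inputs sorted on the join field, so
# sort them first using the same collation (C locale) that join will use.
export LC_ALL=C
tmp=$(mktemp -d)
printf '2 foo\n1 blah\n' > "$tmp/a"   # deliberately unsorted
printf '3 baz\n2 bar\n' > "$tmp/b"

sort -k1,1 "$tmp/a" > "$tmp/a.sorted"
sort -k1,1 "$tmp/b" > "$tmp/b.sorted"

# -a1 -a2: also print the unpairable lines of file 1 and of file 2
result=$(join -1 1 -2 1 -t ' ' -a1 -a2 "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$result"
rm -r "$tmp"
```

This prints the same three lines as the last example above. Sorting with -k1,1 restricts the sort key to the join field, which is what join compares.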

Is this what you were looking for?

Edit:

Since you provided more detail, here is how to produce your desired output. My file a is your first file and my file b is your second file; I had to reverse -1 1 -2 2 to -1 2 -2 1 to join on the id. I also added a field list (-o) to format the output; note that '0' in it stands for the join field:

# join -1 2 -2 1 -o 1.1,0,1.3,1.4,2.2,2.3 a b

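produces what you've given (my test file a has four fields per line, e.g. x id1 a b, hence 1.3,1.4; with the three-field lines from your question, use 1.1,0,1.3,2.2,2.3). If you add -a1 -a2 to retain unpairable lines from both files, you get two more lines (from which you can guess my test data):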

x id4 u t
 id5   ui oi

That is rather unreadable, since every missing field is just an empty string between spaces. So let's have -e replace missing fields with '-', which gives:

# join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,1.4,2.2,2.3 a b
x id1 a b df cf
x id1 a b ds dg
x id1 c d df cf
x id1 c d ds dg
x id1 d f df cf
x id1 d f ds dg
x id2 c x cv df
x id2 c x as ds
x id3 f v cf cg
x id4 u t - -
- id5 - - ui oi

岁吢 2025-01-02 04:14:01

If the join command is not powerful enough, I usually use sqlite when I need to perform such operations in the shell.

You can easily import flat files into tables, then run an SQL SELECT with the appropriate JOIN.

Note that with sqlite you can use an index to make the join even faster.

sqlite3 <<'EOF'
CREATE TABLE mytable1 (...);  -- define your table here
CREATE TABLE mytable2 (...);  -- define your table here
-- define the input field separator here if needed
.separator ","
.import input_file1.txt mytable1
.import input_file2.txt mytable2
SELECT ... JOIN ...;
EOF

sqlite is free and multiplatform. Very handy.
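To make the template concrete, here is a sketch for the question's data, assuming the sqlite3 command-line shell is installed. The table and column names (t1, t2, c1, id, c3, d2, d3), the index name t2_id, and the extra id4/id5 lines are hypothetical. It emulates a full outer join with a LEFT JOIN plus the unmatched rows of the second table, because FULL OUTER JOIN is only available in SQLite 3.39.0 and later:

```shell
# Sketch: full outer join of the question's files with the sqlite3 shell.
# Names t1, t2, c1, id, c3, d2, d3, t2_id and the id4/id5 lines are hypothetical.
tmp=$(mktemp -d)
printf 'x id1 ab\nx id1 cd\nx id1 df\nx id2 cx\nx id3 fv\nx id4 ut\n' > "$tmp/file1.txt"
printf 'id1 df cf\nid1 ds dg\nid2 cv df\nid2 as ds\nid3 cf cg\nid5 ui oi\n' > "$tmp/file2.txt"

result=$(cd "$tmp" && sqlite3 <<'EOF'
CREATE TABLE t1 (c1 TEXT, id TEXT, c3 TEXT);
CREATE TABLE t2 (id TEXT, d2 TEXT, d3 TEXT);
.separator " "
.nullvalue -
.import file1.txt t1
.import file2.txt t2
CREATE INDEX t2_id ON t2(id);
-- every t1 row, paired with each matching t2 row (or NULLs if none)
SELECT t1.c1, t1.id, t1.c3, t2.d2, t2.d3
  FROM t1 LEFT JOIN t2 ON t1.id = t2.id
UNION ALL
-- t2 rows that have no match in t1
SELECT NULL, t2.id, NULL, t2.d2, t2.d3
  FROM t2 WHERE t2.id NOT IN (SELECT id FROM t1)
ORDER BY 2, 1, 3, 4;
EOF
)
printf '%s\n' "$result"
rm -r "$tmp"
```

Because the tables already exist, .import treats every input line as data, and .nullvalue - prints missing fields as '-', mirroring join -e-. Unlike join, the inputs do not need to be sorted; the ORDER BY only makes the output deterministic.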