Joining 3 files by their first column (is it awk)?
I have three similar files; they all look like this:
File A
ID1 Value1a
ID2 Value2a
.
.
.
IDN ValueNa
and I want an output like this:
Output
ID1 Value1a Value1b Value1c
ID2 Value2a Value2b Value2c
.....
IDN ValueNa ValueNb ValueNc
Looking at the first line, I want Value1a to be the value of ID1 in fileA, Value1b the value of ID1 in fileB, and so on for each field and each line. I think of it as something like an SQL join. I've tried several things, but none of them were even close.
EDIT: All files have the same length and the same IDs.
Comments (4)
Give join(1) a try:
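A minimal sketch of how that could look, assuming the three inputs are sorted on the first field. The sample data, the temporary directory, and the helper name `join3` are hypothetical and only there to make the example runnable:

```shell
# Hypothetical sample data standing in for the three input files.
dir=$(mktemp -d)
printf 'ID1 Value1a\nID2 Value2a\n' > "$dir/fileA"
printf 'ID1 Value1b\nID2 Value2b\n' > "$dir/fileB"
printf 'ID1 Value1c\nID2 Value2c\n' > "$dir/fileC"

# join(1) merges two inputs at a time, so chain two invocations;
# "-" tells the second join to read the first join's output from stdin.
join3() {
    join "$1" "$2" | join - "$3"
}

join3 "$dir/fileA" "$dir/fileB" "$dir/fileC"
# prints:
# ID1 Value1a Value1b Value1c
# ID2 Value2a Value2b Value2c
```

By default join matches on the first field of each input and writes only the keys that appear in both inputs.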
join (Dennis's answer) is better, but just for kicks, here's what I came up with in awk:
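One possible awk approach, as a minimal sketch rather than the answerer's exact code. It assumes space-separated files with the key in field 1 and the value in field 2; the sample data and the helper name `awk_join3` are hypothetical:

```shell
# Hypothetical sample data standing in for the three input files.
dir=$(mktemp -d)
printf 'ID1 Value1a\nID2 Value2a\n' > "$dir/fileA"
printf 'ID1 Value1b\nID2 Value2b\n' > "$dir/fileB"
printf 'ID1 Value1c\nID2 Value2c\n' > "$dir/fileC"

awk_join3() {
    awk '
        NR == FNR { order[++n] = $1 }      # first file only: remember key order
        { val[$1] = val[$1] " " $2 }       # every file: append this value to its key
        END {
            for (i = 1; i <= n; i++)       # emit rows in the first file order
                print order[i] val[order[i]]
        }
    ' "$1" "$2" "$3"
}

awk_join3 "$dir/fileA" "$dir/fileB" "$dir/fileC"
```

Unlike join, this sketch does not need sorted input, because every row is held in an associative array until the END block runs.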
Update: The question has been edited to state that all files contain all keys, so the accepted answer (join) is definitely better than this one. Only consider using this one if it's possible the keys may not be in all files.
If you're not too concerned about performance, you could try the quick and dirty:
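A minimal sketch of such a quick-and-dirty approach, assuming single-space separators, keys that start at the beginning of the line, and at most one line per key in each file. The sample data and the helper name `join_by_grep` are hypothetical:

```shell
# Hypothetical sample data; ID2 is deliberately missing from fileB.
dir=$(mktemp -d)
printf 'ID1 Value1a\nID2 Value2a\n' > "$dir/fileA"
printf 'ID1 Value1b\n'              > "$dir/fileB"
printf 'ID1 Value1c\nID2 Value2c\n' > "$dir/fileC"

join_by_grep() {
    # Work out every distinct key (field 1) across all the files first.
    cut -d' ' -f1 "$@" | sort -u | while read -r id; do
        line=$id
        for f in "$@"; do
            # Value for this key in file f; empty when the key is absent.
            val=$(grep "^$id " "$f" | cut -d' ' -f2)
            line="$line ${val:--}"         # substitute "-" for a missing value
        done
        printf '%s\n' "$line"
    done
}

join_by_grep "$dir/fileA" "$dir/fileB" "$dir/fileC"
# prints:
# ID1 Value1a Value1b Value1c
# ID2 Value2a - Value2c
```

Each key costs one grep per file, which is why this only suits small inputs.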
This actually works out the keys first, then gets the values from each file with that key, or - if it's not in the relevant file.
The grep commands will need to be adjusted if the file is more complex (either if field 1 isn't at the start of the line or is followed by a non-space separator), but this should be a reasonable first-cut solution. The likely grep to use in that case would be:
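A hedged sketch of such a more tolerant lookup. The helper name `lookup` and the sample file are hypothetical, and the POSIX class `[[:blank:]]` (space or tab) is used here in place of a bracket expression containing a literal tab:

```shell
# Hypothetical sample file: leading spaces before the key, tab after it.
dir=$(mktemp -d)
printf '  ID1\tValue1a\nID10 Other\n' > "$dir/fileA"

# lookup KEY FILE: print the line(s) of FILE whose first field is KEY.
lookup() {
    # Pattern shape: ^[ X]*KEY[ X], with X standing for a literal tab.
    # Leading blanks are optional; the trailing blank terminates the key,
    # so looking up ID1 does not also match ID10.
    grep "^[[:blank:]]*$1[[:blank:]]" "$2"
}

lookup ID1 "$dir/fileA"
```

The key is still interpreted as a regular expression, so keys containing characters such as `.` or `*` would need escaping first.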
Here X is actually the tab character, as this allows for zero or more spaces or tabs before the key and a space or tab to terminate the key.
If the files are particularly large, you may want to look into using the associative arrays within awk but, since there's no indication of the size, I'd start with this one until you get to the point where it's running too slowly.
Just to add that, for join to work, the input should be sorted.
This awk solution should handle any number of input files.
You will also lose the original order of the keys (you'll need more code to preserve it).