How can I do this with awk? I have several files, each with two columns and identical values in the first column. How do I average row by row over the second column?
I am an inexperienced Awk user but know that Awk is an efficient choice for processing many files. I would be grateful if someone would please point me in the right direction.
I have a directory called parent. Inside it are more directories named 1, 2, 3, 4, .... Inside each of those directories is a directory called angles. Inside angles is a file called angle_A_B_C.dat, as shown below.
parent
    1
        angles
            angle_A_B_C.dat
    2
        angles
            angle_A_B_C.dat
    3
        angles
            angle_A_B_C.dat
    4
        angles
            angle_A_B_C.dat
    ...
The files angle_A_B_C.dat all have the same number of rows (91) and an identical first column. Only the values in the second column are distinct. Here is an example of one angle_A_B_C.dat file:
# Deg[°] Angle[A ,B ,C ]
1.000 0.0000000000
3.000 0.0000000000
5.000 0.0000000000
7.000 0.0000000000
9.000 0.0000000000
11.000 0.0000000000
13.000 0.0000000000
15.000 0.0000000000
17.000 0.0000000000
19.000 0.0000000000
21.000 0.0000000000
23.000 0.0000000000
25.000 0.0000000000
27.000 0.0000000000
29.000 0.0000000000
31.000 0.0000000000
33.000 0.0000000000
35.000 0.0000000000
37.000 0.0000000000
39.000 0.0000000000
41.000 0.0000000000
43.000 0.0000000000
45.000 0.0000000000
47.000 0.0000000000
49.000 0.0000000000
51.000 0.0000000000
53.000 0.0000000000
55.000 0.0000000000
57.000 0.0000000000
59.000 0.0000000000
61.000 0.0000000000
63.000 0.0000000000
65.000 0.0000000000
67.000 1.0309278351
69.000 1.0309278351
71.000 2.0618556701
73.000 1.0309278351
75.000 2.0618556701
77.000 0.0000000000
79.000 0.0000000000
81.000 4.1237113402
83.000 2.0618556701
85.000 4.1237113402
87.000 2.0618556701
89.000 2.0618556701
91.000 5.1546391753
93.000 3.0927835052
95.000 1.0309278351
97.000 3.0927835052
99.000 1.0309278351
101.000 2.0618556701
103.000 9.2783505155
105.000 7.2164948454
107.000 4.1237113402
109.000 5.1546391753
111.000 5.1546391753
113.000 3.0927835052
115.000 2.0618556701
117.000 9.2783505155
119.000 0.0000000000
121.000 3.0927835052
123.000 3.0927835052
125.000 2.0618556701
127.000 0.0000000000
129.000 1.0309278351
131.000 1.0309278351
133.000 2.0618556701
135.000 1.0309278351
137.000 0.0000000000
139.000 1.0309278351
141.000 0.0000000000
143.000 0.0000000000
145.000 1.0309278351
147.000 0.0000000000
149.000 0.0000000000
151.000 1.0309278351
153.000 0.0000000000
155.000 0.0000000000
157.000 1.0309278351
159.000 0.0000000000
161.000 0.0000000000
163.000 0.0000000000
165.000 0.0000000000
167.000 0.0000000000
169.000 0.0000000000
171.000 0.0000000000
173.000 0.0000000000
175.000 0.0000000000
177.000 0.0000000000
179.000 0.0000000000
I want to generate a file called anglesSummary.txt in which the first column is the same as in the example above (and in all of the angle_A_B_C.dat files), and each row of the second column is the average of the same row across all of the files.
I roughly recall how to take the average of an entire column that's located in distinct files in distinct directories, but can't figure out how to deal with just one row at a time. Is this possible?
Here is where I am at present; the question marks show where I think I'm stuck.
cd parent
find . -name angle_A_B_C.dat -exec grep "Angle[A ,B ,C ]" {} + > anglesSummary.txt
my_output="$(awk '{ total += ??? } END { print total/NR }' anglesSummary.txt)"
echo "Average: $my_output" >> anglesSummary.txt
Update (response to markp-fuso comment)
What I want (please see comment at the row where the column 1 value is 15.000):
# Deg[°] Angle[A ,B ,C ]
1.000 0.0000000000
3.000 0.0000000000
5.000 0.0000000000
7.000 0.0000000000
9.000 0.0000000000
11.000 0.0000000000
13.000 0.0000000000
15.000 1.2222220000 # <--Each row in column 2 is the average of the value in the corresponding row, column 2 in all files. So this particular value (1.222222) is the average of the values in all files where the column 1 value is 15.000.
17.000 0.0000000000
19.000 0.0000000000
21.000 0.0000000000
23.000 0.0000000000
25.000 0.0000000000
27.000 0.0000000000
29.000 0.0000000000
31.000 0.0000000000
33.000 0.0000000000
35.000 0.0000000000
... (truncated)
What I currently get from my code is the average of the average of column 2 in each angle_A_B_C.dat file.
If this is still unclear, please feel free to say so and I will rewrite it. Thank you.
2 Answers
Sample input: the angle_A_B_C.dat files shown in the question.
One GNU awk idea:
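A minimal sketch of that idea (not the answerer's exact script; the glob path and the printf widths are assumptions, run from inside parent):

awk '
    FNR == 1 { hdr = $0; next }                  # remember the "# Deg[°] ..." header
    { sum[$1] += $2; cnt[$1]++ }                 # accumulate column 2 per column-1 value
    END {
        PROCINFO["sorted_in"] = "@ind_num_asc"   # GNU awk: traverse indices in numeric order
        print hdr
        for (deg in sum)
            printf "%7.3f %16.10f\n", deg, sum[deg] / cnt[deg]
    }
' */angles/angle_A_B_C.dat > anglesSummary.txt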
NOTES: GNU awk is required for PROCINFO["sorted_in"], which allows the output to be generated in # Deg[°] ascending order (otherwise the output could be piped to sort to ensure the desired ordering).
Assuming input lines are already in sorted order:
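Again a sketch rather than the original script: this portable variant keeps the keys in first-seen order, which matches the sorted order when every file is already sorted:

awk '
    FNR == 1 { hdr = $0; next }             # header line appears in every file
    {
        if (!($1 in cnt)) order[++n] = $1   # record first-seen (already sorted) order
        sum[$1] += $2
        cnt[$1]++
    }
    END {
        print hdr
        for (i = 1; i <= n; i++)
            printf "%7.3f %16.10f\n", order[i], sum[order[i]] / cnt[order[i]]
    }
' */angles/angle_A_B_C.dat > anglesSummary.txt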
NOTES: this one runs in all awk versions (ie, does not require GNU awk).
Both of these generate output in the same two-column format as the input files, with column 2 replaced by the per-row averages.
NOTES: OP can tweak the printf format strings to adjust the output formatting.
Tested with a 212 MB synthetic version of the concatenated input files, assuming a bit over 76,000 individual files; it finishes the entire report in 2.23 seconds.
This solution aims at minimizing rounding errors by storing intermediate values as unsigned integers of up to 2^53 instead of as double-precision floats, using more costly string operations to prevent an undesirable pre-conversion to floating point. It also uses a brute-force method to circumvent the limitations of certain awks that lack built-in sorting functionality. The upside is that the rows of the input files can be in any chaotic order and it would still be fine.
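A minimal sketch of what such a script might look like (not the answerer's actual code). It assumes column 2 always carries exactly ten decimal digits, as in the sample data, so each value can be turned into an exact scaled integer by string splitting:

awk '
    /^#/ { hdr = $0; next }                   # header line in every file
    {
        split($2, p, /\./)                    # string op: "1.0309278351" -> "1", "0309278351"
        v = p[1] p[2]                         # concatenated digits = value scaled by 1e10
        if (!($1 in cnt)) key[++n] = $1
        sum[$1] += v                          # integer sums stay exact while below 2^53
        cnt[$1]++
    }
    END {
        # brute-force selection sort of the keys (no built-in sort needed),
        # so the input rows may arrive in any order
        for (i = 1; i < n; i++)
            for (j = i + 1; j <= n; j++)
                if (key[j] + 0 < key[i] + 0) { t = key[i]; key[i] = key[j]; key[j] = t }
        print hdr
        for (i = 1; i <= n; i++)
            printf "%7.3f %16.10f\n", key[i], sum[key[i]] / cnt[key[i]] / 1e10
    }
' */angles/angle_A_B_C.dat > anglesSummary.txt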