Median of a column with awk
How can I use AWK to compute the median of a column of numerical data?
I can think of a simple algorithm but I can't seem to program it.
What I have so far is:
sort | awk 'END{print NR}'
This gives me the number of elements in the column. I'd like to use it to print a certain row, (NR/2). If NR/2 is not an integer, I round up to the nearest integer and that row is the median; otherwise I take the average of rows (NR/2)+1 and (NR/2)-1.
Comments (5)
With awk you have to store the values in an array and compute the median at the end, assuming we look at the first column. For a real median computation, do the rounding as described in the question.
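The code that accompanied this answer is not shown here, so the following is only a minimal sketch of the idea, not the answerer's own snippet. The function names middle_value and median_exact are made up for illustration, and the even-count branch uses the usual definition of the median (the mean of the two middle rows of the sorted column).

```shell
# Sketch only: middle_value and median_exact are illustrative names.

# Quick version: print the element at the middle index of the sorted column.
# With an even count this is the upper of the two middle values, not their mean.
middle_value() {
    sort -n "$1" | awk '{ a[i++] = $1 } END { if (i) print a[int(i / 2)] }'
}

# Exact version: average the two middle elements when the count is even.
median_exact() {
    sort -n "$1" | awk '
        { a[i++] = $1 }                       # zero-based: a[0] .. a[i-1]
        END {
            if (i == 0) exit 1                # empty input has no median
            if (i % 2) print a[(i - 1) / 2]   # odd count: single middle element
            else print (a[i / 2 - 1] + a[i / 2]) / 2
        }'
}
```

Both functions take a file name as their only argument and read its first column. The two differ only when the number of rows is even.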
This awk program assumes one column of numerically sorted data.
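Since the program itself is missing here, this is a hedged reconstruction of the approach rather than the original code. It reads one column on stdin, and because the answer relies on the input being sorted, the sketch also refuses to print anything if it sees a value smaller than the previous one. The name median_sorted and the file name data_file are hypothetical.

```shell
# Sketch only: median_sorted is an illustrative name.
# Reads one numeric column on stdin and assumes it is already sorted ascending.
median_sorted() {
    awk '
        NR > 1 && $1 + 0 < prev { bad = 1 }     # input is not in ascending order
        { v[NR] = $1; prev = $1 + 0 }
        END {
            if (NR == 0 || bad) exit 1          # nothing to report for empty or unsorted input
            if (NR % 2) print v[(NR + 1) / 2]   # odd count: the middle row
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Usage: sort numerically first, then pipe into the function.
# sort -n data_file | median_sorted
```

The sortedness check is an addition for safety; dropping the first rule and the bad flag leaves the plain store-then-pick logic.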
OK, I just saw this topic and thought I could add my two cents, since I looked for something similar in the past. Even though the title says awk, all the answers make use of sort as well. Calculating the median for a column of data can be easily accomplished with datamash. Note that sort is not needed, even if you have an unsorted column.
The documentation lists all the functions it can perform, with good examples for files with many columns. Anyway, it has nothing to do with awk, but I think datamash is of great help in cases like this, and it could also be used in conjunction with awk. Hope it helps somebody!
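As a rough illustration of the datamash approach (the commands from the answer are not shown here, so this is a sketch under the assumption that GNU datamash is installed): the median operation takes a field number, and the input does not have to be sorted first. The function name median_datamash and the file name data_file are hypothetical.

```shell
# Sketch only: median_datamash is an illustrative name; requires GNU datamash.
median_datamash() {
    # "median 1" asks for the median of field 1; input order does not matter.
    datamash median 1 < "$1"
}

# Combined with awk: pick the second whitespace-separated column, then take its median.
# awk '{ print $2 }' data_file | datamash median 1
```

datamash splits fields on tabs by default, which makes no difference for a single-column file.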
This AWK based answer to a similar question on unix.stackexchange.com gives the same results as Excel for calculating the median.
If you have an array to compute the median from, Johnsyweb's solution can be written as a one-liner.
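The function from this answer is missing here, so what follows is a sketch of how a median function over an awk array might look, not the original. It swaps in a hand-written insertion sort so that it does not depend on gawk's asort(), and it takes the element count as an explicit argument. The names median_of_array, median, v and n are all illustrative.

```shell
# Sketch only: median_of_array is an illustrative name.
median_of_array() {
    awk '
        # Median of v[1..n]. Sorts v in place with an insertion sort,
        # so no gawk-only asort() is needed.
        function median(v, n,    i, j, t) {
            for (i = 2; i <= n; i++) {
                t = v[i]
                for (j = i - 1; j >= 1 && v[j] > t; j--) v[j + 1] = v[j]
                v[j + 1] = t
            }
            if (n % 2) return v[(n + 1) / 2]
            return (v[n / 2] + v[n / 2 + 1]) / 2
        }
        { x[NR] = $1 + 0 }                  # force numeric so comparisons are numeric
        END { if (NR) print median(x, NR) }
    '
}
```

Because the array is sorted inside awk, the input can be piped in unsorted. Insertion sort is quadratic, so for very large columns an external sort -n is likely the better choice.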