Median of a column with awk

Posted on 2024-11-10 07:32:47

How can I use AWK to compute the median of a column of numerical data?

I can think of a simple algorithm but I can't seem to program it:

What I have so far is:

sort | awk 'END{print NR}' 

And this gives me the number of elements in the column. I'd like to use this to print a certain row (NR/2). If NR/2 is not an integer, then I round up to the nearest integer and that is the median, otherwise I take the average of (NR/2)+1 and (NR/2)-1.

Comments (5)

-小熊_ 2024-11-17 07:32:47

With awk you have to store the values in an array and compute the median at the end, assuming we look at the first column:

sort -n file | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'

For a true median, handle the even case by averaging the two middle values (the first one-liner simply returns the upper of the two):

sort -n file | awk ' { a[i++]=$1; }
    END { x=int((i+1)/2); if (x < (i+1)/2) print (a[x-1]+a[x])/2; else print a[x-1]; }'
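
A quick way to see the difference between the two one-liners is to run both on a small input with an even number of values. This is only a sketch: the function names `median_upper` and `median_exact` and the sample numbers are made up for illustration, and both helpers assume one number per line on standard input.

```shell
# Hypothetical helper names; both read numbers from stdin, one per line.
median_upper() {   # first one-liner: element at index int(i/2)
    sort -n | awk '{ a[i++] = $1 } END { print a[int(i/2)] }'
}
median_exact() {   # second one-liner: averages the two middle values when i is even
    sort -n | awk '{ a[i++] = $1 }
        END { x = int((i+1)/2); if (x < (i+1)/2) print (a[x-1] + a[x]) / 2; else print a[x-1] }'
}

printf '%s\n' 4 1 3 2 | median_upper     # even count: prints 3
printf '%s\n' 4 1 3 2 | median_exact     # even count: prints 2.5
printf '%s\n' 9 1 5   | median_exact     # odd count:  prints 5
```

With an odd number of values the two versions agree, because there is a single middle element.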
冰雪梦之恋 2024-11-17 07:32:47

This awk program assumes one column of numerically sorted data:

# median.awk: print the median of the first column (input must be numerically sorted)
{
    count[NR] = $1;
}
END {
    if (NR % 2) {
        print count[(NR + 1) / 2];
    } else {
        print (count[(NR / 2)] + count[(NR / 2) + 1]) / 2.0;
    }
}

Sample usage:

sort -n data_file | awk -f median.awk
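
If the data is not in the first column, the same logic can be wrapped in a small shell function. The following is a sketch under assumptions: the function name `median_of_column` and the awk variable `col` are invented here, fields are taken to be whitespace-separated, and lines with fewer than `col` fields are skipped.

```shell
# Hypothetical wrapper: median_of_column FILE [COLUMN]   (COLUMN defaults to 1)
median_of_column() {
    # 1) extract the chosen column, 2) sort it numerically, 3) pick the median
    awk -v col="${2:-1}" 'NF >= col { print $col }' "$1" |
        sort -n |
        awk '{ v[NR] = $1 }
             END {
                 if (NR == 0) exit 1          # no data: print nothing, non-zero status
                 if (NR % 2) print v[(NR + 1) / 2]
                 else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
             }'
}
```

Usage, assuming a file whose second column holds the numbers:

```shell
median_of_column data_file 2
```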
冷心人i 2024-11-17 07:32:47

OK, just saw this topic and thought I could add my two cents, since I looked for something similar in the past. Even though the title says awk, all the answers make use of sort as well. Calculating the median for a column of data can be easily accomplished with datamash:

> seq 10 | datamash median 1
5.5

Note that sort is not needed, even if you have an unsorted column (gshuf is GNU shuf as installed on macOS; on most Linux systems the command is plain shuf):

> seq 10 | gshuf | datamash median 1
5.5

The documentation gives all the functions it can perform, and good examples as well for files with many columns. Anyway, it has nothing to do with awk, but I think datamash is of great help in cases like this, and could also be used in conjunction with awk. Hope it helps somebody!
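
For completeness, when neither sort nor datamash is available, the sorting can also be done inside awk itself. The sketch below uses a plain insertion sort in POSIX awk; it is not taken from any of the answers. The function name `median_awk_only` is made up, every line is assumed to start with a numeric field, and the quadratic sort is only reasonable for small inputs.

```shell
# Hypothetical helper: median of the first column using awk alone (no sort, no datamash).
median_awk_only() {
    awk '
        { v[NR] = $1 + 0 }                    # store the value as a number
        END {
            if (NR == 0) exit 1               # no input
            # insertion sort, O(n^2)
            for (i = 2; i <= NR; i++) {
                x = v[i]
                for (j = i - 1; j >= 1 && v[j] > x; j--)
                    v[j + 1] = v[j]
                v[j + 1] = x
            }
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

printf '%s\n' 5 6 4 2 7 9 3 1 8 | median_awk_only    # prints 5
```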

不如归去 2024-11-17 07:32:47

This AWK-based answer to a similar question on unix.stackexchange.com gives the same results as Excel when calculating the median.

深海蓝天 2024-11-17 07:32:47

If you have a Bash array to compute the median from (this uses a one-liner version of Johnsyweb's solution):

array=(5 6 4 2 7 9 3 1 8) # numbers 1-9
IFS=$'\n'   # make "${array[*]}" expand to one element per line
median=$(sort -n <<< "${array[*]}" | awk '{arr[NR]=$1} END {if (NR%2==1) print arr[(NR+1)/2]; else print (arr[NR/2]+arr[NR/2+1])/2}')
unset IFS