Sorting an associative array with AWK

Posted on 2024-10-22 12:21:14

Here's my array (gawk script):

myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11

After sorting, I need the following result:

bob    5
jack   11
peter  32
john   463

When I use "asort", the indices are lost. How can I sort by array value without losing the indices? (I need the indices ordered by their values.)

(I need to obtain this result with awk/gawk only, not a shell script, Perl, etc.)

If my post isn't clear enough, here is another post explaining the same issue: http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_26626841.html

Thanks in advance

Update:

Thanks to you both, but I need to sort by values, not indices (I want the indices ordered according to their values).

In other words, I need this result:

bob    5
jack   11
peter  32
john   463

not:

bob 5
jack 11
john 463
peter 32

(I agree, my example is confusing, the chosen values are pretty bad)

Starting from Catcall's code, I wrote a quick implementation that works, but it's rather ugly (I concatenate keys and values before sorting, then split them during comparison). Here's what it looks like:

function qsort(A, left, right,   i, last) {
  if (left >= right)
    return
  swap(A, left, left+int((right-left+1)*rand()))
  last = left
  for (i = left+1; i <= right; i++)
    if (getPart(A[i], "value") < getPart(A[left], "value"))
      swap(A, ++last, i)
  swap(A, left, last)
  qsort(A, left, last-1)
  qsort(A, last+1, right)
}

function swap(A, i, j,   t) {
  t = A[i]; A[i] = A[j]; A[j] = t
}

function getPart(str, part) {
  if (part == "key")
    return substr(str, 1, index(str, "#")-1)
  if (part == "value")
    return substr(str, index(str, "#")+1, length(str))+0
  return
}

BEGIN {  }
      {  }
END {

  myArray["peter"] = 32
  myArray["bob"] = 5
  myArray["john"] = 463
  myArray["jack"] = 11

  # Build 1-based "key#value" entries so the sort range and the print range match
  n = 0
  for (key in myArray)
    sortvalues[++n] = key "#" myArray[key]

  qsort(sortvalues, 1, n)

  for (i = 1; i <= n; i++)
    print getPart(sortvalues[i], "key"), getPart(sortvalues[i], "value")
}

Of course, I'm interested if you have something cleaner...

Thanks for your time

Comments (6)

青芜 2024-10-29 12:21:15

The following function works in Gawk 3.1.7 and doesn't resort to any of the workarounds described in the other answers (no external sort command, no array subscripts, etc.). It's just a basic implementation of the insertion sort algorithm adapted for associative arrays.

You pass it an associative array to sort on the values and an empty array to populate with the corresponding keys.

myArray["peter"] = 32;
myArray["bob"] = 5;
myArray["john"] = 463;
myArray["jack"] = 11;

len = resort( myArray, result );

for( i = 1; i <= len; i++ ) {
    key = result[ i ];
    print i ": " key " = " myArray[ key ];
}

Here is the implementation:

function resort( data, list,
        key, len, i, j, v )
{
        len = 0;
        for( key in data ) {
                list[ ++len ] = key;
        }

        # insertion sort algorithm adapted for
        # one-based associative arrays in gawk
        for( i = 2; i <= len; i++ )
        {
                v = list[ i ];
                for( j = i - 1; j >= 1; j-- )
                {
                        if( data[ list[ j ] ] <= data[ v ] )
                                break;
                        list[ j + 1 ] = list[ j ];
                }
                list[ j + 1 ] = v;
        }

        return len;
}

You can "reverse sort" by simply iterating the resulting array in descending order.
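As a minimal sketch of that reverse iteration, the snippet below reuses the resort() function from this answer and walks the key list from len down to 1. The shell wrapper and the sorted_desc variable are only there to capture the output and are not part of the original answer; the awk program itself uses no gawk-specific features.

```shell
sorted_desc=$(awk '
function resort(data, list,    key, len, i, j, v) {
    len = 0
    for (key in data) list[++len] = key
    # insertion sort of the keys, ordered by their values in data
    for (i = 2; i <= len; i++) {
        v = list[i]
        for (j = i - 1; j >= 1; j--) {
            if (data[list[j]] <= data[v]) break
            list[j + 1] = list[j]
        }
        list[j + 1] = v
    }
    return len
}
BEGIN {
    myArray["peter"] = 32; myArray["bob"] = 5
    myArray["john"] = 463; myArray["jack"] = 11
    len = resort(myArray, result)
    # walk the ascending key list backwards to get descending order
    for (i = len; i >= 1; i--) print result[i], myArray[result[i]]
}')
printf '%s\n' "$sorted_desc"
```

With the sample data this should list john first and bob last.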

波浪屿的海角声 2024-10-29 12:21:14

Edit:

Sort by values

Oh! To sort the values, it's a bit of a kludge, but you can create a temporary array using a concatenation of the values and the indices of the original array as indices in the new array. Then you can asorti() the temporary array and split the concatenated values back into indices and values. If you can't follow that convoluted description, the code is much easier to understand. It's also very short.

# right justify the integers into space-padded strings and cat the index
# to create the new index
for (i in myArray) tmpidx[sprintf("%12s", myArray[i]),i] = i
num = asorti(tmpidx)
j = 0
for (i=1; i<=num; i++) {
    split(tmpidx[i], tmp, SUBSEP)
    indices[++j] = tmp[2]  # tmp[2] is the name
}
for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]
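The padding step matters because asorti() orders the new indices as strings. A small sketch of the difference, using only standard awk string comparison (the padding_demo variable and the shell wrapper are illustrative, not part of the answer):

```shell
padding_demo=$(awk 'BEGIN {
    # compared as plain strings, "463" sorts before "5" because "4" < "5"
    unpadded = ("463" < "5")
    # right-justified to a common width, the numerically smaller value sorts first
    padded = (sprintf("%12s", 463) < sprintf("%12s", 5))
    print unpadded, padded
}')
printf '%s\n' "$padding_demo"
```

The first comparison is true and the second is false, which is why the padded form gives numeric order for non-negative integers of up to 12 digits.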

Edit 2:

If you have GAWK 4, you can traverse the array by order of values without performing an explicit sort:

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    PROCINFO["sorted_in"] = "@val_num_asc"

    for (i in myArray) {
        print i, myArray[i]
    }
}

There are settings for traversing by index or value, ascending or descending and other options. You can also specify a custom function.
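A rough sketch of the custom-function option, assuming gawk 4 or later is installed as gawk. The comparator name by_value_desc and the extra "anna" entry are hypothetical, added only to show a tie being broken by index. The function receives the index and value of two elements and returns a negative, zero, or positive number.

```shell
by_value=$(gawk '
# hypothetical comparator: value descending, ties broken by index ascending
function by_value_desc(i1, v1, i2, v2) {
    if (v1 != v2) return (v1 > v2) ? -1 : 1
    return (i1 < i2) ? -1 : (i1 > i2)
}
BEGIN {
    myArray["peter"] = 32; myArray["bob"] = 5
    myArray["john"] = 463; myArray["jack"] = 11
    myArray["anna"] = 32   # illustrative duplicate value
    # name the function instead of a predefined order such as "@val_num_asc"
    PROCINFO["sorted_in"] = "by_value_desc"
    for (i in myArray) print i, myArray[i]
}')
printf '%s\n' "$by_value"
```

Predefined orders such as "@ind_str_asc" or "@val_num_desc" are set the same way, by assigning the name to PROCINFO["sorted_in"] before the loop.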

Previous answer:

Sort by indices

If you have an AWK, such as gawk 3.1.2 or greater, which supports asorti():

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    num = asorti(myArray, indices)
    for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]
}

If you don't have asorti():

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    for (i in myArray) indices[++j] = i
    num = asort(indices)
    for (i=1; i<=num; i++) print i, indices[i], myArray[indices[i]]
}
饭团 2024-10-29 12:21:14

Use the Unix sort command through a pipe; this keeps the Awk code simple and follows the Unix philosophy.
Create an input file with comma-separated values:
peter,32
jack,11
john,463
bob,5

Create a sort.awk file with this code:

BEGIN { FS=","; }
{
    myArray[$1]=$2;
}
END {
    for (name in myArray)
        printf ("%s,%d\n", name, myArray[name]) | "sort -t, -k2 -n"
}

Run the program; it should give you this output:
$ awk -f sort.awk data
bob,5
jack,11
peter,32
john,463
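If the script needs to print anything after the sorted lines, the pipe probably has to be closed first so that sort finishes before awk continues. A sketch of that variation, feeding the same sample data inline; the cmd variable and the trailing "done" line are illustrative additions, not part of the original answer:

```shell
sorted=$(printf 'peter,32\njack,11\njohn,463\nbob,5\n' | awk -F, '
{ myArray[$1] = $2 }
END {
    cmd = "sort -t, -k2 -n"
    for (name in myArray)
        printf("%s,%d\n", name, myArray[name]) | cmd
    close(cmd)      # wait for sort to finish and emit its output
    print "done"    # illustrative trailer, printed after the sorted lines
}')
printf '%s\n' "$sorted"
```

Storing the command in a variable keeps the string passed to close() identical to the one used for the pipe, which awk requires in order to match them.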

杀手六號 2024-10-29 12:21:14
PROCINFO["sorted_in"] = "@val_num_desc";

Set the statement above before iterating over the array. Note that it works in gawk 4.0.1 but does not work in gawk 3.1.7.

I am not sure in which intermediate version it was introduced.
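One way to cope with that uncertainty is to check the interpreter version at run time. The sketch below assumes the feature is available from major version 4 onward, which is consistent with the 4.0.1 and 3.1.7 observations above but is not confirmed here. It relies on PROCINFO["version"], which gawk sets and which is simply empty in other awks; the strategy variable and the two labels are hypothetical.

```shell
strategy=$(awk 'BEGIN {
    v = PROCINFO["version"]        # gawk version string, empty in other awks
    split(v, parts, ".")
    if (parts[1] + 0 >= 4)
        print "sorted_in"          # assumed safe to set PROCINFO["sorted_in"]
    else
        print "manual"             # fall back to an explicit sort
}')
printf '%s\n' "$strategy"
```

A script could branch on this result and only set PROCINFO["sorted_in"] when the first label is printed.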

雨巷深深 2024-10-29 12:21:14

And the simple answer...

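# The comparison function must return a negative, zero, or positive number
# (not a boolean) to order the first element before, equal to, or after the second.
function sort_by_myArray(i1, v1, i2, v2) {
    return myArray[i1] - myArray[i2];
}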

BEGIN {
    myArray["peter"] = 32;
    myArray["bob"] = 5;
    myArray["john"] = 463;
    myArray["jack"] = 11;
    len = length(myArray);

    asorti(myArray, k, "sort_by_myArray");

    # Print result.
    for(n = 1; n <= len; ++n) {
            print k[n], myArray[k[n]]
    }
}
心如狂蝶 2024-10-29 12:21:14

Use asorti:

#!/usr/bin/env -S gawk -f
{
    score[$1] = $0;
    array[sprintf("%3s",$2) $1] = $1;
}

END {
    n = asorti(array, b)
    # for (i in b) visits elements in unspecified order, so loop numerically
    for (i = 1; i <= n; i++)
    {
        name = array[b[i]]
        print score[name]
    }
}