Algorithm for a SQL-like group-by with aggregate functions over a collection?

Posted 2024-10-06 21:22:53

Let's say you have an array like this:

[
  {'id' : 1, 'closed' : 1 },
  {'id' : 2, 'closed' : 1 },
  {'id' : 5, 'closed' : 1 },
  {'id' : 7, 'closed' : 0 },
  {'id' : 8, 'closed' : 0 },
  {'id' : 9, 'closed' : 1 }
]

I'd like to summarize this dataset (without using SQL!) and get the min and max id for each group, where a group is a run of consecutive rows that share the same 'closed' value. The output should look like this:

[
  {'id__min' : 1, 'id__max' : 5, 'closed' : 1},
  {'id__min' : 7, 'id__max' : 8, 'closed' : 0},
  {'id__min' : 9, 'id__max' : 9, 'closed' : 1}
]

This is just an example of what I'd like to do. I want to implement something similar to what Python's itertools.groupby provides, but a little more comprehensive: I would like to define my own aggregation functions.

I am looking for pointers, pseudocode, or even PHP, Python or JavaScript code if possible.

Thanks!

Comments (4)

我做我的改变 2024-10-13 21:22:53

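The key argument to itertools.groupby() lets you pass your own function for computing the grouping key. groupby() then yields each run of consecutive items with the same key, and you can apply whatever aggregation functions you like to each run.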
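As a rough illustration of that suggestion, here is a minimal sketch in Python. The helper name `summarise` and the shape of the `aggregations` argument (output name mapped to a field and a function) are assumptions made for this example, not part of itertools. Note that groupby() only groups consecutive items, which happens to be exactly what the expected output in the question needs.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {'id': 1, 'closed': 1},
    {'id': 2, 'closed': 1},
    {'id': 5, 'closed': 1},
    {'id': 7, 'closed': 0},
    {'id': 8, 'closed': 0},
    {'id': 9, 'closed': 1},
]


def summarise(rows, group_by, aggregations):
    """Collapse each run of consecutive rows that share rows[group_by].

    `aggregations` (a hypothetical format) maps an output name to a
    (field, function) pair; the function receives the list of that
    field's values within one run.
    """
    output = []
    for key, run in groupby(rows, key=itemgetter(group_by)):
        run = list(run)  # groupby yields a one-shot iterator, so materialise it
        summary = {
            name: func([row[field] for row in run])
            for name, (field, func) in aggregations.items()
        }
        summary[group_by] = key
        output.append(summary)
    return output


result = summarise(rows, 'closed', {'id__min': ('id', min), 'id__max': ('id', max)})
print(result)
```

Any callable that takes a list of values can be used in place of min and max, for example sum or len.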

淡看悲欢离合 2024-10-13 21:22:53

Ruby code:

def summarise(array_of_hashes)
    # an empty input has nothing to summarise
    return [] if array_of_hashes.empty?
    # first sort the list by id
    arr = array_of_hashes.sort_by { |h| h['id'] }
    # start the first group: id_min and id_max are the id of the first
    # array element, and closed is the closed value of the first array element
    hash = { 'id_min' => arr[0]['id'], 'id_max' => arr[0]['id'], 'closed' => arr[0]['closed'] }
    # prepare an output array
    output = []
    # iterate over the array elements
    arr.each do |el|
        if el['closed'] == hash['closed']
            # extend id_max while the closed value stays the same
            # (the list is sorted by id, so each new id is the largest so far)
            hash['id_max'] = el['id']
        else # once the closed value changes
            output.push hash # add the finished group to the output array
            # and start a new group from the current element
            hash = { 'id_min' => el['id'], 'id_max' => el['id'], 'closed' => el['closed'] }
        end
    end
    output.push hash # make sure the final group is added to the output array
    # return the output array
    output
end

The generalised version:

def summarise(data, condition, group_func)
    # an empty input produces no groups
    return [] if data.empty?
    # store the first element of the current group to compare later elements to
    pivot = data[0]
    to_group = []
    output = []
    # iterate through the array
    data.each do |datum|
        # if the comparison of this datum to the pivot datum fits the condition
        if condition.call(pivot, datum)
            # add this datum to the to_group list
            to_group.push datum
        else # once the condition no longer matches
            # apply the aggregating function to the list to group and add the result to the output array
            output.push group_func.call(to_group)
            # reset the to_group list and add this element to it
            to_group = [datum]
            # set the pivot to this element
            pivot = datum
        end
    end
    # make sure the final list to group is aggregated and added to the output list
    output.push group_func.call(to_group)
    # return the output list
    output
end

The following code will then work for your example:

my_condition = lambda do |a, b|
    b['closed'] == a['closed']
end

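my_group_func = lambda do |to_group|
    # the input is sorted by id, so the first and last elements
    # of each group hold its smallest and largest id
    {
        'id_min' => to_group.first['id'],
        'id_max' => to_group.last['id'],
        'closed' => to_group.first['closed']
    }
end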

summarise(my_array.sort {|a, b| a['id'] <=> b['id']}, my_condition, my_group_func)

The generalised algorithm will work in any language that allows passing functions as arguments to other functions. It also works on arrays of any element type, provided you supply a suitable condition and aggregating function.
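To illustrate the portability claim, here is a hedged Python sketch of the same idea. The name `summarise_runs` and the sample data are assumptions made for this example; it is a loose port of the Ruby above, not a line-by-line translation. The second call shows it working on plain integers rather than hashes.

```python
def summarise_runs(data, condition, group_func):
    """Split data into runs and aggregate each run with group_func.

    A run continues while condition(first_item_of_run, item) is true.
    """
    output = []
    to_group = []
    for datum in data:
        # Close the current run once the condition stops matching.
        if to_group and not condition(to_group[0], datum):
            output.append(group_func(to_group))
            to_group = []
        to_group.append(datum)
    # Aggregate the final run, if the input was not empty.
    if to_group:
        output.append(group_func(to_group))
    return output


rows = [
    {'id': 1, 'closed': 1},
    {'id': 2, 'closed': 1},
    {'id': 5, 'closed': 1},
    {'id': 7, 'closed': 0},
    {'id': 8, 'closed': 0},
    {'id': 9, 'closed': 1},
]

by_closed = summarise_runs(
    rows,
    lambda a, b: a['closed'] == b['closed'],
    lambda run: {
        'id__min': min(r['id'] for r in run),
        'id__max': max(r['id'] for r in run),
        'closed': run[0]['closed'],
    },
)

# The same function on plain integers: runs share a tens digit, aggregated with sum.
by_tens = summarise_runs([1, 3, 12, 15, 18, 4], lambda a, b: a // 10 == b // 10, sum)

print(by_closed)
print(by_tens)
```

Using min and max inside the aggregating function, instead of the first and last element, means the input does not have to be sorted by id first.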

一个人练习一个人 2024-10-13 21:22:53

A PHP version of the Ruby code with slightly more generic naming and id order handling:

$input = array(
    array('id' => 3, 'closed' => 1),
    array('id' => 2, 'closed' => 1),
    array('id' => 5, 'closed' => 1),
    array('id' => 7, 'closed' => 0),
    array('id' => 8, 'closed' => 0),
    array('id' => 9, 'closed' => 1)
);

$output = min_max_group($input, 'id', 'closed');
echo '<pre>'; print_r($output); echo '</pre>';

function min_max_group($array, $name, $group_by)
{
    $output = array();

    // An empty input has no groups to report.
    if (empty($array)) { return $output; }

    // Start the first group from the first row.
    $tmp = array();
    $tmp[$name.'__max'] = $tmp[$name.'__min'] = $array[0][$name];
    $tmp[$group_by] = $array[0][$group_by];

    foreach($array as $value)
    {
        if($value[$group_by] == $tmp[$group_by])
        {
            // Same group: widen min/max, so the ids do not need to be sorted.
            if($value[$name] < $tmp[$name.'__min']) { $tmp[$name.'__min'] = $value[$name]; }
            if($value[$name] > $tmp[$name.'__max']) { $tmp[$name.'__max'] = $value[$name]; }
        }
        else
        {
            // Group changed: store the finished group and start a new one.
            $output[] = $tmp;

            $tmp[$name.'__max'] = $tmp[$name.'__min'] = $value[$name];
            $tmp[$group_by] = $value[$group_by];
        }
    }

    // Store the final group.
    $output[] = $tmp;

    return $output;
}
停顿的约定 2024-10-13 21:22:53

Maybe I'm misunderstanding the problem, but isn't this just a standard map/reduce problem?
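One way to read that suggestion is as a fold (reduce) over the rows. The sketch below is a minimal Python illustration under that assumption; the step function `fold_row` is a made-up name. One caveat: a classic map/reduce groups by key regardless of position, which would merge the two separate closed = 1 runs from the question, so this version folds in input order and only compares each row with the most recent summary.

```python
from functools import reduce

rows = [
    {'id': 1, 'closed': 1},
    {'id': 2, 'closed': 1},
    {'id': 5, 'closed': 1},
    {'id': 7, 'closed': 0},
    {'id': 8, 'closed': 0},
    {'id': 9, 'closed': 1},
]


def fold_row(acc, row):
    """Reduce step: extend the most recent summary or start a new one."""
    if acc and acc[-1]['closed'] == row['closed']:
        last = acc[-1]
        last['id__min'] = min(last['id__min'], row['id'])
        last['id__max'] = max(last['id__max'], row['id'])
    else:
        acc.append({'id__min': row['id'], 'id__max': row['id'], 'closed': row['closed']})
    return acc


# The empty list is the initial accumulator; rows are folded in left to right.
summary = reduce(fold_row, rows, [])
print(summary)
```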
