Resize or rebin a 2D numpy array by averaging

I am trying to reimplement an IDL function in Python:

http://star.pst.qub.ac.uk/idl/REBIN.html

which downsizes a 2D array by an integer factor, by averaging.

For example:

>>> a=np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2, 3) by taking the mean of the relevant samples; the expected output would be:

>>> b = rebin(a, (2, 3))
>>> b
array([[  3.5,   5.5,  7.5],
       [ 15.5, 17.5,  19.5]])

i.e. b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]) and so on.

I believe I should reshape to a 4-dimensional array and then take the mean over the correct slices, but I could not figure out the algorithm. Would you have any hints?

落叶缤纷 2024-12-22 02:02:17

Here's an example based on the answer you've linked (for clarity):

>>> import numpy as np
>>> a = np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape((2,a.shape[0]//2,3,-1)).mean(axis=3).mean(1)
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

As a function:

def rebin(a, shape):
    # Split each axis into (new_size, bin_size) pairs, then average
    # over the two bin axes.
    sh = shape[0], a.shape[0]//shape[0], shape[1], a.shape[1]//shape[1]
    return a.reshape(sh).mean(-1).mean(1)
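
For reference, applying it to the a defined above reproduces the expected result from the question:

>>> rebin(a, (2, 3))
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])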
心舞飞扬 2024-12-22 02:02:17

J.F. Sebastian has a great answer for 2D binning. Here is a version of his "rebin" function that works for N dimensions:

def bin_ndarray(ndarray, new_shape, operation='sum'):
    """
    Bins an ndarray in all axes based on the target shape, by summing or
        averaging.

    Number of output dimensions must match number of input dimensions and 
        new axes must divide old ones.

    Example
    -------
    >>> m = np.arange(0,100,1).reshape((10,10))
    >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
    >>> print(n)

    [[ 22  30  38  46  54]
     [102 110 118 126 134]
     [182 190 198 206 214]
     [262 270 278 286 294]
     [342 350 358 366 374]]

    """
    operation = operation.lower()
    if operation not in ['sum', 'mean']:
        raise ValueError("Operation not supported.")
    if ndarray.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                           new_shape))
    # Pair each output dimension with its bin size, e.g. (10, 10) -> (5, 5)
    # gives [(5, 2), (5, 2)], then reshape to the interleaved shape
    # [5, 2, 5, 2] so each bin occupies its own axis.
    compression_pairs = [(d, c//d) for d, c in zip(new_shape,
                                                   ndarray.shape)]
    flattened = [l for p in compression_pairs for l in p]
    ndarray = ndarray.reshape(flattened)
    # Collapse the bin axes one at a time, from the innermost outwards.
    for i in range(len(new_shape)):
        op = getattr(ndarray, operation)
        ndarray = op(-1 * (i + 1))
    return ndarray
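
As a quick check against the question's example (assuming the same 4x6 array a), the 'mean' operation reproduces the expected result:

>>> a = np.arange(24).reshape((4, 6))
>>> bin_ndarray(a, new_shape=(2, 3), operation='mean')
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])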
执笏见 2024-12-22 02:02:17

Here's a way of doing what you ask using matrix multiplication that doesn't require the new array dimensions to divide the old.

First we generate a row compressor matrix and a column compressor matrix (I'm sure there's a cleaner way of doing this, maybe even using numpy operations alone):

def get_row_compressor(old_dimension, new_dimension):
    # Row i of the compressor holds the weights with which the old rows
    # contribute to new row i; each row is normalized to sum to 1.
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while which_row < dim_compressor.shape[0] and which_column < dim_compressor.shape[1]:
        if round(next_bin_break - which_column, 10) >= 1:
            # The current old row lies entirely inside the current bin.
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif next_bin_break == which_column:
            # The bin boundary coincides with a row boundary: move on to
            # the next bin without splitting anything.
            which_row += 1
            next_bin_break += bin_size
        else:
            # The old row straddles a bin boundary: split its weight
            # between the two adjacent bins.
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()

... so, for instance, get_row_compressor(5, 3) gives you:

[[ 0.6  0.4  0.   0.   0. ]
 [ 0.   0.2  0.6  0.2  0. ]
 [ 0.   0.   0.   0.4  0.6]]

and get_column_compressor(3, 2) gives you:

[[ 0.66666667  0.        ]
 [ 0.33333333  0.33333333]
 [ 0.          0.66666667]]

Then simply premultiply by the row compressor and postmultiply by the column compressor to get the compressed matrix:

def compress_and_average(array, new_shape):
    # Note: new shape should be smaller in both dimensions than old shape
    return np.mat(get_row_compressor(array.shape[0], new_shape[0])) * \
           np.mat(array) * \
           np.mat(get_column_compressor(array.shape[1], new_shape[1]))

Using this technique,

compress_and_average(np.array([[50, 7, 2, 0, 1],
                               [0, 0, 2, 8, 4],
                               [4, 1, 1, 0, 0]]), (2, 3))

yields:

[[ 21.86666667   2.66666667   2.26666667]
 [  1.86666667   1.46666667   1.86666667]]
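
On the "cleaner way using numpy operations alone" mentioned above: here is one possible vectorized construction of the same compressor matrix (a sketch, not exhaustively tested; the name get_row_compressor_vectorized is just for illustration). It computes the overlap between each output bin and each input cell directly:

def get_row_compressor_vectorized(old_dimension, new_dimension):
    # Output bin i spans [i*s, (i+1)*s) and input cell j spans [j, j+1),
    # where s = old_dimension / new_dimension; each weight is the length
    # of their overlap, normalized so every row sums to 1.
    s = old_dimension / float(new_dimension)
    i = np.arange(new_dimension)[:, None]
    j = np.arange(old_dimension)[None, :]
    overlap = np.minimum((i + 1) * s, j + 1) - np.maximum(i * s, j)
    return np.clip(overlap, 0, None) / s

For instance, get_row_compressor_vectorized(5, 3) should reproduce the 3x5 matrix shown earlier in this answer.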
老旧海报 2024-12-22 02:02:17

I was trying to downscale a raster -- take a roughly 6000 by 2000 size raster and turn it into an arbitrarily sized smaller raster that averaged the values properly across the previous bin sizes. I found a solution using SciPy, but then I couldn't get SciPy to install on the shared hosting service I was using, so I just wrote this function instead. There is likely a better way to do this that doesn't involve looping through the rows and columns, but this does seem to work.

The nice part about this is that the old number of rows and columns doesn't have to be divisible by the new number of rows and columns.

def resize_array(a, new_rows, new_cols): 
    '''
    This function takes a 2D numpy array a and produces a smaller array
    of size new_rows, new_cols. new_rows and new_cols must be less than 
    or equal to the number of rows and columns in a.
    '''
    rows = len(a)
    cols = len(a[0])
    yscale = float(rows) / new_rows 
    xscale = float(cols) / new_cols

    # first average across the cols to shorten rows    
    new_a = np.zeros((rows, new_cols)) 
    for j in range(new_cols):
        # get the indices of the original array we are going to average across
        the_x_range = (j*xscale, (j+1)*xscale)
        firstx = int(the_x_range[0])
        lastx = int(the_x_range[1])
        # figure out the portion of the first and last index that overlap
        # with the new index, and thus the portion of those cells that 
        # we need to include in our average
        x0_scale = 1 - (the_x_range[0]-int(the_x_range[0]))
        xEnd_scale =  (the_x_range[1]-int(the_x_range[1]))
        # scale_line is a 1d array that corresponds to the portion of each old
        # index in the_x_range that should be included in the new average
        scale_line = np.ones((lastx-firstx+1))
        scale_line[0] = x0_scale
        scale_line[-1] = xEnd_scale
        # Make sure you don't screw up and include an index that is too large
        # for the array. This isn't great, as there could be some floating
        # point errors that mess up this comparison.
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lastx = lastx - 1
        # Now it's linear algebra time. Take the dot product of a slice of
        # the original array and the scale_line
        new_a[:,j] = np.dot(a[:,firstx:lastx+1], scale_line)/scale_line.sum()

    # Then average across the rows to shorten the cols. Same method as above.
    # It is probably possible to simplify this code, as this is more or less
    # the same procedure as the block of code above, but transposed.
    # Here I'm reusing the variable a. Sorry if that's confusing.
    a = np.zeros((new_rows, new_cols))
    for i in range(new_rows):
        the_y_range = (i*yscale, (i+1)*yscale)
        firsty = int(the_y_range[0])
        lasty = int(the_y_range[1])
        y0_scale = 1 - (the_y_range[0]-int(the_y_range[0]))
        yEnd_scale =  (the_y_range[1]-int(the_y_range[1]))
        scale_line = np.ones((lasty-firsty+1))
        scale_line[0] = y0_scale
        scale_line[-1] = yEnd_scale
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lasty = lasty - 1
        a[i, :] = np.dot(scale_line, new_a[firsty:lasty+1,])/scale_line.sum()

    return a 
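
As a quick sanity check (assuming the same 4x6 array a from the question), this reproduces the expected output, and it also accepts target shapes that don't divide the original ones:

>>> a = np.arange(24).reshape((4, 6))
>>> resize_array(a, 2, 3)
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])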
入怼 2024-12-22 02:02:17

I had a problem with MarcTheSpark's answer, which worked great except for some specific output shapes. I had to change the rounding precision in the round() call in the first condition of get_row_compressor.

I would have just left a comment, but my reputation isn't high enough.

I have also added a snippet to rebin 1D arrays.

def get_row_compressor(old_dimension, new_dimension):
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while (
        which_row < (dim_compressor.shape[0]) and which_column < (dim_compressor.shape[1])
    ):
        if round(next_bin_break - which_column, 1) >= 1:
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif next_bin_break == which_column:

            which_row += 1
            next_bin_break += bin_size
        else:
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()


def rebin(array, new_shape):
    # Note: new shape should be smaller in both dimensions than old shape
    return (
        np.mat(get_row_compressor(array.shape[0], new_shape[0]))
        * np.mat(array)
        * np.mat(get_column_compressor(array.shape[1], new_shape[1]))
    )

def rebin_1d(array, new_len):
    array_t = array.reshape((1, len(array)))
    array_rebinned = rebin(array_t, (1, new_len))
    return np.squeeze(np.asarray(array_rebinned))
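
A quick example of the 1D helper (assuming numpy is imported as np); the target length does not have to divide the original length, so rebinning 5 samples into 3 bins gives, up to floating-point rounding:

>>> rebin_1d(np.arange(5.0), 3)
array([ 0.4,  2. ,  3.6])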