连接 Numpy 数组而不进行复制

发布于 2024-12-11 06:13:49 字数 537 浏览 0 评论 0原文

在 Numpy 中,我可以使用 np.append 或 np.concatenate 端到端连接两个数组:

>>> X = np.array([[1,2,3]])
>>> Y = np.array([[-1,-2,-3],[4,5,6]])
>>> Z = np.append(X, Y, axis=0)
>>> Z
array([[ 1,  2,  3],
       [-1, -2, -3],
       [ 4,  5,  6]])

但是这些会复制其输入数组:

>>> Z[0,:] = 0
>>> Z
array([[ 0,  0,  0],
       [-1, -2, -3],
       [ 4,  5,  6]])
>>> X
array([[1, 2, 3]])

有没有办法将两个数组连接到一个视图中,即不复制?这需要 np.ndarray 子类吗?

In Numpy, I can concatenate two arrays end-to-end with np.append or np.concatenate:

>>> X = np.array([[1,2,3]])
>>> Y = np.array([[-1,-2,-3],[4,5,6]])
>>> Z = np.append(X, Y, axis=0)
>>> Z
array([[ 1,  2,  3],
       [-1, -2, -3],
       [ 4,  5,  6]])

But these make copies of their input arrays:

>>> Z[0,:] = 0
>>> Z
array([[ 0,  0,  0],
       [-1, -2, -3],
       [ 4,  5,  6]])
>>> X
array([[1, 2, 3]])

Is there a way to concatenate two arrays into a view, i.e. without copying? Would that require an np.ndarray subclass?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

一抹淡然 2024-12-18 06:13:49

属于 Numpy 数组的内存必须是连续的。如果单独分配数组,它们会随机分散在内存中,并且无法将它们表示为视图 Numpy 数组。

如果您事先知道需要多少个数组,则可以从预先分配的一个大数组开始,并让每个小数组成为大数组的视图(例如通过切片获得)。

The memory belonging to a Numpy array must be contiguous. If you allocated the arrays separately, they are randomly scattered in memory, and there is no way to represent them as a view Numpy array.

If you know beforehand how many arrays you need, you can instead start with one big array that you allocate beforehand, and have each of the small arrays be a view to the big array (e.g. obtained by slicing).

逆流 2024-12-18 06:13:49

只需在用数据填充数组之前初始化数组即可。如果你愿意,你可以分配比需要更多的空间,并且由于 numpy 的工作方式,它不会占用更多的 RAM。

A = np.zeros(R,C)
A[row] = [data]

仅当数据放入数组后才会使用内存。在任何大小的数据集上,通过连接两个数组创建一个新数组永远不会完成,即数据集> 1GB左右。

Just initialize the array before you fill it with data. If you want you can allocate more space than needed and it will not take up more RAM because of the way numpy works.

A = np.zeros(R,C)
A[row] = [data]

The memory is used only once data is put into the array. Creating a new array from concatenating two will never finish on a dataset of any size, i.e. dataset > 1GB or so.

赤濁 2024-12-18 06:13:49

我遇到了同样的问题,最终把它颠倒过来,在正常连接(带副本)后,我重新分配原始数组以成为连接数组的视图:

import numpy as np

def concat_no_copy(arrays):
    """ Concats the arrays and returns the concatenated array 
    in addition to the original arrays as views of the concatenated one.

    Parameters:
    -----------
    arrays: list
        the list of arrays to concatenate
    """
    con = np.concatenate(arrays)

    viewarrays = []
    for i, arr in enumerate(arrays):
        arrnew = con[sum(len(a) for a in arrays[:i]):
                     sum(len(a) for a in arrays[:i + 1])]
        viewarrays.append(arrnew)
        assert all(arr == arrnew)

    # return the view arrays, replace the old ones with these
    return con, viewarrays

您可以按如下方式测试它:

def test_concat_no_copy():
    arr1 = np.array([0, 1, 2, 3, 4])
    arr2 = np.array([5, 6, 7, 8, 9])
    arr3 = np.array([10, 11, 12, 13, 14])

    arraylist = [arr1, arr2, arr3]

    con, newarraylist = concat_no_copy(arraylist)

    assert all(con == np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
                                11, 12, 13, 14]))

    for old, new in zip(arraylist, newarraylist):
        assert all(old == new)

I had the same problem and ended up doing it reversed, after concatenating normally (with copy) I reassigned the original arrays to become views on the concatenated one:

import numpy as np

def concat_no_copy(arrays):
    """ Concats the arrays and returns the concatenated array 
    in addition to the original arrays as views of the concatenated one.

    Parameters:
    -----------
    arrays: list
        the list of arrays to concatenate
    """
    con = np.concatenate(arrays)

    viewarrays = []
    for i, arr in enumerate(arrays):
        arrnew = con[sum(len(a) for a in arrays[:i]):
                     sum(len(a) for a in arrays[:i + 1])]
        viewarrays.append(arrnew)
        assert all(arr == arrnew)

    # return the view arrays, replace the old ones with these
    return con, viewarrays

You can test it as follows:

def test_concat_no_copy():
    arr1 = np.array([0, 1, 2, 3, 4])
    arr2 = np.array([5, 6, 7, 8, 9])
    arr3 = np.array([10, 11, 12, 13, 14])

    arraylist = [arr1, arr2, arr3]

    con, newarraylist = concat_no_copy(arraylist)

    assert all(con == np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
                                11, 12, 13, 14]))

    for old, new in zip(arraylist, newarraylist):
        assert all(old == new)
荒芜了季节 2024-12-18 06:13:49

一点也不优雅,但是您可以使用元组存储指向数组的指针来接近您想要的效果。现在我不知道如何在这种情况下使用它,但我以前做过类似的事情。

>>> X = np.array([[1,2,3]])
>>> Y = np.array([[-1,-2,-3],[4,5,6]])
>>> z = (X, Y)
>>> z[0][:] = 0
>>> z
(array([[0, 0, 0]]), array([[-1, -2, -3],
       [ 4,  5,  6]]))
>>> X
array([[0, 0, 0]])

Not really elegant at all but you can get close to what you want using a tuple to store pointers to the arrays. Now I have no idea how I would use it in the case but I have done things like this before.

>>> X = np.array([[1,2,3]])
>>> Y = np.array([[-1,-2,-3],[4,5,6]])
>>> z = (X, Y)
>>> z[0][:] = 0
>>> z
(array([[0, 0, 0]]), array([[-1, -2, -3],
       [ 4,  5,  6]]))
>>> X
array([[0, 0, 0]])
森林很绿却致人迷途 2024-12-18 06:13:49

答案基于我的其他答案 Reference to ndarray rows in ndarray< /a>

X = np.array([[1,2,3]])
Y = np.array([[-1,-2,-3],[4,5,6]])
Z = np.array([None, None, None])
Z[0] = X[0]
Z[1] = Y[0]
Z[2] = Y[1]

Z[0][0] = 5 # X would be changed as well

print(X)
Output: 
array([[5, 2, 3]])

# Let's make it a function!
def concat(X, Y, copy=True):
    """Return an array of references if copy=False""" 
    if copy is True:  # deep copy
        return np.append(X, Y, axis=0)
    len_x, len_y = len(X), len(Y)
    ret = np.array([None for _ in range(len_x + len_y)])
    for i in range(len_x):
        ret[i] = X[i]
    for j in range(len_y):
        ret[len_x + j] = Y[j] 
    return ret

The answer is based on my other answer in Reference to ndarray rows in ndarray

X = np.array([[1,2,3]])
Y = np.array([[-1,-2,-3],[4,5,6]])
Z = np.array([None, None, None])
Z[0] = X[0]
Z[1] = Y[0]
Z[2] = Y[1]

Z[0][0] = 5 # X would be changed as well

print(X)
Output: 
array([[5, 2, 3]])

# Let's make it a function!
def concat(X, Y, copy=True):
    """Return an array of references if copy=False""" 
    if copy is True:  # deep copy
        return np.append(X, Y, axis=0)
    len_x, len_y = len(X), len(Y)
    ret = np.array([None for _ in range(len_x + len_y)])
    for i in range(len_x):
        ret[i] = X[i]
    for j in range(len_y):
        ret[len_x + j] = Y[j] 
    return ret
孤千羽 2024-12-18 06:13:49

您可以创建一个数组的数组,例如:

>>> from numpy import *
>>> a = array([1.0, 2.0, 3.0])
>>> b = array([4.0, 5.0])
>>> c = array([a, b])
>>> c
array([[ 1.  2.  3.], [ 4.  5.]], dtype=object)
>>> a[0] = 100.0
>>> a
array([ 100.,    2.,    3.])
>>> c
array([[ 100.    2.    3.], [ 4.  5.]], dtype=object)
>>> c[0][1] = 200.0
>>> a
array([ 100.,  200.,    3.])
>>> c
array([[ 100.  200.    3.], [ 4.  5.]], dtype=object)
>>> c *= 1000
>>> c
array([[ 100000.  200000.    3000.], [ 4000.  5000.]], dtype=object)
>>> a
array([ 100.,  200.,    3.])
>>> # Oops! Copies were made...

问题是它在广播操作上创建副本(听起来像一个错误)。

You may create an array of arrays, like:

>>> from numpy import *
>>> a = array([1.0, 2.0, 3.0])
>>> b = array([4.0, 5.0])
>>> c = array([a, b])
>>> c
array([[ 1.  2.  3.], [ 4.  5.]], dtype=object)
>>> a[0] = 100.0
>>> a
array([ 100.,    2.,    3.])
>>> c
array([[ 100.    2.    3.], [ 4.  5.]], dtype=object)
>>> c[0][1] = 200.0
>>> a
array([ 100.,  200.,    3.])
>>> c
array([[ 100.  200.    3.], [ 4.  5.]], dtype=object)
>>> c *= 1000
>>> c
array([[ 100000.  200000.    3000.], [ 4000.  5000.]], dtype=object)
>>> a
array([ 100.,  200.,    3.])
>>> # Oops! Copies were made...

The problem is that it creates copies on broadcast operations (sounds like a bug).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文