Randomly select cells from a NumPy array - without replacement

Posted on 2024-09-26 08:10:44

I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell has been selected it can't be selected again, but all cells must be selected by the end).

I'm transitioning from IDL, where I know a nice way to do this, and I assume NumPy has a nice way to do it too. What would you suggest?

Update: I should have stated that I'm trying to do this on 2D arrays, and therefore want to get a set of 2D indices back.

Comments (6)

本王不退位尔等都是臣 2024-10-03 08:10:44

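Use numpy.random.shuffle to shuffle an array in place, or numpy.random.permutation if you still need the original array.

If you want to visit the cells in random order without modifying the array at all, you can shuffle an index array instead:

import numpy as np

your_array = np.arange(20).reshape(4, 5)  # stand-in for your own array
index_array = np.arange(your_array.size)  # one flat index per cell
np.random.shuffle(index_array)            # shuffles index_array in place

# .flat accepts flat indices, so this also works for 2D (or N-D) arrays
print(your_array.flat[index_array[:10]])
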
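
Since the question asks for 2D indices, here is a minimal sketch that combines numpy.random.permutation (which returns a shuffled copy and leaves its input alone) with numpy.unravel_index to turn the shuffled flat positions into (row, col) pairs. The array name grid is just a hypothetical example.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # hypothetical example array

# permutation() returns a shuffled copy of range(grid.size); grid is untouched
flat_order = np.random.permutation(grid.size)

# Convert each flat position to a (row, col) pair for a C-ordered array
rows, cols = np.unravel_index(flat_order, grid.shape)

visited = []
for r, c in zip(rows, cols):
    visited.append(grid[r, c])  # process one cell at a time, in random order

print(sorted(visited) == list(range(12)))  # prints True: every cell seen once
```
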
可可 2024-10-03 08:10:44

All of these answers seemed a little convoluted to me.

I'm assuming that you have a multi-dimensional array from which you want to generate an exhaustive list of indices. You'd like these indices shuffled so you can then access each of the array elements in random order.

The following code will do this in a simple and straightforward manner:

#!/usr/bin/env python3
import numpy as np

# Define a two-dimensional array
# Use any number of dimensions, and dimensions of any size
d = np.zeros(30).reshape((5, 6))

# Get a list of indices for an array of this shape
indices = list(np.ndindex(d.shape))

# Shuffle the indices in-place
np.random.shuffle(indices)

# Access array elements using the indices to do cool stuff
for i in indices:
    d[i] = 5

print(d)

Printing d confirms that every element has been visited (all values are 5).

Note that the array can have any number of dimensions and that the dimensions can be of any size.

The only downside to this approach is that if d is large, the list of index tuples may become quite large. It would therefore be nice to have a generator; sadly, I can't think offhand of how to build a shuffled iterator.
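
A generator is possible if you accept storing one shuffled array of flat positions: that costs one integer per cell instead of one Python tuple per cell, and np.unravel_index converts each position to an N-D index only when it is needed. This is a minimal sketch; the helper name shuffled_indices is hypothetical, and it still uses O(d.size) memory for the permutation, so it is not a constant-memory shuffled iterator.

```python
import numpy as np

def shuffled_indices(shape, rng=None):
    """Yield every index tuple of an array with the given shape, in random order.

    Hypothetical helper: stores one int array of flat positions instead of
    a list of tuples, and converts each position to an N-D index on demand.
    """
    rng = np.random.default_rng() if rng is None else rng
    size = int(np.prod(shape))
    for flat in rng.permutation(size):
        # unravel_index on a scalar returns a tuple of scalar indices
        yield tuple(int(i) for i in np.unravel_index(flat, shape))

d = np.zeros((5, 6))
for i in shuffled_indices(d.shape):
    d[i] = 5

print((d == 5).all())  # prints True
```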

无言温柔 2024-10-03 08:10:44

Extending the nice answer from @WoLpH

For a 2D array, I think it depends on what you want or need to know about the indices.

You could do something like this:

import numpy as np

data = np.arange(25).reshape((5,5))

# np.where on an all-True mask returns the row and column index of every cell
x, y = np.where(np.ones_like(data, dtype=bool))
idx = list(zip(x, y))
np.random.shuffle(idx)

OR

data = np.arange(25).reshape((5,5))

grid = np.indices(data.shape)
idx = list(zip(grid[0].ravel(), grid[1].ravel()))
np.random.shuffle(idx)

You can then use the list idx to iterate over the randomly ordered 2D array indices as you wish, and to read the values at those indices from data, which remains unchanged.

Note: You could also generate the indices via itertools.product (and then shuffle them), in case you are more comfortable with that set of tools.
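
A minimal sketch of the itertools.product variant, assuming the same 5x5 data array: product over one range per dimension yields every index tuple in row-major order, and the standard library's random.shuffle then reorders the list in place.

```python
import itertools
import random

import numpy as np

data = np.arange(25).reshape((5, 5))

# All (row, col) pairs, generated in row-major order
idx = list(itertools.product(*(range(n) for n in data.shape)))

# Shuffle the list of index tuples in place (standard-library shuffle)
random.shuffle(idx)

values = [data[i] for i in idx]  # data itself stays unchanged
```

Because it only relies on data.shape, the same two lines work unchanged for arrays with more than two dimensions.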

北笙凉宸 2024-10-03 08:10:44

Use random.sample to generate distinct ints in 0 .. A.size - 1 (no duplicates),
then split them into index pairs:

import random
import numpy as np

def randint2_nodup( nsample, A ):
    """ uniform int pairs, no dups:
        r = randint2_nodup( nsample, A )
        A[r]
        for jk in zip(*r):
            ... A[jk]
    """
    assert A.ndim == 2
    sample = np.array( random.sample( range( A.size ), nsample ))  # nodup ints
    return sample // A.shape[1], sample % A.shape[1]  # pairs


if __name__ == "__main__":
    import sys

    nsample = 8
    ncol = 5
    exec( "\n".join( sys.argv[1:] ))  # override settings, e.g. python this.py nsample=3
    A = np.arange( 0, 2*ncol ).reshape((2,ncol))

    r = randint2_nodup( nsample, A )
    print( "r:", r )
    print( "A[r]:", A[r] )
    for jk in zip(*r):
        print( jk, A[jk] )

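
The // and % split only works for 2D arrays. As a hedged sketch, np.unravel_index performs the same flat-to-index conversion for any number of dimensions (for C-ordered indexing, which is its default); randint_nd_nodup is a hypothetical name, not part of any library.

```python
import random

import numpy as np

def randint_nd_nodup(nsample, A):
    """Return nsample distinct random index tuples into A, as A[r] expects.

    Hypothetical N-D generalization of randint2_nodup: np.unravel_index
    replaces the 2D-only `// ncols, % ncols` split.
    """
    sample = np.array(random.sample(range(A.size), nsample), dtype=np.intp)
    return np.unravel_index(sample, A.shape)

A = np.arange(24).reshape((2, 3, 4))
r = randint_nd_nodup(5, A)
print(A[r])  # five distinct elements of A
```
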
街道布景 2024-10-03 08:10:44

Let's say you have an array of data points of size 8x3:

data = np.arange(50,74).reshape(8,-1)

If you truly want to sample all the indices as 2D pairs, as you say, the most compact way to do this that I can think of is:

#generate a permutation of data's size, split into (row, col) index arrays
idxs = divmod(np.random.permutation(data.size),data.shape[1])

#iterate over it
for x,y in zip(*idxs): 
    #do something to data[x,y] here
    pass
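
To check that the divmod trick visits every cell exactly once, a small sketch (assuming the same 8x3 data) can collect the (row, col) pairs and compare them with the full set of indices.

```python
import numpy as np

data = np.arange(50, 74).reshape(8, -1)  # 8x3, as above

rows, cols = divmod(np.random.permutation(data.size), data.shape[1])
pairs = list(zip(rows.tolist(), cols.tolist()))

expected = {(r, c) for r in range(data.shape[0]) for c in range(data.shape[1])}
print(len(pairs) == data.size and set(pairs) == expected)  # prints True
```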

More generally, though, one often does not need to access a 2D array as a 2D array simply to shuffle it, in which case one can be even more compact: just make a 1D view onto the array and save yourself some index-wrangling.

flat_data = data.ravel()
flat_idxs = np.random.permutation(flat_data.size)
for i in flat_idxs:
    #do something to flat_data[i] here
    pass

This still operates on the original 2D array, because ravel() returns a view of it here. To see this, try:

flat_data[12] = 1000000
print(data[4, 0])
# prints 1000000

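
One caveat: ravel() returns a view only when it can. For a non-contiguous array such as a transpose it returns a copy, and writes to that flat copy never reach the original. This sketch uses np.shares_memory to tell the two cases apart, and shows that writing through .flat reaches the original array either way.

```python
import numpy as np

data = np.arange(50, 74).reshape(8, -1)

flat_view = data.ravel()    # C-contiguous input: ravel() returns a view
flat_copy = data.T.ravel()  # transposed (non-contiguous): ravel() returns a copy

print(np.shares_memory(flat_view, data))  # True
print(np.shares_memory(flat_copy, data))  # False

flat_copy[0] = -1     # modifies only the copy; data[0, 0] is still 50
data.T.flat[1] = -2   # .flat writes through to the original array
print(data[1, 0])     # -2
```
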
人疚 2024-10-03 08:10:44

People using NumPy 1.7 or later can also use the built-in function numpy.random.choice (with replace=False).
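
A minimal sketch of that suggestion: numpy.random.choice with replace=False never draws the same flat index twice, and asking for size=field.size therefore covers every cell exactly once. np.unravel_index then turns the flat indices into 2D indices. The array name field is hypothetical.

```python
import numpy as np

field = np.zeros((4, 6))  # hypothetical 2D array

# replace=False: no flat index is drawn twice; size=field.size covers every cell
flat = np.random.choice(field.size, size=field.size, replace=False)
rows, cols = np.unravel_index(flat, field.shape)

for step, (r, c) in enumerate(zip(rows, cols)):
    field[r, c] = step  # record the order in which each cell was visited

print(np.sort(field, axis=None))
```

With a smaller size, the same call draws a random subset of distinct cells. In newer NumPy versions, np.random.default_rng().choice accepts the same replace=False argument.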
