Randomly sampling arrays - numpy.delete issue

Posted on 2025-02-07 07:39:53

I have 2 arrays, x_1g and x_2g. I want to randomly sample 10% of each array and remove that 10% and insert it into the other array. This means that my final and initial arrays should have the same shape, but 10% of the data is randomly sampled from the other array. I have been trying this with the code below but my arrays keep increasing in length, meaning I haven't properly deleted the sampled 10% data from each array.

n = len(x_1g)
n2 = round(n/10)

ints1 = np.random.choice(n, n2)

x_1_replace = x_1g[ints1,:]
x_1 = np.delete(x_1g, ints1, 0)

x_2_replace = x_2g[ints1,:]
x_2 = np.delete(x_2g, ints1, 0)

My arrays x_1g and x_2g have shape (1502983, 10):

x_1g.shape
>> (1502983, 10)

x_1_replace.shape 
>> (150298, 10)

So when I remove the 10% of the data (x_1_replace) from my original array (x_1g), I should get an array with this many rows:

1502983-150298 = 1352685

However when I check the shape of my array x_1 I get:

x_1.shape
>> (1359941, 10)

I'm not sure what is going on here so if anyone has any suggestions please let me know!!


Comments (1)

小镇女孩 2025-02-14 07:39:53

What happens is that by generating your indices with ints1 = np.random.choice(n, n2), you draw n2 numbers between 0 and n-1 with replacement. There is no guarantee that the n2 numbers are all different, and with a sample this large you are almost certainly getting duplicates. If the same index is passed to np.delete several times, that row is still deleted only once. You can check this by counting the unique values in ints1:

np.unique(ints1).shape

You'll see it is not matching n2 (in your example, you'll get (143042,)).
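
To make the effect easier to see, here is a small sketch on toy data. The array `data`, the seed, and the helper `expected_unique` are illustrative assumptions, not part of the original code, and it uses NumPy's newer `default_rng` generator, whose `choice` also samples with replacement by default.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for repeatability

n = 1000
n2 = round(n / 10)

# Default behaviour: sampling WITH replacement, so indices can repeat
idx = rng.choice(n, n2)
n_unique = np.unique(idx).size

data = np.arange(n * 3).reshape(n, 3)  # toy stand-in for x_1g
remaining = np.delete(data, idx, axis=0)

# Each distinct row is removed once, so the row count drops by n_unique, not n2
print(n2, n_unique, remaining.shape[0], n - n_unique)


def expected_unique(n, k):
    """Expected number of distinct values when drawing k times from n with replacement."""
    return n * (1 - (1 - 1 / n) ** k)


# For the sizes in the question this comes out at roughly 143,000
print(expected_unique(1502983, 150298))
```

The last two printed row counts should agree, and the estimate for the question's sizes is in line with the 143042 unique indices reported above.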

There's probably more than one way to ensure that you get n2 different indices. Here is one example:

n = len(x_1g)
n2 = round(n/10)

ints1 = np.arange(n)  # generating an array [0 ... n-1]
np.random.shuffle(ints1)  # shuffle it
ints1 = ints1[:n2]  # take the first n2 values

x_1_replace = x_1g[ints1,:]
x_1 = np.delete(x_1g, ints1, 0)

x_2_replace = x_2g[ints1,:]
x_2 = np.delete(x_2g, ints1, 0)

Now you can check:

x_1.shape
# (1352685, 10)
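
Another option, sketched here under the assumption that both arrays have the same shape, is to ask for sampling without replacement directly via `replace=False`, and then append each sampled block to the other array so that the shapes stay unchanged. The small `x_1g` and `x_2g` below are stand-ins for the real data, and the same indices are reused for both arrays, as in the code above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily

# Toy stand-ins: zeros and ones make it easy to see which rows moved
x_1g = np.zeros((50, 4))
x_2g = np.ones((50, 4))

n = len(x_1g)
n2 = round(n / 10)

# replace=False guarantees n2 distinct indices
idx = rng.choice(n, size=n2, replace=False)

x_1_replace = x_1g[idx, :]
x_2_replace = x_2g[idx, :]

# Drop the sampled rows, then append the rows sampled from the other array
x_1 = np.concatenate([np.delete(x_1g, idx, axis=0), x_2_replace], axis=0)
x_2 = np.concatenate([np.delete(x_2g, idx, axis=0), x_1_replace], axis=0)

print(x_1.shape, x_2.shape)
```

Note that the swapped rows end up at the end of each array. If row order matters, assigning in place (for example `x_1[idx] = x_2g[idx]` on a copy) would keep the original positions instead.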