Randomly sampling arrays - numpy.delete issue
I have 2 arrays, x_1g and x_2g. I want to randomly sample 10% of each array and remove that 10% and insert it into the other array. This means that my final and initial arrays should have the same shape, but 10% of the data is randomly sampled from the other array. I have been trying this with the code below but my arrays keep increasing in length, meaning I haven't properly deleted the sampled 10% data from each array.
n = len(x_1g)
n2 = round(n/10)
ints1 = np.random.choice(n, n2)
x_1_replace = x_1g[ints1,:]
x_1 = np.delete(x_1g, ints1, 0)
x_2_replace = x_2g[ints1,:]
x_2 = np.delete(x_2g, ints1, 0)
My arrays x_1g and x_2g have shape (1502983, 10), and the sampled 10% has shape (150298, 10):
x_1g.shape
>> (1502983, 10)
x_1_replace.shape
>> (150298, 10)
so when I remove the 10% data (x_1_replace) from my original array (x_1g) I should get the array shape:
1502983-150298 = 1352685
However when I check the shape of my array x_1 I get:
x_1.shape
>> (1359941, 10)
I'm not sure what is going on here so if anyone has any suggestions please let me know!!
Comments (1)
What happens is that by using
ints1 = np.random.choice(n, n2)
to generate your indices, you draw n2 numbers between 0 and n-1 with replacement, which is the default for np.random.choice. Nothing guarantees that the n2 numbers are all different, and with a sample this large you are almost certainly generating duplicates. If you pass the same index position to np.delete several times, that row is still deleted only once. You can check this by counting the unique values in ints1: the count will not match n2 (in your example, you'll get
(143042,)
, which is exactly 1502983 - 1359941, the number of rows that were actually removed). There's probably more than one way to ensure that you get n2 different indices; here is one example:
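The answer's own example did not survive on this page, so what follows is only a sketch of one common approach, not necessarily the one originally posted: passing replace=False so that each index is drawn at most once. The value n = 1000 and the seed are made-up stand-ins, and np.random.default_rng is used in place of the legacy np.random.choice call from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary, only for repeatability

n = 1000            # hypothetical stand-in for len(x_1g)
n2 = round(n / 10)  # 10% of the rows

# Default behaviour (replace=True): the same index can be drawn more than once
with_dupes = rng.choice(n, size=n2)

# replace=False: every drawn index is distinct
no_dupes = rng.choice(n, size=n2, replace=False)

# Count the distinct indices in each sample
n_unique_with = np.unique(with_dupes).size    # never more than n2
n_unique_without = np.unique(no_dupes).size   # always exactly n2
```

With the legacy API used in the question, np.random.choice(n, n2, replace=False) accepts the same keyword.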
Now you can check:
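The check itself is also missing from this page. A possible version is sketched below on small random stand-in arrays (1000 rows instead of 1502983); the final concatenation step and the names x_1_new and x_2_new are assumptions about how the swap described in the question would be finished, since that part was never shown.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Hypothetical stand-ins for x_1g and x_2g
x_1g = rng.normal(size=(1000, 10))
x_2g = rng.normal(size=(1000, 10))

n = len(x_1g)
n2 = round(n / 10)
ints1 = rng.choice(n, size=n2, replace=False)  # n2 distinct row indices

# Rows to hand over to the other array
x_1_replace = x_1g[ints1, :]
x_2_replace = x_2g[ints1, :]

# Because the indices are distinct, exactly n2 rows are removed
x_1 = np.delete(x_1g, ints1, axis=0)
x_2 = np.delete(x_2g, ints1, axis=0)

# Assumed final step: append the rows sampled from the other array
x_1_new = np.concatenate([x_1, x_2_replace], axis=0)
x_2_new = np.concatenate([x_2, x_1_replace], axis=0)

print(x_1.shape, x_1_new.shape)
```

Here x_1 has n - n2 rows and x_1_new is back to the original shape. Reusing ints1 for both arrays follows the question and only works because the two arrays have the same number of rows; a separate draw per array would work as well.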