
Python arrays numpy sampling

How do I check which rows of one small array exist in another, larger one?

Posted on 2025-01-28 18:47:31


Given the following setup:

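import numpy as np

batch_size = 4
final_batch = np.empty((1, 2), dtype=int)  # one placeholder row, dropped at the end
a = np.array(range(10))                    # candidates for the first column
b = np.array(range(10, 20))                # candidates for the second column
edges = np.array([[0, 11], [0, 12], [1, 11], [1, 12], [0, 17]])

# Draw batch_size candidate pairs: column 0 from a, column 1 from b
c1 = np.random.choice(a, batch_size).reshape(-1, 1)
c2 = np.random.choice(b, batch_size).reshape(-1, 1)
samples = np.append(c1, c2, axis=1)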

Now there can be duplicates between samples and edges. I want to keep calling np.random.choice and add a pair to final_batch only if it does not already exist in edges. The simple way to do this would be to draw the pairs one by one in a loop:

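# Simple (slow) version: draw one pair at a time, keep it only if it is not a row of edges
while len(final_batch) < batch_size + 1:
    c1 = np.random.choice(a, 1).reshape(-1, 1)
    c2 = np.random.choice(b, 1).reshape(-1, 1)
    pair = np.append(c1, c2, axis=1)           # shape (1, 2)
    if not (edges == pair).all(axis=1).any():  # pair matches no row of edges
        final_batch = np.append(final_batch, pair, axis=0)

final_batch = final_batch[1:]  # drop the placeholder row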

But a, b and edges can all be huge, and the batch size will be 10k. Since sampling many elements at once is much faster, I wanted to see whether there is a faster way, something like:

while len(final_batch) < batch_size + 1:
    c1 = np.random.choice(a, batch_size).reshape(-1, 1)
    c2 = np.random.choice(b, batch_size).reshape(-1, 1)
    samples = np.append(c1, c2, axis=1)
    final_batch.append(samples NOT IN edges)  # pseudocode: keep only the rows of samples that are not rows of edges

Note that c1 and c2 are mutually exclusive (a and b share no values), so I feel I should be able to exploit this somehow.
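One way to use the fact that each column comes from its own pool is sketched below, assuming (as in the example) that all values are non-negative integers. Each (c1, c2) pair is encoded as the single integer `c1 * base + c2`, where `base` exceeds every second-column value, so checking rows against edges becomes a 1-D `np.isin` test and no `len(edges) × batch_size` comparison array is built. The helpers `pair_keys` and `sample_non_edges` are hypothetical names, not from the thread.

```python
import numpy as np

def pair_keys(pairs, base):
    # Encode each (x, y) row as the single integer x * base + y.
    # Assumes non-negative integers with y < base, and that x * base fits in int64,
    # so distinct rows always get distinct keys.
    pairs = np.asarray(pairs, dtype=np.int64)
    return pairs[:, 0] * base + pairs[:, 1]

def sample_non_edges(a, b, edges, batch_size, rng=None):
    # Hypothetical helper: draw pairs in bulk and keep only those not present in edges.
    rng = np.random.default_rng() if rng is None else rng
    # base must exceed every second-column value in both samples and edges
    base = int(max(b.max(), edges[:, 1].max())) + 1
    edge_keys = pair_keys(edges, base)
    chunks, n = [], 0
    while n < batch_size:
        samples = np.column_stack((rng.choice(a, batch_size), rng.choice(b, batch_size)))
        keep = ~np.isin(pair_keys(samples, base), edge_keys)  # 1-D membership test
        chunks.append(samples[keep])
        n += int(keep.sum())
    return np.concatenate(chunks)[:batch_size]
```

Since pairs are drawn with replacement, the result may contain repeated pairs, and the loop never terminates if every possible pair is already in edges.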


Comments (1)

眼波传意 2025-02-04 18:47:31


If I understand your question, you are looking for something like

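import numpy as np

# a, b and edges as defined in the question
samples = np.empty((10, 2), dtype=int)
samples[:, 0] = np.random.choice(a, 10)  # first column drawn from a
samples[:, 1] = np.random.choice(b, 10)  # second column drawn from b
# edges[:, None] has shape (n_edges, 1, 2) and broadcasts against samples (10, 2)
# to (n_edges, 10, 2); a sample is new if it differs from every row of edges
new_indices = (samples != edges[:, None]).any(axis=2).all(axis=0)
new_samples = samples[new_indices]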

That is, I generate 10 new samples and then check which of them match a row of edges. This is not optimal in the number of operations, since the comparison continues even after a match has been found, but it is vectorized with NumPy, which is usually faster than a loop that stops as soon as it can.
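The filter above handles one batch of draws; the question also needs exactly batch_size accepted pairs. A minimal sketch, assuming we simply repeat the same broadcasted comparison until enough pairs survive, is shown here. `fill_batch` is a hypothetical name, and the sketch uses NumPy's Generator API (`np.random.default_rng`) rather than `np.random.choice`; either works. Note that the intermediate boolean array has shape `(len(edges), batch_size, 2)`, so with 10k samples and a very large edges array it may not fit in memory; processing edges in chunks, or reducing each pair to a single integer key and testing with `np.isin`, avoids that.

```python
import numpy as np

def fill_batch(a, b, edges, batch_size, rng=None):
    # Hypothetical helper: repeat the vectorized filter until batch_size
    # pairs that are not rows of edges have been collected.
    rng = np.random.default_rng() if rng is None else rng
    collected, n = [], 0
    while n < batch_size:
        samples = np.empty((batch_size, 2), dtype=int)
        samples[:, 0] = rng.choice(a, batch_size)
        samples[:, 1] = rng.choice(b, batch_size)
        # True where a sample differs from every row of edges
        # (intermediate shape: (len(edges), batch_size, 2))
        new_indices = (samples != edges[:, None]).any(axis=2).all(axis=0)
        collected.append(samples[new_indices])
        n += int(new_indices.sum())
    # The last round may overshoot, so trim to exactly batch_size rows
    return np.concatenate(collected)[:batch_size]
```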
