How do I check which rows of one small array exist in another larger one?

Posted on 2025-01-28 18:47:31, 1169 characters, 2 views, 0 comments

How do I check which rows of one small array exist in another larger one?

Given the following setup:

import numpy as np

batch_size = 4
# One placeholder row; it is dropped once the batch is full
final_batch = np.empty((1, 2), dtype=int)
a = np.arange(10)
b = np.arange(10, 20)
edges = np.array([[0, 11], [0, 12], [1, 11], [1, 12], [0, 17]])

c1 = np.random.choice(a, batch_size).reshape(-1, 1)
c2 = np.random.choice(b, batch_size).reshape(-1, 1)
samples = np.append(c1, c2, axis=1)

Now some rows of samples may duplicate rows of edges. I want to keep calling np.random.choice and only add the sampled rows to final_batch if they don't already exist in edges. The simple way to do this would be to take them one by one in a loop:

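while len(final_batch) < batch_size + 1:
    c1 = np.random.choice(a, 1).reshape(-1, 1)
    c2 = np.random.choice(b, 1).reshape(-1, 1)
    pair = np.append(c1, c2, axis=1)
    # Keep the pair only if it does not match any row of edges
    if not (edges == pair).all(axis=1).any():
        final_batch = np.append(final_batch, pair, axis=0)

# Drop the first (placeholder) row
final_batch = final_batch[1:]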

But a, b and edges can all be huge, and the batch size will be 10k. Since it is much faster to sample many elements at once, I wanted to see if there is a faster way. Something like this (pseudocode):

while len(final_batch) < batch_size + 1:
    c1 = np.random.choice(a, batch_size).reshape(-1, 1)
    c2 = np.random.choice(b, batch_size).reshape(-1, 1)
    samples = np.append(c1, c2, axis=1)
    final_batch.append(samples NOT IN edges)  # pseudocode: keep only rows absent from edges


Note that c1 and c2 are mutually exclusive (a and b share no values), so I feel like I should be able to use this somehow.
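For illustration, the `samples NOT IN edges` step from the pseudocode above could be sketched roughly as follows. The idea is to pack each (x, y) row into a single integer key and test membership with np.isin, which avoids comparing every sample against every edge. This is only a sketch under assumptions: both arrays are non-empty, hold non-negative integers small enough that the keys fit in int64, and the helper name `rows_not_in` is hypothetical. It does not rely on a and b being disjoint.

```python
import numpy as np

def rows_not_in(samples, edges):
    """Return a boolean mask: True where a row of `samples` is absent from `edges`.

    Assumes both arrays are non-empty, shaped (n, 2), with non-negative integers.
    """
    # Pick a base larger than any second-column value so that each (x, y)
    # pair maps to a distinct integer key x * base + y.
    base = int(max(samples[:, 1].max(), edges[:, 1].max())) + 1
    sample_keys = samples[:, 0].astype(np.int64) * base + samples[:, 1]
    edge_keys = edges[:, 0].astype(np.int64) * base + edges[:, 1]
    # np.isin tests each sample key for membership in the edge keys.
    return ~np.isin(sample_keys, edge_keys)

edges = np.array([[0, 11], [0, 12], [1, 11], [1, 12], [0, 17]])
samples = np.array([[0, 11], [2, 11], [0, 17], [1, 13]])
mask = rows_not_in(samples, edges)
print(mask)           # [False  True False  True]
print(samples[mask])  # the rows [2, 11] and [1, 13]
```

If edges does not change between batches, the edge keys could be computed once outside the sampling loop, provided the base is then fixed in advance.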

Comments (1)

眼波传意 2025-02-04 18:47:31

If I understand your question, you are looking for something like

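samples = np.empty((10, 2), dtype=int)
samples[:, 0] = np.random.choice(a, 10)
samples[:, 1] = np.random.choice(b, 10)
# Compare every sample with every edge: the result has shape (len(edges), 10, 2).
# A sample is new if it differs from every edge in at least one column.
new_indices = (samples != edges[:, None]).any(axis=2).all(axis=0)
new_samples = samples[new_indices]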

That is, I generate 10 new samples and then check whether they match any row of edges. This is not optimal in the number of operations, since I keep checking for equality even after a match has been found, but it is vectorized with numpy, which is usually faster than stopping as early as possible.
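The mask above can be wrapped in a loop that keeps drawing whole batches until enough rows have been collected. The following is a minimal sketch, not a definitive implementation: the function name `sample_non_edges` is hypothetical, it uses a `np.random.default_rng` generator instead of `np.random.choice`, it does not remove repeated pairs within the result, and it assumes at least one (a, b) pair is missing from edges (otherwise the loop never ends). Note also that the broadcast comparison builds a boolean array of shape (len(edges), batch_size, 2), so memory grows with the product of the two sizes.

```python
import numpy as np

def sample_non_edges(a, b, edges, batch_size, rng):
    """Collect `batch_size` random (a, b) pairs that are not rows of `edges`."""
    kept = []
    n_kept = 0
    while n_kept < batch_size:
        # Draw a whole batch of candidate pairs at once.
        samples = np.stack(
            [rng.choice(a, batch_size), rng.choice(b, batch_size)], axis=1
        )
        # Same test as above: a sample is new if it differs from every edge.
        is_new = (samples != edges[:, None]).any(axis=2).all(axis=0)
        kept.append(samples[is_new])
        n_kept += int(is_new.sum())
    # Concatenate once at the end and trim the surplus rows.
    return np.concatenate(kept)[:batch_size]

rng = np.random.default_rng(0)
a = np.arange(10)
b = np.arange(10, 20)
edges = np.array([[0, 11], [0, 12], [1, 11], [1, 12], [0, 17]])
final_batch = sample_non_edges(a, b, edges, batch_size=4, rng=rng)
print(final_batch.shape)  # (4, 2)
```

Collecting the accepted rows in a list and concatenating once avoids repeated np.append calls, each of which copies the whole array.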
