Indexing a numpy recarray based on its intersection with an external array

Posted on 2024-10-21 01:33:23

I'm trying to subset the records in a numpy.recarray based on the values that one of the recarray's fields has in common with an external array. For example,

import numpy as np

a = np.array([(10, 'Bob', 145.7), (20, 'Sue', 112.3), (10, 'Jim', 130.5)],
             dtype=[('id', 'i4'), ('name', 'S10'), ('weight', 'f8')])
a = a.view(np.recarray)

b = np.array([10,30])

I want to take the intersection of a.id and b to determine which records to pull from the recarray, so that I get back:

(10, 'Bob', 145.7)
(10, 'Jim', 130.5)

Naively, I tried:

common = np.intersect1d(a.id, b)
subset = a[common]

but of course that doesn't work, because intersect1d returns the common values (here 10), not positions, and there is no a[10]. I also tried creating a reverse dict from the id field to the index and subsetting from there, e.g.

id_x_index = {}
ids = a.id
indexes = np.arange(a.size)
for (id, index) in zip(ids, indexes):
    id_x_index[id] = index

subset_indexes = np.sort([id_x_index[x] for x in ids if x in b])
print(a[subset_indexes])

but then I overwrite dict values in id_x_index whenever a.id has duplicates, so in this case I get

(10, 'Jim', 130.5)
(10, 'Jim', 130.5)

I know I'm overlooking some simple way to get the appropriate indices into the recarray. Thanks for any help.

Comments (2)

带上头具痛哭 2024-10-28 01:33:23

The most concise way to do this in NumPy is

subset = a[np.in1d(a.id, b)]
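
For anyone reading this with a recent NumPy: np.in1d has been deprecated since NumPy 2.0 in favor of np.isin (available since NumPy 1.13), which does the same job here. The sketch below is only an illustration of why the one-liner works: the membership test returns a boolean mask with one entry per record, so duplicate ids are kept. The 'U10' dtype for name is my assumption, chosen so the strings print without a b'' prefix on Python 3; the question itself uses 'S10'.

```python
import numpy as np

# Same data as in the question, except that 'name' uses a unicode dtype
# (an assumption made for readable output on Python 3).
a = np.array(
    [(10, "Bob", 145.7), (20, "Sue", 112.3), (10, "Jim", 130.5)],
    dtype=[("id", "i4"), ("name", "U10"), ("weight", "f8")],
).view(np.recarray)
b = np.array([10, 30])

# One boolean per record: True where that record's id occurs in b.
mask = np.isin(a.id, b)

# Boolean indexing keeps every matching record, duplicates included.
subset = a[mask]

print(mask.tolist())         # [True, False, True]
print(subset.name.tolist())  # ['Bob', 'Jim']
```

The value 30 in b simply matches nothing, so it has no effect on the result.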

一抹淡然 2024-10-28 01:33:23

And for those who have an older version of numpy, you can also do it this way:

subset = a[np.array([i in b for i in a.id])]
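
If you go the pure-Python route and also want the integer indices the question asks about, something along these lines may help. It is a rough sketch rather than a tuned solution: the names wanted, mask and indices are mine, the set is just there to avoid rescanning b for every record, and dtype=bool is passed so that an empty recarray still produces a valid boolean mask. As in the question's spirit, the data is the same except that name uses 'U10' (my assumption) for readable output on Python 3.

```python
import numpy as np

# Same records as in the question; 'U10' instead of 'S10' is an assumption
# made only so the names print as plain strings on Python 3.
a = np.array(
    [(10, "Bob", 145.7), (20, "Sue", 112.3), (10, "Jim", 130.5)],
    dtype=[("id", "i4"), ("name", "U10"), ("weight", "f8")],
).view(np.recarray)
b = np.array([10, 30])

# Put b's values in a set so each membership check is a hash lookup.
wanted = set(b.tolist())

# Build the boolean mask by hand, one entry per record.
mask = np.array([i in wanted for i in a.id.tolist()], dtype=bool)

# Positions of the matching records, in their original order.
indices = np.flatnonzero(mask)

subset = a[indices]

print(indices.tolist())      # [0, 2]
print(subset.name.tolist())  # ['Bob', 'Jim']
```

Indexing with either mask or indices selects the same records; the integer form is handy when the positions themselves are needed later.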