Efficient way to compress a numpy array (python)

Posted 2024-08-14 05:09:56


I am looking for an efficient way to compress a numpy array.
I have an array like: dtype=[('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)] (my favourite example).

If I do something like this: my_array.compress(my_array['income'] > 10000), I get a new array with only incomes > 10000, and it's quite quick.

But if I want to keep only the rows whose job is in a list, it doesn't work:

my_array.compress(my_array['job'] in ['this', 'that'])

Error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So I have to do something like this:

np.array([x for x in my_array if x['job'] in ['this', 'that']])

This is both ugly and inefficient!

Do you have an idea to make it efficient?
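For reference, here is a runnable version of the question's setup (the names, rows, and values are made up for illustration; only the dtype and the income filter come from the question):

```python
import numpy as np

# Structured dtype from the question, with the parentheses and quotes fixed
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
people = np.array([
    ('alice', 'this',  12000),
    ('bob',   'other',  9000),
    ('carol', 'that',  15000),
], dtype=dtype)

# The single-condition filter that already works and is fast
rich = people.compress(people['income'] > 10000)
print(rich['name'].tolist())  # -> ['alice', 'carol']
```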


面犯桃花 2024-08-21 05:09:56


It's not quite as nice as what you'd like, but I think you can do:

import numpy

# Build up the mask with one equality test per wanted job
mask = my_array['job'] == 'this'
for condition in ['that', 'other']:
    mask = numpy.logical_or(mask, my_array['job'] == condition)
selected_array = my_array[mask]
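On any reasonably recent numpy, the logical_or loop above collapses into a single vectorised membership test with np.isin (np.in1d on very old versions). A sketch with made-up data:

```python
import numpy as np

dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
my_array = np.array([
    ('alice', 'this',  12000),
    ('bob',   'other',  9000),
    ('carol', 'that',  15000),
], dtype=dtype)

# One call instead of one == comparison per wanted job value
mask = np.isin(my_array['job'], ['this', 'that'])
selected_array = my_array[mask]
print(selected_array['name'].tolist())  # -> ['alice', 'carol']
```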
猫弦 2024-08-21 05:09:56


The best way to compress a numpy array is to use pytables. It is the de facto standard for handling large amounts of numerical data.

import tables as t
hdf5_file = t.openFile('outfile.hdf5')
hdf5_file.createArray ......
hdf5_file.close()
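Note that openFile/createArray are the old camelCase PyTables names; current releases spell them open_file/create_array. If all you need is a compressed on-disk copy and you'd rather avoid the extra dependency, numpy itself can write zlib-compressed archives; a minimal sketch (the filename and data are illustrative):

```python
import numpy as np

dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
arr = np.array([('alice', 'this', 12000), ('bob', 'other', 9000)], dtype=dtype)

# Write a zlib-compressed .npz archive, then load it back by key
np.savez_compressed('outfile.npz', people=arr)
restored = np.load('outfile.npz')['people']
print(restored['name'].tolist())  # -> ['alice', 'bob']
```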
小鸟爱天空丶 2024-08-21 05:09:56


If you're looking for a numpy-only solution, I don't think you'll get it. Still, although it does lots of work under the covers, consider whether the tabular package might be able to do what you want in a less "ugly" fashion. I'm not sure you'll get more "efficient" without writing a C extension yourself.

By the way, I think this is both efficient enough and pretty enough for just about any real case.

my_array.compress([x in ['this', 'that'] for x in my_array['job']])

As an extra step to make this less ugly and more efficient, you presumably wouldn't have a hardcoded list in the middle anyway; I would use a set instead, since it's much faster to search than a list once there are more than a few items:

job_set = set(['this', 'that'])
my_array.compress([x in job_set for x in my_array['job']])

If you don't think this is efficient enough, I'd advise benchmarking so you'll have confidence that you're spending your time wisely as you try to make it even more efficient.
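A rough way to run that benchmark, comparing the list-comprehension compress against a vectorised np.isin mask (the array size and job values are arbitrary; absolute timings will vary by machine):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
my_array = np.zeros(100_000, dtype=[('job', (np.str_, 8))])
my_array['job'] = rng.choice(['this', 'that', 'other', 'misc'], size=100_000)
job_set = {'this', 'that'}

def via_listcomp():
    return my_array.compress([x in job_set for x in my_array['job']])

def via_isin():
    return my_array[np.isin(my_array['job'], sorted(job_set))]

# Both selections must agree before the timings mean anything
assert np.array_equal(via_listcomp()['job'], via_isin()['job'])

for fn in (via_listcomp, via_isin):
    print(fn.__name__, timeit.timeit(fn, number=5))
```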
