Efficient way to compress a numpy array (Python)
I am looking for an efficient way to compress a numpy array.
I have an array like: dtype=[('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
(my favourite example).
If I do something like this: my_array.compress(my_array['income'] > 10000)
I'm getting a new array with only incomes > 10000, and it's quite quick.
But if I want to keep only the rows whose job is in a list, it doesn't work!
my_array.compress(my_array['job'] in ['this', 'that'])
Error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
So I have to do something like this:
np.array([x for x in my_array if x['job'] in ['this', 'that']])
This is both ugly and inefficient!
Do you have an idea to make it efficient?
Comments (3)
It's not quite as nice as what you'd like, but I think you can do:
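A minimal sketch of one way this could look, assuming the idea is to build an element-wise boolean mask instead of using Python's `in`. The sample rows are hypothetical; the dtype is the one from the question. `np.isin` is shown as an alternative for longer lists and assumes NumPy 1.13 or later.

```python
import numpy as np

# Hypothetical sample data using the dtype from the question.
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
my_array = np.array(
    [('ann', 'this', 12000), ('bob', 'other', 8000), ('cy', 'that', 500)],
    dtype=dtype,
)

# Each comparison is element-wise and yields a boolean array.
# Combine them with | (bitwise or); Python's `or` / `in` would raise ValueError.
mask = (my_array['job'] == 'this') | (my_array['job'] == 'that')
filtered = my_array.compress(mask)

# For an arbitrary list of values, np.isin builds the same kind of mask.
mask_isin = np.isin(my_array['job'], ['this', 'that'])
filtered_isin = my_array[mask_isin]

print(filtered['name'])
print(filtered_isin['name'])
```

Either mask can be passed to `compress` or used directly as an index; both stay inside NumPy rather than looping over rows in Python.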
The best way to compress a numpy array is to use PyTables. It is the de facto standard when it comes to handling a large amount of numerical data.
If you're looking for a numpy-only solution, I don't think you'll get it. Still, although it does lots of work under the covers, consider whether the tabular package might be able to do what you want in a less "ugly" fashion. I'm not sure you'll get more "efficient" without writing a C extension yourself.
By the way, I think this is both efficient enough and pretty enough for just about any real case.
As an extra step in making this less ugly and more efficient, you would presumably not have a hardcoded list in the middle, so I would use a set instead, as it's much faster to search than a list if the list has more than a few items:
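A minimal sketch of the set-based version, assuming the same list comprehension as in the question. The sample rows and the name `wanted_jobs` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data using the dtype from the question.
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
my_array = np.array(
    [('ann', 'this', 12000), ('bob', 'other', 8000), ('cy', 'that', 500)],
    dtype=dtype,
)

# Membership tests on a set take constant time on average,
# while a list is scanned item by item.
wanted_jobs = {'this', 'that'}

# Passing dtype explicitly keeps the structured dtype even if no row matches.
filtered = np.array(
    [row for row in my_array if row['job'] in wanted_jobs],
    dtype=my_array.dtype,
)

print(filtered['name'])
```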
If you don't think this is efficient enough, I'd advise benchmarking so you'll have confidence that you're spending your time wisely as you try to make it even more efficient.
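A rough benchmarking sketch using the standard-library `timeit` module. The data size, the function names, and the choice of `np.isin` as the vectorized candidate are all assumptions made for illustration; timings will vary by machine and NumPy version, so no numbers are quoted here.

```python
import timeit

import numpy as np

# Hypothetical test data: 30,000 rows with the dtype from the question.
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
my_array = np.zeros(30_000, dtype=dtype)
my_array['job'] = np.array(['this', 'that', 'other'] * 10_000)

wanted_jobs = {'this', 'that'}


def with_comprehension():
    # Row-by-row filtering in Python, as in the question.
    return np.array(
        [row for row in my_array if row['job'] in wanted_jobs],
        dtype=my_array.dtype,
    )


def with_mask():
    # Vectorized filtering with a boolean mask (assumes NumPy 1.13+ for np.isin).
    return my_array[np.isin(my_array['job'], list(wanted_jobs))]


# Time a few repetitions of each candidate on identical input.
t_comprehension = timeit.timeit(with_comprehension, number=5)
t_mask = timeit.timeit(with_mask, number=5)

print(f"list comprehension: {t_comprehension:.4f} s for 5 runs")
print(f"boolean mask:       {t_mask:.4f} s for 5 runs")
```

Running both candidates on data shaped like your real workload shows whether the simpler version is already fast enough before you invest in anything more elaborate.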