Is there a way to apply a function to all rows of a NumPy array that share the same values?
Let's say we have a matrix, A, that has the following values:
In [2]: A
Out[2]:
array([[1, 1, 3],
       [1, 1, 5],
       [1, 1, 7],
       [1, 2, 3],
       [1, 2, 9],
       [2, 1, 5],
       [2, 2, 1],
       [2, 2, 8],
       [2, 2, 3]])
Is there a way to apply a function, e.g. np.mean, to the values of the third column across all rows whose first and second columns are equal, i.e. to get matrix B:
In [4]: B
Out[4]:
array([[1, 1, 5],
       [1, 2, 6],
       [2, 1, 5],
       [2, 2, 4]])
My actual use case is much more complex. I have a large matrix with ~1M rows and 4 columns. The first three columns are the (x, y, z) coordinates of a point in a point cloud, and the fourth column is the value of some function f, where f = f(x, y, z). I have to integrate along the x-axis (the first column of the matrix) for every group of rows that share the same (y, z) pair. The result should be a matrix with one row per unique (y, z) pair and three columns: y, z, and the value obtained from the integration. I have a few ideas, but all of them involve multiple for-loops and potential memory issues.
Is there any way to perform this in a vectorized fashion?
Comments (4)
If you have a lot of data, you can use pandas:
Output:
I assume this is the fastest way.
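A minimal sketch of what the pandas approach could look like. The column names a, b and c are assumptions chosen here for illustration, and this is not necessarily the exact code the answer had in mind:

```python
import numpy as np
import pandas as pd

A = np.array([[1, 1, 3],
              [1, 1, 5],
              [1, 1, 7],
              [1, 2, 3],
              [1, 2, 9],
              [2, 1, 5],
              [2, 2, 1],
              [2, 2, 8],
              [2, 2, 3]])

# Column names "a", "b", "c" are hypothetical labels for the three columns.
df = pd.DataFrame(A, columns=["a", "b", "c"])

# Group rows by the first two columns and average the third within each group.
grouped = df.groupby(["a", "b"])["c"].mean().reset_index()

# Back to a plain NumPy array; the dtype becomes float because of the mean.
B = grouped.to_numpy()
print(B)
```

Any other reduction can be passed through .agg(...) instead of .mean(); whether this is actually the fastest option for a given data size is worth timing rather than assuming.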
A possible solution:
The output is:
There is also a more compact variation, though it is perhaps less readable:
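As a sketch of what such a pair of variants might look like (not necessarily the answerer's code): the first version finds the unique key pairs with np.unique and loops over them with a boolean mask, and the second folds the same logic into a single expression.

```python
import numpy as np

A = np.array([[1, 1, 3],
              [1, 1, 5],
              [1, 1, 7],
              [1, 2, 3],
              [1, 2, 9],
              [2, 1, 5],
              [2, 2, 1],
              [2, 2, 8],
              [2, 2, 3]])

# Readable variant: one boolean mask per unique (col0, col1) pair.
keys = np.unique(A[:, :2], axis=0)
rows = []
for k in keys:
    mask = (A[:, :2] == k).all(axis=1)   # rows whose first two columns equal k
    rows.append([*k, A[mask, 2].mean()])
B = np.array(rows)

# Compact variant: the same computation as a single list comprehension.
B_compact = np.array([[*k, A[(A[:, :2] == k).all(axis=1), 2].mean()]
                      for k in np.unique(A[:, :2], axis=0)])
print(B)
```

Both variants scan the whole array once per unique key, so they suit a modest number of groups; with ~1M rows and many distinct pairs, the cost grows with rows times groups.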
This might be what you're looking for. You can use A[:, n] to access a column.
numpy does not have built-in grouping tools. And since the groups differ in length, they will require separate mean calls, so some level of iteration will be required.
defaultdict is a handy way of grouping values.
We can create an array of the means with:
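The grouping described above can be sketched as follows. The variable names are illustrative, and the code is a plain reconstruction of the defaultdict idea, not the answer's original listing:

```python
from collections import defaultdict

import numpy as np

A = np.array([[1, 1, 3],
              [1, 1, 5],
              [1, 1, 7],
              [1, 2, 3],
              [1, 2, 9],
              [2, 1, 5],
              [2, 2, 1],
              [2, 2, 8],
              [2, 2, 3]])

# Collect the third-column values under their (col0, col1) key.
groups = defaultdict(list)
for a, b, c in A.tolist():
    groups[(a, b)].append(c)

# One mean call per group; dicts keep insertion order, so the rows of B
# appear in the order in which each key was first seen in A.
B = np.array([[a, b, np.mean(vals)] for (a, b), vals in groups.items()])
print(B)
```

The Python-level loop touches every row once, which is simple and memory-friendly but will be noticeably slower than array operations on ~1M rows.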
Some comparative timings, with the usual caveat about scaling to larger arrays:
Another idea, which needs more work to be fully functional but shows promise when scaling up, is:
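One fully vectorized direction, which may or may not be the idea meant here, is to sort the rows by key and reduce each contiguous run of equal keys with np.add.reduceat. A sketch under that assumption:

```python
import numpy as np

A = np.array([[1, 1, 3],
              [1, 1, 5],
              [1, 1, 7],
              [1, 2, 3],
              [1, 2, 9],
              [2, 1, 5],
              [2, 2, 1],
              [2, 2, 8],
              [2, 2, 3]])

# Sort rows by column 0, then column 1 (lexsort treats the LAST key as primary).
order = np.lexsort((A[:, 1], A[:, 0]))
S = A[order]

# A new group starts wherever the (col0, col1) pair differs from the previous row.
change = np.any(S[1:, :2] != S[:-1, :2], axis=1)
starts = np.flatnonzero(np.r_[True, change])

# Sum each contiguous segment, then divide by the segment length to get the mean.
sums = np.add.reduceat(S[:, 2], starts)
counts = np.diff(np.r_[starts, len(S)])
B = np.column_stack([S[starts, :2], sums / counts])
print(B)
```

This avoids Python-level loops entirely, at the cost of one sort. It only covers reductions that can be expressed through a ufunc's reduceat (sum, min, max and so on); for the integration use case in the question, the per-segment sum would have to be replaced by per-segment quadrature terms, which takes additional work.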