性能问题:就地反转指针数组与值数组
提出这个问题的背景是我正在求解线性方程组 (Ax=b),其中 A 是矩阵(通常尺寸小于 100x100),x 和 b 是向量。我使用的是直接法,这意味着我首先反转 A,然后通过 x=A^(-1)b 求解。在迭代过程中重复该步骤直至收敛。
我现在使用矩阵库(MTL4)的方式:
对于每次迭代,我将 A (值)的所有系数复制到矩阵对象中,然后反转。这是最简单、最安全的选择。
改用指针数组:
对于我的特定情况,A 的系数恰好在每次迭代之间更新。这些系数存储在不同的变量中(有些是数组,有些不是)。如果我将 A 设置为包含指向这些系数变量的指针的数组,然后就地反转 A,是否会有潜在的性能提升?
最后一个选项的好处是,一旦我在第一次迭代之前在 A 中设置了指针,我就不需要在连续迭代之间复制任何值。 A 中指向的值将在迭代之间自动更新。
因此,在我看来,性能问题归结为:
- 假设指针的取消引用并不昂贵,则矩阵求逆过程花费的时间大致相同。
- 指针数组不需要额外的内存来存储包含值的矩阵 A。
- 指针数组选项不必在每次迭代之间复制 A 的所有 NxN 值。
- 指向指针数组选项的值通常在内存中不排序。希望所有值在内存中都相对较近,但 *A[0][1] 通常不在 *A[0][0] 旁边,等等。
对此有什么评论吗?最后一句话是否会对绩效产生负面影响,从而权衡积极的绩效影响?
The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repated in an iterative process until convergence.
The way I'm doing it now, using a matrix library (MTL4):
For every iteration I copy all coeffiecients of A (values) in to the matrix object, then invert. This the easiest and safest option.
Using an array of pointers instead:
For my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential for performance gain if I set up A as an array containing pointers to these coefficient variables, then inverting A in-place?
The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values which are pointed to in A would automatically be updated between iterations.
So the performance question boils down to this, as I see it:
- The matrix inversion process takes roughly the same amount of time, assuming de-referencing of pointers is non-expensive.
- The array of pointers does not need the extra memory for matrix A containing values.
- The array of pointers option does not have to copy all NxN values of A between each iteration.
- The values that are pointed to the array of pointers option are generally NOT ordered in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0] etc.
Any comments to this? Will the last remark affect performance negatively, thus weighing up for the positive performance effects?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
测试,测试,测试。
尤其是在数值线性代数领域。有许多效果在发挥作用,这就是为什么有许多优化库可以为您解决这个负担。
需要考虑的一些影响:
测试是无可替代的。
Test, test, test.
Especially in the field of Numerical Linear Algebra. There are many effects in play, which is why there is a number of optimized libraries that have solved that burden for you.
Some effects to consider:
There is no substitute for testing.
以下是一些评论:
因此,当您选择指针解决方案时,您会遇到以下一组权衡:
Here are some comments:
So, you get the following set of trade-offs when you chose the pointer solution:
您在这里得到了很好的答案。我唯一要补充的是一些有关性能的一般经验。
您正在先验地考虑性能。这是合理的,但真正的回报是事后的。换句话说,在运行的代码告诉您之前,您无法确定真正的优化机会在哪里。
您不知道大部分时间是否会花费在矩阵求逆、乘法、复制矩阵、取消引用或其他操作上。人们可以猜测。如果我必须猜的话,那就是矩阵求逆,因为它是 100x100。
然而,我无法猜测其他的东西可能会更大。
猜测的记录非常糟糕,尤其是当您可以找出时。
以下是我的意思的示例。
You're getting good answers here. The only thing I would add is some general experience with performance.
You are thinking about performance a-priori. That's reasonable, but the real payoff is a-posteriori. In other words, you don't know for certain where the real optimization opportunities are, until the running code tells you.
You don't know if the bulk of the time will be spent in matrix inversion, multiplication, copying the matrix, dereferencing, or what. People can guess. If I had to guess, it would be matrix inversion, because it's 100x100.
However, something else I can't guess might be even bigger.
Guessing has a very poor track record, especially when you can just find out.
Here's an example of what I mean.