性能问题:就地反转指针数组与值数组

发布于 2024-10-11 06:22:59 字数 680 浏览 2 评论 0原文

提出这个问题的背景是我正在求解线性方程组 (Ax=b),其中 A 是矩阵(通常尺寸小于 100x100),x 和 b 是向量。我使用的是直接法,这意味着我首先反转 A,然后通过 x=A^(-1)b 求解。在迭代过程中重复该步骤直至收敛。

我现在使用矩阵库(MTL4)的方式:
对于每次迭代,我将 A (值)的所有系数复制到矩阵对象中,然后反转。这是最简单、最安全的选择。

改用指针数组:
对于我的特定情况,A 的系数恰好在每次迭代之间更新。这些系数存储在不同的变量中(有些是数组,有些不是)。如果我将 A 设置为包含指向这些系数变量的指针的数组,然后就地反转 A,是否会有潜在的性能提升?

最后一个选项的好处是,一旦我在第一次迭代之前在 A 中设置了指针,我就不需要在连续迭代之间复制任何值。 A 中指向的值将在迭代之间自动更新。

因此,在我看来,性能问题归结为:
- 假设指针的取消引用并不昂贵,则矩阵求逆过程花费的时间大致相同。
- 指针数组不需要额外的内存来存储包含值的矩阵 A。
- 指针数组选项不必在每次迭代之间复制 A 的所有 NxN 值。
- 指向指针数组选项的值通常在内存中不排序。希望所有值在内存中都相对较近,但 *A[0][1] 通常不在 *A[0][0] 旁边,等等。

对此有什么评论吗?最后一句话是否会对绩效产生负面影响,从而权衡积极的绩效影响?

The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repated in an iterative process until convergence.

The way I'm doing it now, using a matrix library (MTL4):
For every iteration I copy all coeffiecients of A (values) in to the matrix object, then invert. This the easiest and safest option.

Using an array of pointers instead:
For my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential for performance gain if I set up A as an array containing pointers to these coefficient variables, then inverting A in-place?

The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values which are pointed to in A would automatically be updated between iterations.

So the performance question boils down to this, as I see it:
- The matrix inversion process takes roughly the same amount of time, assuming de-referencing of pointers is non-expensive.
- The array of pointers does not need the extra memory for matrix A containing values.
- The array of pointers option does not have to copy all NxN values of A between each iteration.
- The values that are pointed to the array of pointers option are generally NOT ordered in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0] etc.

Any comments to this? Will the last remark affect performance negatively, thus weighing up for the positive performance effects?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

伪装你 2024-10-18 06:22:59

测试,测试,测试。

尤其是在数值线性代数领域。有许多效果在发挥作用,这就是为什么有许多优化库可以为您解决这个负担。

需要考虑的一些影响:

  • 内存局部性和缓存影响
  • 多线程影响(一些在单核运行时最佳的算法,在使用多个核心时会导致内存冲突/缓存驱逐)。

测试是无可替代的。

Test, test, test.

Especially in the field of Numerical Linear Algebra. There are many effects in play, which is why there is a number of optimized libraries that have solved that burden for you.

Some effects to consider:

  • Memory locality and cache effects
  • Multithreading effects (some algorithms that are optimal while running single-core, cause memory collision/cache eviction when more than one core is utilized).

There is no substitute for testing.

伴我心暖 2024-10-18 06:22:59

以下是一些评论:

  • 用于反转的函数是否能够处理指针矩阵而不是值矩阵?如果它没有意识到必须进行间接寻址,则可能会发生各种奇怪的效果。
  • 当进行就地矩阵求逆(意味着求逆矩阵覆盖输入矩阵)时,所有输入系数都将被新值覆盖,因为矩阵求逆不能通过重新排序元素来完成矩阵。
  • 在反演过程中,外部过程不会改变任何输入系数。所有此类更新都必须在迭代之间执行。

因此,当您选择指针解决方案时,您会遇到以下一组权衡:

  • 组成矩阵 A 的系数不能再与矩阵求逆异步计算。
  • 每次迭代都必须重新计算所有系数(当您使用就地求逆时,意味着求逆矩阵使用与输入矩阵相同的内存),或者您仍然必须使用 N x 矩阵用于保存反转结果的 N 个值。

Here are some comments:

  • Is the function you use for the inversion capable of handling a matrix of pointers instead of values? If it does not realise it has to do an indirection, all kinds of strange effects could happen.
  • When doing an in-place matrix inversion (meaning the inverted matrix overwrites the input matrix), all input coefficients will get overwritten with new values, because matrix inversion can not be done by re-ordering the elements of the matrix.
  • During the inversion process, none of the input coefficients may be changed by an outside process. All such updates have to be performed between iterations.

So, you get the following set of trade-offs when you chose the pointer solution:

  • The coefficients making up matrix A can no longer be calculated asynchronously with the matrix inversion.
  • Either all coefficients must be recalculated for each iteration (when you use in-place inversion, meaning the inverted matrix uses the same memory as the input matrix), or you still have to use a matrix of N x N values to hold the result of the inversion.
波浪屿的海角声 2024-10-18 06:22:59

您在这里得到了很好的答案。我唯一要补充的是一些有关性能的一般经验。

您正在先验地考虑性能。这是合理的,但真正的回报是事后的。换句话说,在运行的代码告诉您之前,您无法确定真正的优化机会在哪里。

您不知道大部分时间是否会花费在矩阵求逆、乘法、复制矩阵、取消引用或其他操作上。人们可以猜测。如果我必须猜的话,那就是矩阵求逆,因为它是 100x100。
然而,我无法猜测其他的东西可能会更大。
猜测的记录非常糟糕,尤其是当您可以找出时。

以下是我的意思的示例。

You're getting good answers here. The only thing I would add is some general experience with performance.

You are thinking about performance a-priori. That's reasonable, but the real payoff is a-posteriori. In other words, you don't know for certain where the real optimization opportunities are, until the running code tells you.

You don't know if the bulk of the time will be spent in matrix inversion, multiplication, copying the matrix, dereferencing, or what. People can guess. If I had to guess, it would be matrix inversion, because it's 100x100.
However, something else I can't guess might be even bigger.
Guessing has a very poor track record, especially when you can just find out.

Here's an example of what I mean.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文