As of right now there aren't any known asymptotic barrier-breaking properties of this particular multiplication.
The obvious optimization is to take advantage of the symmetry of the product. That is to say, the [i][j]th entry is equal to the [j][i]th entry, so only half of the entries actually need to be computed.
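As a rough sketch of that symmetry trick (the function name and plain-nested-list representation are my own, just for illustration): only the upper triangle of M·Mᵀ is computed, and each value is mirrored into the lower triangle.

```python
def mat_mul_transpose_sym(M):
    """Compute M * M^T, exploiting the symmetry of the result.

    Only entries with j >= i are computed; the rest are mirrored,
    roughly halving the number of inner products.
    """
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # only the upper triangle
            # Row i of M dotted with row j of M gives (M * M^T)[i][j].
            s = sum(M[i][k] * M[j][k] for k in range(len(M[0])))
            result[i][j] = s
            result[j][i] = s   # mirror into the lower triangle
    return result
```

Note that the inner product here is between two rows of M, which matters for the caching discussion below as well.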
For implementation-specific optimizations, there is a significant amount of caching that you can do. A very significant amount of the time in the multiplication of large matrices is spent transferring data between memory and the CPU. So CPU designers implemented a smart caching system whereby recently used memory is stored in a small, fast memory section called the cache. In addition to that, they also made it so that nearby memory is cached as well (cache lines and prefetching). This helps because a lot of the memory IO is due to reading from and writing to arrays, which are stored sequentially.
Since the transpose of a matrix is simply the same matrix with the indices swapped, each cached value can do double duty: a row of M brought into the cache serves as a row of both operands of M·Mᵀ, so caching can have over twice the usual impact.
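To make that concrete, here is a minimal sketch (helper names are my own) of the standard cache-friendly trick for general A·B: pre-transpose B so the inner loop streams through contiguous rows instead of striding down columns. The point relevant here is that for M·Mᵀ the pre-transposed operand is M itself, so this layout comes for free.

```python
def mat_mul_cache_friendly(A, B):
    """Multiply A * B after transposing B.

    With B transposed, every inner product reads a row of A and a row
    of Bt, both stored contiguously, instead of striding down a column
    of B (a cache-unfriendly access pattern in row-major storage).
    For B = M^T this transpose step is free: Bt is just M.
    """
    Bt = [list(col) for col in zip(*B)]  # rows of Bt are columns of B
    return [
        [sum(a * b for a, b in zip(row, col)) for col in Bt]
        for row in A
    ]
```

In a language like C, where arrays really are flat sequential memory, this row-row access pattern is what lets the hardware cache and prefetcher keep both operands streaming.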