uBLAS 慢速矩阵稀疏向量乘法

发布于 2024-11-15 10:46:29 字数 1539 浏览 4 评论 0原文

我正在转换一些我自己的矢量代数代码以使用优化的 boost uBLAS 库。然而，当我尝试进行 SymmetricMatrix-SparseVector 乘法时，我发现它比我自己的实现慢大约 4 倍。向量大小通常在 0-500 左右，大约 70-80% 的条目为零。

这是我的代码

void CRoutines::GetA(double a[], double vectorIn[], int sparseVectorIndexes[], int vectorLength, int sparseLength)
{
    compressed_vector<double> inVec (vectorLength, sparseLength);
    for(int i = 0; i < sparseLength; i++)
    {
        inVec(sparseVectorIndexes[i]) = vectorIn[sparseVectorIndexes[i]];
    }
    vector<double> test = prod(inVec, matrix);
        for(int i = 0; i < vectorLength; i++)
    {
        a[i] = test(i);
    }
}

稀疏向量索引存储输入向量的非零值的索引，向量长度是向量的长度，稀疏长度是向量中非零值的数量。该矩阵存储为对称矩阵symmetry_matrix。

我自己的实现是一个简单的嵌套循环迭代，其中矩阵只是一个 2D 双精度数组：

void CRoutines::GetA(double a[], double vectorIn[], int sparseVectorIndexes[], int vectorLength, int sparseLength)
 {
    for (int i = 0; i < vectorLength; i++)
    {
            double temp = 0;

            for (int j = 0; j < sparseLength; j++)
            {
                int row = sparseVectorIndexes[j];
                if (row <= i) // Handle lower triangular sparseness
                    temp += matrix[i][row] * vectorIn[row];
                else
                    temp += matrix[row][i] * vectorIn[row];
            }
            a[i] = temp;
    }

}

为什么 uBLAS 慢 4 倍？是不是我写的乘法写得不好？或者还有另一个更适合这个的库吗？

编辑：如果我使用密集向量数组，那么 uBLAS 只会慢 2 倍......

原文

I'm converting some of my own vector algebra code to use the optimized boost uBLAS library. However, when I tried to do a SymmetricMatrix-SparseVector multiplication I found it to be about 4x slower than my own implementation. The vector size is usually around 0-500 and about 70-80% entries are zero.

Here is my code

void CRoutines::GetA(double a[], double vectorIn[], int sparseVectorIndexes[], int vectorLength, int sparseLength)
{
    compressed_vector<double> inVec (vectorLength, sparseLength);
    for(int i = 0; i < sparseLength; i++)
    {
        inVec(sparseVectorIndexes[i]) = vectorIn[sparseVectorIndexes[i]];
    }
    vector<double> test = prod(inVec, matrix);
        for(int i = 0; i < vectorLength; i++)
    {
        a[i] = test(i);
    }
}

sparseVectorIndexes stores the indexes of the non-zero values of the input vector, vectorLength is the length of the vector, and sparseLength is the number of non-zeros in the vector. The matrix is stored as a symmetric matrix symmetric_matrix<double, lower>.

My own implementation is a simple nested loop iteration where matrix is just a 2D double array:

void CRoutines::GetA(double a[], double vectorIn[], int sparseVectorIndexes[], int vectorLength, int sparseLength)
 {
    for (int i = 0; i < vectorLength; i++)
    {
            double temp = 0;

            for (int j = 0; j < sparseLength; j++)
            {
                int row = sparseVectorIndexes[j];
                if (row <= i) // Handle lower triangular sparseness
                    temp += matrix[i][row] * vectorIn[row];
                else
                    temp += matrix[row][i] * vectorIn[row];
            }
            a[i] = temp;
    }

}

Why is uBLAS 4x slower? Am I not writing the multiplication properly? Or is there another library more suited to this?

EDIT: If I use a dense vector array instead then uBLAS is only 2x slower...

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

策马西风 2024-11-22 10:46:29

uBlas 的设计并未将性能作为第一目标。有些库比 uBlas 快得多。参见例如 http://eigen.tuxfamily.org/index.php?title=Benchmark

回复收藏 0 原文

度的依靠╰つ 2024-11-22 10:46:29

此 pdf 对各种线性代数库进行了相当详细的比较。我在这个答案中发现了这个href="https://scicomp.stackexchange.com/">Computational Science Stack Exchange，这可能是解决此类问题的更好地方。

回复收藏 0 原文

苦妄 2024-11-22 10:46:29

不确定这是否是速度减慢的原因（您是否进行了分析以获取 4x 数字？），但此循环可能会很慢：

for(int i = 0; i < vectorLength; i++)
    {
        a[i] = test(i);
    }

如果大部分时间都花在处理代码中的循环上，那么这个额外的循环可能会使时间加倍（与ublas无关）。我建议使用 std::copy 代替：

std::copy(test.begin(), test.end(), a[0])

大多数编译器应该看到这是复制双精度并执行最佳复制，这可能会在一定程度上解决您的问题。

Not sure if it is the cause of the slowdown (did you profile to get your 4x number?) but this loop could be slow:

for(int i = 0; i < vectorLength; i++)
    {
        a[i] = test(i);
    }

If most of the time is spent processing the loops in your code then this extra loop could double the time (and have nothing to do with ublas). I would recommend using std::copy instead:

std::copy(test.begin(), test.end(), a[0])

Most compilers should see that this is copying a double and do an optimal copy, which might fix your problem somewhat.

回复收藏 0 原文

~没有更多了~

关于作者

所谓喜欢

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

uBLAS 慢速矩阵稀疏向量乘法

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

胡图图

zt006

z祗昰~

冰葑

野の

天空

友情链接

uBLAS 慢速矩阵稀疏向量乘法

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

胡图图

zt006

z祗昰~

冰葑

野の

天空

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。