Performance is worse with Eigen than with my own class

Posted on 2024-11-10 18:07:58

A couple of weeks ago I asked a question about the performance of matrix multiplication.

I was told that in order to enhance the performance of my program I should use some specialised matrix classes rather than my own class.

StackOverflow users recommended:

uBLAS
EIGEN
BLAS

At first I wanted to use uBLAS; however, after reading the documentation, it seemed to me that this library doesn't support matrix-matrix multiplication.

In the end I decided to use the Eigen library, so I replaced my matrix class with Eigen::MatrixXd. However, it turned out that my application now runs even slower than before.
Before using Eigen the run took 68 seconds; after replacing my matrix class with Eigen matrices, the program runs for 87 seconds.

The parts of the program that take the most time look like this:

TemplateClusterBase* TemplateClusterBase::TransformTemplateOne( vector<Eigen::MatrixXd*>& pointVector, Eigen::MatrixXd& rotation ,Eigen::MatrixXd& scale,Eigen::MatrixXd& translation )
{   
    for (int i=0;i<pointVector.size();i++ )
    {
        //Eigen::MatrixXd outcome =
        Eigen::MatrixXd outcome = (rotation*scale)* (*pointVector[i])  + translation;
        //delete  prototypePointVector[i];      // ((rotation*scale)* (*prototypePointVector[i])  + translation).ConvertToPoint();
        MatrixHelper::SetX(*prototypePointVector[i],MatrixHelper::GetX(outcome));
        MatrixHelper::SetY(*prototypePointVector[i],MatrixHelper::GetY(outcome));
        //assosiatedPointIndexVector[i]    = prototypePointVector[i]->associatedTemplateIndex = i;
    }

    return this;
}

and

Eigen::MatrixXd AlgorithmPointBased::UpdateTranslationMatrix( int clusterIndex )
{
    double membershipSum = 0,outcome = 0;
    double currentPower = 0;
    Eigen::MatrixXd outcomePoint = Eigen::MatrixXd(2,1);
    outcomePoint << 0,0;
    Eigen::MatrixXd templatePoint;
    for (int i=0;i< imageDataVector.size();i++)
    {
        currentPower =0; 
        membershipSum += currentPower = pow(membershipMatrix[clusterIndex][i],m);
        outcomePoint.noalias() +=  (*imageDataVector[i] - (prototypeVector[clusterIndex]->rotationMatrix*prototypeVector[clusterIndex]->scalingMatrix* ( *templateCluster->templatePointVector[prototypeVector[clusterIndex]->assosiatedPointIndexVector[i]]) ))*currentPower ;
    }

    outcomePoint.noalias() = outcomePoint/=membershipSum;
    return outcomePoint; //.ConvertToMatrix();
}

As you can see, these functions perform a lot of matrix operations. That is why I thought using Eigen would speed up my application. Unfortunately (as I mentioned above), the program runs slower.

Is there any way to speed up these functions?

Maybe I would get better performance if I used the DirectX matrix operations? (However, I have a laptop with an integrated graphics card.)

熟人话多 2024-11-17 18:07:58

Make sure compiler optimization is switched on (e.g. at least -O2 on gcc). Eigen is heavily templated and won't perform well unless optimization is enabled.

不必在意 2024-11-17 18:07:58

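If you're using Eigen's MatrixXd types, those are dynamically sized. You should get much better results from the fixed-size types (e.g. Matrix4d, Vector4d).

Also, make sure you're compiling in such a way that the code can be vectorized; see the relevant Eigen documentation.

Regarding your thought about using the Direct3D extension library (D3DXMATRIX etc.): it's fine (if a bit old-fashioned) for graphics geometry (4x4 transforms and so on), but it is certainly not GPU accelerated (just good old SSE, I think). Also note that it is single precision (float) only, and you seem to be set on using doubles. Personally I'd prefer Eigen unless I were actually writing a Direct3D application.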

撩人痒 2024-11-17 18:07:58

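Which version of Eigen are you using? They recently released 3.0.1, which is supposed to be faster than 2.x. Also, make sure you play with the compiler options a bit. For example, make sure SSE is being used in Visual Studio:

C/C++ --> Code Generation --> Enable Enhanced Instruction Set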

强辩 2024-11-17 18:07:58

You should profile, and then optimize first the algorithm and then the implementation. In particular, the posted code is quite inefficient:

for (int i=0;i<pointVector.size();i++ )
{
   Eigen::MatrixXd outcome = (rotation*scale)* (*pointVector[i])  + translation;

I don't know the library, so I won't even try to guess the number of unnecessary temporaries that you are creating, but a simple refactor:

Eigen::MatrixXd tmp = rotation*scale;
for (int i=0;i<pointVector.size();i++ )
{
   Eigen::MatrixXd outcome = tmp*(*pointVector[i])  + translation;

can save you a good number of expensive multiplications (and, again, probably new temporary matrices that get discarded right away).

奢欲 2024-11-17 18:07:58

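A few points.

Why are you multiplying rotation*scale inside the loop when that product has the same value in every iteration? That is a lot of wasted effort.
You are using dynamically sized matrices rather than fixed-size matrices. Someone else already mentioned this, and you said it shaved off 2 seconds.
You are passing the arguments as a vector of pointers to matrices. This adds an extra pointer indirection and destroys any guarantee of data locality, which leads to poor cache performance.
I hope this isn't insulting, but are you compiling in Release or in Debug? Eigen is very slow in debug builds, because it uses lots of trivial templated functions that are optimized away in release builds but remain in debug builds.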

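Looking at your code, I'm hesitant to blame Eigen for the performance problems. However, most linear algebra libraries (including Eigen) aren't really designed for your use case of lots of tiny matrices. In general, Eigen will be better optimized for matrices of 100x100 or larger. You may well be better off using your own matrix class or the DirectX math helper classes. The DirectX math classes are completely independent of your video card.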
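
For comparison, the "own matrix class" option for tiny matrices might look roughly like the following library-free sketch. `Vec2`, `Mat2`, `multiply`, `apply`, and `transformInPlace` are all hypothetical names and make no claim to match the asker's original class.

```cpp
#include <vector>

struct Vec2 { double x, y; };

// 2x2 matrix stored row-major: [a b; c d].
struct Mat2 { double a, b, c, d; };

// Plain 2x2 matrix product m * n.
inline Mat2 multiply(const Mat2& m, const Mat2& n)
{
    return { m.a * n.a + m.b * n.c, m.a * n.b + m.b * n.d,
             m.c * n.a + m.d * n.c, m.c * n.b + m.d * n.d };
}

// Computes m * v + t without any heap allocation.
inline Vec2 apply(const Mat2& m, const Vec2& v, const Vec2& t)
{
    return { m.a * v.x + m.b * v.y + t.x,
             m.c * v.x + m.d * v.y + t.y };
}

// Points live contiguously in one vector; rotation * scale is computed once.
void transformInPlace(std::vector<Vec2>& points, const Mat2& rotation,
                      const Mat2& scale, const Vec2& translation)
{
    const Mat2 rs = multiply(rotation, scale);
    for (Vec2& p : points) {
        p = apply(rs, p, translation);
    }
}
```

Code this small is easy for a compiler to inline, which may explain why a hand-written class can hold its own at these sizes; profiling both versions is the only way to know for a given program.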
