sse c++ c simd

Reordering 3D vector triplets into column-major order is slow

Posted 2024-12-12 17:24:09

I have a lot of (x1,y1,z1),(x2,y2,z2),(x3,y3,z3) single-precision vector triplets, and I want to reorder them so that
(x1,y1,z1),(x2,y2,z2),(x3,y3,z3)
becomes
(x1,x2,x3,0,y1,y2,y3,0,z1,z2,z3,0)
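
To make the target layout concrete, here is a minimal sketch for a single triplet. The real Vect3F class is not shown in this post, so the plain struct below is an assumption, and to_column_major is a hypothetical helper name used only for illustration.

```cpp
#include <array>

// Stand-in for Vect3F; the real class is not shown, so this plain
// struct is an assumption made for illustration.
struct Vect3F { float x, y, z; };

// Reorders one triplet of points into the 12-float layout
// (x1,x2,x3,0, y1,y2,y3,0, z1,z2,z3,0). Hypothetical helper.
std::array<float, 12> to_column_major(const Vect3F& p0,
                                      const Vect3F& p1,
                                      const Vect3F& p2)
{
    return { p0.x, p1.x, p2.x, 0.0f,   // all x components, then padding
             p0.y, p1.y, p2.y, 0.0f,   // all y components, then padding
             p0.z, p1.z, p2.z, 0.0f }; // all z components, then padding
}
```

For example, the points (1,2,3),(4,5,6),(7,8,9) come out as (1,4,7,0, 2,5,8,0, 3,6,9,0), so each group of four floats fills one 16-byte SSE register.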

The goal is to prepare the dataset for an SSE-based calculation. I have the following code to do this:

for (int i=0;i<count;i++)
{
    Vect3F p0 = get_first_point(i);
    Vect3F p1 = get_second_point(i);
    Vect3F p2 = get_third_point(i);
    int idx = i*3;
    scratch[idx] = Vec4F(p0.x, p1.x, p2.x, 0); // These 3 rows are the slowest
    scratch[idx+1] = Vec4F(p0.y, p1.y, p2.y, 0);
    scratch[idx+2] = Vec4F(p0.z, p1.z, p2.z, 0);
}

The last 3 lines of the loop are extremely slow: they take 90% of the running time of my entire algorithm!

Is this normal? Can I make this shuffling faster?
(scratch is a static variable and is 16-byte aligned. The function is called frequently, so I think the blocks of scratch should not be evicted from the cache.)

蓝戈者 2024-12-19 17:24:09

First of all, you shouldn't create 3 temporary vector objects.
Instead of:

tri = triangles[i];
Vect3F p0 = points[indices[tri]];
Vect3F p1 = points[indices[tri+1]];
Vect3F p2 = points[indices[tri+2]];

you should just copy the data using memcpy(): make a loop that goes over your entire collection and copies the raw data. It is the fastest way I can think of.

Using 3 variables runs a lot of constructors, which are painfully slow. The second way (from the comment) isn't much better, for the same reason.
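
A minimal sketch of what such a raw-copy loop might look like. It assumes the points are stored as packed x,y,z floats and that each triangle owns three consecutive entries in an index array. Neither is confirmed by the question, and reorder_triplets and its parameters are hypothetical names. Note that memcpy() only moves each point's three floats; the transposition itself is still done with plain element writes.

```cpp
#include <cstddef>
#include <cstring>

// Fills `scratch` (12 floats per triangle) from a flat array of packed
// xyz floats. The flat `points` array and the "3 consecutive indices per
// triangle" layout are assumptions; adapt them to the real containers.
void reorder_triplets(const float* points, const int* indices,
                      std::size_t triangle_count, float* scratch)
{
    for (std::size_t t = 0; t < triangle_count; ++t) {
        float p[3][3]; // p[k] holds x, y, z of the k-th corner
        for (int k = 0; k < 3; ++k) {
            // Raw 12-byte copy of one point; no vector object is constructed.
            std::memcpy(p[k], points + 3 * indices[3 * t + k],
                        3 * sizeof(float));
        }
        float* out = scratch + 12 * t;
        for (int c = 0; c < 3; ++c) {  // c = 0: x row, 1: y row, 2: z row
            out[4 * c + 0] = p[0][c];
            out[4 * c + 1] = p[1][c];
            out[4 * c + 2] = p[2][c];
            out[4 * c + 3] = 0.0f;     // padding lane
        }
    }
}
```

Whether this actually beats the original loop depends on the compiler and its flags, so it is worth profiling both versions with optimizations enabled.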

About the author

忆梦
