点到体素映射的优化

发布于 2024-10-07 06:06:22 字数 946 浏览 0 评论 0原文

我使用探查器来查看一些运行速度不够快的代码。发现下面这个函数占用了大部分时间，而这个函数有一半的时间都花在了floor上。现在，有两种可能性：优化此函数或提高一级并减少对此函数的调用。我想知道第一个是否可能。

int Sph::gridIndex (Vector3 position) const {
    int mx = ((int)floor(position.x / _gridIntervalSize) % _gridSize);
    int my = ((int)floor(position.y / _gridIntervalSize) % _gridSize);
    int mz = ((int)floor(position.z / _gridIntervalSize) % _gridSize);

    if (mx < 0) {
        mx += _gridSize;
    }
    if (my < 0) {
        my += _gridSize;
    }
    if (mz < 0) {
        mz += _gridSize;
    }

    int x = mx * _gridSize * _gridSize;
    int y = my * _gridSize;
    int z = mz * 1;
    return x + y + z;
}

Vector3 只是一些简单的类，它存储三个浮点数并提供一些重载运算符。 _gridSize 是 int 类型，_gridIntervalSize 是 float 类型。有 _gridSize ^ 3 个桶。

该函数的目的是提供哈希表支持。每个 3d 点都映射到一个索引，并且位于大小为 _gridIntervalSize ^ 3 的同一体素中的点应该落在同一个桶中。

原文

I used a profiler to look over some code which does not yet run fast enough. It found that the following function took most of the time, and half of the time in this function was spent in floor. Now, there are two possibilities: optimizing this function or going one level above and reducing the calls to this function. I wonder, if the first one is possible.

int Sph::gridIndex (Vector3 position) const {
    int mx = ((int)floor(position.x / _gridIntervalSize) % _gridSize);
    int my = ((int)floor(position.y / _gridIntervalSize) % _gridSize);
    int mz = ((int)floor(position.z / _gridIntervalSize) % _gridSize);

    if (mx < 0) {
        mx += _gridSize;
    }
    if (my < 0) {
        my += _gridSize;
    }
    if (mz < 0) {
        mz += _gridSize;
    }

    int x = mx * _gridSize * _gridSize;
    int y = my * _gridSize;
    int z = mz * 1;
    return x + y + z;
}

Vector3 is just some simple class which stores three floats and provides some overloaded operators. _gridSize is of type int and _gridIntervalSize is a float. There are _gridSize ^ 3 buckets.

The purpose of the function is to provide hash table support. Every 3d-point is mapped to an index, and points which lie in the same voxel of size _gridIntervalSize ^ 3 should land in the same bucket.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

や三分注定 2024-10-14 06:06:22

当涉及数学时，优化的第一条规则是：消除除法、平方根和三角函数。

<代码>
inverse_size = 1 / _gridIntervalSize;
....应该只执行一次，而不是每次调用一次。

int mx = ((int)floor(position.x * inverse_size) % _gridSize);
int my = ((int)floor(position.y * inverse_size) % _gridSize);
int mz = ((int)floor(position.z * inverse_size) % _gridSize);

我还建议放弃 mod 操作，因为这是另一个划分 - 如果您的网格大小是 2 的幂，您可以使用 & (gridsize-1) 这还允许您删除底部的条件代码，这是另一个很大的节省。

另一方面，使用重载运算符可能会伤害您。这是一个敏感的话题，所以我会让你尝试一下并自己决定。

First rule of optimization when there is math involved: Eliminate division, square roots, and trig functions.

inverse_size = 1 / _gridIntervalSize; ....that should be done only once, not once per call.

int mx = ((int)floor(position.x * inverse_size) % _gridSize);
int my = ((int)floor(position.y * inverse_size) % _gridSize);
int mz = ((int)floor(position.z * inverse_size) % _gridSize);

I would also recommend dropping the mod operation because that's another division - if your grid size is a power of 2 you can use & (gridsize-1) which will also allow you to delete the conditional code at the bottom which is another big savings.

On another note, using overloaded operators may be hurting you. This is a touchy subject here so I'll let you experiment with it and decide for yourself.

回复收藏 0 原文

靑春怀旧 2024-10-14 06:06:22

我假设您使用 floor 因为负值是可能的，并且因为您不希望在转换为 int 时由于默认截断而出现异常（值从四舍五入到零）两侧，形成一些超大的体素）。

如果您可以为向量中的每个值指定一个安全的最负值，则可以在转换之前减去该（负）值，或者更确切地说是 _gridIntervalSize 的最接近的负数倍数，然后删除地板。

使用fmod可以确保您有一个安全的最负值，并替换整数%，但这可能是一种反优化。尽管如此，作为一个快速的改变，它可能值得检查。

另外，检查您的平台是否支持向量指令，以及是否可以轻松地鼓励您的编译器使用它们。 x86 芯片当然具有整数向量指令和浮点指令（首先是旧的 Pentium 1 MMX 指令），并且可能能够比“正常”CPU 指令集更有效地处理此问题。这甚至可能是为编译器挖掘向量指令内在函数列表并进行一些手动优化的情况。首先检查编译器可以为您做什么 - 我不确定这种优化编译器已经为您做了多少事情。

一项可能微不足道的微优化...

return (mx * _gridSize + my) * _gridSize + mz;

节省了一个整数乘法。当然，这是微不足道的，编译器可能会捕获它，但这是一个老习惯。

哦 - 注意前导下划线。这些是保留的标识符。不太可能引起问题，但如果他们这样做了，您也不能抱怨。

编辑

避免地板的另一种方法是分别处理正数和负数。如果您愿意接受网格单元格边缘的项目可能位于错误的单元格中（无论如何都有可能，因为浮动应该被认为是近似的）。只需在负数情况下应用 -1 偏移量，将其从零拉开几乎恰好正确的量来补偿截断。您可能会考虑事后稍微调整一下尾数（以便在您期望的单元格中获得整数值），但这可能是不必要的。

如果您可以对大小施加二次幂限制，则可能有一种稍微麻烦的方法可以有效地从浮点中提取网格位置，避免部分或全部乘法、下限和％对于每个 x、y 和 z，假设采用标准浮点表示（即，这是不可移植的）。再次强调，正面和负面要分开处理。提取指数，相应地对尾数进行位移位，然后屏蔽掉不需要的位。

I assume you use floor because negative values are possible, and because you don't want an anomaly due to the default truncation when you cast to int (values rounding toward zero from both sides, making some oversized voxels).

If you can specify a safe most-negative value for each value in the vector, you could subtract that (negative) value, or rather the nearest more-negative multiple of _gridIntervalSize, before the cast, and drop the floor.

Using fmod may ensure you have a safe most-negative value, and replace the integer %, but it's probably an anti-optimisation. Still, as a quick change, it may be worth checking.

Also, check whether your platform supports vector instructions, and whether your compiler can easily be encouraged to use them. x86 chips certainly have integer vector instructions as well as float (the old Pentium 1 MMX instructions, for a start) and might be able to handle this much more efficiently than the "normal" CPU instruction set. This may even be a case for digging out the list of vector instruction intrinsics for your compiler and doing some hand-optimisation. Just check what the compiler can do for you first - I'm not sure how much of this kind of optimisation compilers will do for you already.

One probably trivial piece of micro-optimisation...

return (mx * _gridSize + my) * _gridSize + mz;

Saves one integer multiplication. Trivial, of course, and the compiler may catch it anyway, but this is an old habitual thing.

Oh - watch the leading underscores. Those are reserved identifiers. Not likely to cause a problem, but you can't complain if they do.

EDIT

Another way to avoid the floor is to handle positive and negative separately. If you are willing to accept that items bang-on-the-edge of a grid cell may be in the wrong cell (possible anyway since floats should be considered approximate). Just apply a -1 offset in the negative case, to pull it away from the zero by almost exactly right amount to compensate for the truncation. You might consider a bit-fiddling increment-the-mantissa afterwards (to get already integer values in the cell you'd expect) but this is probably unnecessary.

If you can impose power-of-two limitations to your sizes, there may be a bit-fiddling way to efficiently extract the grid position from a float, avoiding some or all of the multiply, floor and % for each of x, y and z, assuming a standard floating point representation (ie this is non-portable). Again, handle positive and negative separately. Extract the exponent, bit-shift the mantissa accordingly, then mask out unwanted bits.

回复收藏 0 原文

霞映澄塘 2024-10-14 06:06:22

我认为您需要向上查看层次结构才能获得真正的速度提高。也就是说，将点存储在哈希映射中真的是最有效的解决方案吗？我假设你有一个 Vector3 数组的数组，即：

Vector3 *点[大小][大小][大小]

，其中 3D 数组中的每个元素都是 Vector3 的数组。

您使用的算法不能保证每个 Vector3 数组中点的均匀分布，这可能是一个问题。 _gridIntervalSize 内的点簇将映射到同一数组。

另一种方法是使用八叉树，它类似于二叉树，但每个节点有八个子节点。每个节点都需要最小/最大 x/y/z 值来定义节点覆盖的体积。向树添加值：

递归搜索树查找可以包含点的最小节点
向节点添加点
如果节点中的点数>节点中点数的上限
<块引用>
创建子节点并将点移动到子节点

如果沿特定轴的值变化很小，您可能需要使用四叉树。另一种方法是使用 BSP - 将世界分为两半并递归查找要添加点的容器。同样，这些可以是动态的。

将浮点数转换为整数并使分割平面位于整数值上也会加快该过程。

谷歌搜索上述术语将引导您更深入地分析算法。

最后，在无限平面中使用浮点数（或双精度数）作为坐标是一个坏主意 - 距离 (0,0,0) 越远，精度就越低（浮点值之间的差距随着值的增加而增加））。您将需要“重置”浮点值以保持精度。一种方法是“平铺”空间并更改坐标以使用整数和浮点部分。整数部分定义“图块”，浮点部分定义图块中的位置。此方法为您提供了一种更简单的哈希方法 - 只需使用整数部分，无需调用floor，只需进行整数计算。另一种方法是使用定点值而不是浮点值，但这会限制您的精度。这将使跨图块边界的计算变得更加容易。

如果您可以扩展坐标系的顶级要求，那么可能有更好的算法可供您使用。

I think you need to look higher up the hierarchy to get real speed improvements. That is, is storing points in a hash-map really the most efficent solution? I assume you have an array of Vector3 arrays, i.e:

Vector3 *points [size][size][size]

where each element in the 3D array is an array of Vector3.

The algorithm you're using doesn't guarantee uniform distribution of points in each Vector3 array, which may be a problem. A cluster of points within _gridIntervalSize will map to the same array.

An alternative method would be to use oct-trees, which are like binary trees but each node has eight child nodes. Each node requires the min/max x/y/z values to define the volume the node covers. To add values to the tree:

Recursive search tree to find smallest node that can contain point
Add point to node
If number of points in node > upper limit to number of points in a node
Create child nodes and move points to child nodes

You may want to use quad-trees if there is little variation in values along a particular axis. Another method is to use BSPs - divide the world into two halves and recurse to find the container to add your point to. Again, these can be dynamic.

Converting the floats to ints and having the division planes lie on integer values will speed up the process as well.

Googling the above terms will lead you to more in depth analysis of the algorithms.

Finally, using floats (or doubles) for co-ordinates in an infinite plane is a bad idea - the further you get from (0,0,0) the less precision you have (the gaps between floating point values increases as the value increases). You will need to 'reset' the floating point values to keep the precision. One method is to 'tile' the space and change the co-ordinates to use integer and floating point parts. The integer part defines the 'tile' and the floating point part defines the position in the tile. This method gets you a much simpler hashing method - just use the integer parts, no call to floor required and only integer calculations required. Another approach is to use fixed-point values rather than floating point values, but this would constrain your precision. This would make calculations accross tile boundaries much easier.

If you could expand on what the top-level requriements of your coordinate system is, there are probably better algorithms available to you.

回复收藏 0 原文

~没有更多了~