Nanoseconds to milliseconds - fast divide by 1000000
I'm wanting to convert the output from gethrtime to milliseconds.
The obvious way to do this is to divide by 1000000.
However, I'm doing this quite often and wonder if it could become a bottleneck.
Is there an optimized divide operation when dealing with numbers like 1000000?
Note: Any code must be portable. I'm using gcc, and this is generally on SPARC hardware.
Some quick testing using the code below... hope that is right.
#include <sys/time.h>   // gethrtime() and hrtime_t (Solaris)
#include <iostream>
using namespace std;

const double NANOSECONDS_TO_MILLISECONDS = 1.0 / 1000000.0;

int main()
{
    hrtime_t start;
    hrtime_t tmp;
    hrtime_t fin;

    // Method 1: multiply by the precomputed reciprocal (floating point)
    start = gethrtime();
    tmp = (hrtime_t)(start * NANOSECONDS_TO_MILLISECONDS);
    fin = gethrtime();
    cout << "Method 1" << endl;
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    // Method 2: plain integer division
    start = gethrtime();
    tmp = (start / 1000000);
    fin = gethrtime();
    cout << "Method 2" << endl;
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    return 0;
}
Example outputs:
Original val: 3048161553965997
Computed: 3048161553
Time:82082
Original val: 3048161556359586
Computed: 3048161556
Time:31230
Original val: 3048239663018915
Computed: 3048239663
Time:79381
Original val: 3048239665393873
Computed: 3048239665
Time:31321
Original val: 3048249874282285
Computed: 3048249874
Time:81812
Original val: 3048249876664084
Computed: 3048249876
Time:34830
If this is correct, then the multiply-by-reciprocal is actually slower in this case. That's probably due to using floating-point math instead of fixed-point math. I'll stick with integer division then, which still takes hardly any time at all.
10 Answers
Let your compiler figure it out!
Seriously, if you're really concerned about optimizations at this level (and you shouldn't be unless it shows up in a profile), you should get used to looking at your compiler's assembly language output. You will be amazed what the compiler is doing on your behalf.
All the people who are recommending math tricks either have bad compilers or they are underestimating their compilers. For example, try compiling this function:
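The snippet itself did not survive the copy here; presumably it was just an unsigned division by the constant, something like this (the function name and the 32-bit unsigned long, as on x86, are my assumptions):

// Hypothetical reconstruction -- the original snippet was not preserved.
unsigned long div_by_1000000(unsigned long n)
{
    return n / 1000000UL;   // plain division by the constant
}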
Compiled with gcc 4.3.3 on x86 (-O3, -fomit-frame-pointer), there is no divide instruction in the generated assembly at all. In other words, the compiler took
n / 1000000UL
and turned it into (unsigned long long)(n * 0x431bde83) >> (0x12 + 32). Why does that work? Off the top of my head, I have no idea! But the compiler decided that it would be faster than issuing a native divide.
Moral of the story: let your compiler figure it out!
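As a quick sanity check (mine, not part of the original answer), the multiply-and-shift can be compared against a plain divide for every 32-bit value:

// Verify that the compiler's multiply-and-shift matches division by
// 1000000 for all 32-bit inputs (exhaustive; takes a few seconds).
#include <cstdint>
#include <iostream>

int main()
{
    for (uint64_t n = 0; n <= 0xFFFFFFFFULL; ++n) {
        uint32_t by_divide = (uint32_t)(n / 1000000u);
        uint32_t by_shift  = (uint32_t)((n * 0x431bde83ULL) >> (0x12 + 32));
        if (by_divide != by_shift) {
            std::cout << "mismatch at " << n << std::endl;
            return 1;
        }
    }
    std::cout << "identical for every 32-bit value" << std::endl;
    return 0;
}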
As Joshua Haberman mentioned, your compiler will probably already convert the division by a constant 1000000 to a multiply by a 'magic number' followed by a shift (if the division is an integer operation). You can get more details of what's going on in Henry Warren's "Hacker's Delight" book and at the companion website:
He even has a page that has a Javascript calculator for the magic numbers:
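For a taste of what those calculators produce, here is a small sketch (mine, not from the book or the linked pages) that reproduces the constant gcc chose above by rounding 2^50 / 10^6 upward:

// Derive a magic multiplier for unsigned division by 1000000: pick a
// shift s and round 2^s / d up. With s = 50 (enough for 32-bit operands)
// this reproduces the 0x431bde83 seen in the compiler output above.
#include <cstdint>
#include <iostream>

int main()
{
    const uint64_t d = 1000000;
    const int s = 50;
    const uint64_t magic = ((1ULL << s) + d - 1) / d;    // ceil(2^s / d)
    std::cout << std::hex << "0x" << magic << std::endl; // prints 0x431bde83
    return 0;
}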
Division is not an expensive operation. I doubt very much if a divide-by-1000000 operation will be anywhere near the main bottleneck in your application. The floating-point processor will do the single operation way faster than any sort of "tricks" you can come up with.
I'm surprised nobody has gotten this yet…
So,
...
Supah fast!
Multiply by 1/1,000,000. It should be faster. My Google search says that to speed up divisions, you multiply by the reciprocal. So I would pre-calculate the reciprocal, or a list of reciprocals if there is a relatively known set of possible values, and then multiply.
Jacob
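A minimal sketch of the reciprocal approach, assuming the floating-point route; the constant and function names here are illustrative:

#include <sys/time.h>   // hrtime_t / gethrtime() on Solaris

// Pre-computed reciprocal: one multiply per conversion instead of a divide.
const double NS_TO_MS = 1.0 / 1000000.0;

double toMilliseconds(hrtime_t ns)
{
    return ns * NS_TO_MS;
}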
First things first. If you think this will be a bottleneck, profile the code in question and find out for sure.
If (and only if) this is your bottleneck, then work on improving it.
Now, on to your improvement options:
1. You may not need to convert to milliseconds right away. If you are simply collecting data, just store the full 64-bit number returned from gethrtime() and be done with it. Anything that a human needs to read can be post-processed at a later time, or at a much less aggressive update frequency.
2. If you are timing some repetitive event, you can try performing the division on the difference between two calls; that difference should be very small if you are calling gethrtime() often enough for the conversion to be a bottleneck.
3. You can implement fastDivByOneMillion() as a multiplication and a division by a power of 2 (a sketch follows after the notes below).
Notes: Check what the best way to do a >> 32 is on your hardware; most of the time, this will be only one or two clock cycles. I use UI32 and UI64 to represent 32- and 64-bit unsigned numbers.
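The code block for option 3 was not preserved; a minimal sketch of the idea, assuming a 32-bit input and the magic constant ceil(2^50 / 10^6), could look like this:

typedef unsigned int UI32;        // 32-bit unsigned
typedef unsigned long long UI64;  // 64-bit unsigned

// n / 1000000 == (n * 0x431BDE83) >> 50 holds exactly for every 32-bit n.
UI32 fastDivByOneMillion(UI32 n)
{
    return (UI32)(((UI64)n * 0x431BDE83ULL) >> (32 + 18));
}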
If you can get away with that, here's my solution.
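The code block itself did not survive the copy; given the base-2 hint below, it was presumably just a shift, along these lines:

#include <sys/time.h>   // hrtime_t / gethrtime() on Solaris

// Presumed shape of the missing snippet: treat 2^20 as "one million",
// so a shift replaces the divide (the result is in 1/1048576-second units).
hrtime_t toBinaryMilliseconds(hrtime_t ns)
{
    return ns >> 20;
}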
and convince yourself that milliseconds should be base2 and not base10. ;-)
1/1000000 is 0.000000000000000000 0100 0011 0001 1011 1101 1110 1000 0010 1101 0111 1011 0110 0011 01 binary - which is 0x431BDE82 * 2^-18
Therefore n/1000000 is equivalent to (n*0x431BDE82)>>18
Also n/1000000 is equivalent to (n*0x8637BD04)>>19
Note that this is a "fixed point" calculation and you should be aware that precision could be lost.
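Spelled out in C (my own rendering), the formulas above implicitly keep only the high 32 bits of the 32x32 multiply, so the total shift is 32 + 18:

#include <cstdint>

// Fixed-point divide by 1000000 with the implicit ">> 32" written out.
// The truncated constant 0x431BDE82 comes out one too low for some inputs
// (e.g. exact multiples of 1000000); rounding it up to 0x431BDE83 makes
// the result exact for every 32-bit n.
uint32_t divByOneMillionFixedPoint(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0x431BDE82ULL) >> (32 + 18));
}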
It's possible to transform integer division into a series of simpler operations. A generic method to do that, popularized by Terje Mathisen, is outlined on page 136 of
Optimizing subroutines in assembly language. If you know in advance the width of your data types and what you're dividing by, it will lead you through how to turn the division into a series of simpler operations that, in theory, might be faster than a more generic division operation that has to handle any divisor. There will still be some platform issues to be concerned about if you're worried about differently sized integers on some platforms.
Unless you're actually programming this in assembly language, I'd bet against you actually improving anything in the process over the SPARC divide implementation. Maybe if you're using a positively ancient SPARC V7 processor, from before division was implemented in hardware, you might get some improvement, but even then I'd bet on the built-in division being faster.
Regardless, I suspect you've launched into a bit of premature optimization here though. You should start here by profiling the app you've got before presuming this division has any significant impact on its runtime, and you should similarly profile any change to the division to prove it works as expected. It's quite easy to get code that you think will execute faster but actually doesn't nowadays, given how complicated things like CPU caches have gotten.
First, the obvious disclaimer: Unless you perform the division a couple of million times per second at least, it won't be a bottleneck, and you should just leave it. Premature optimization and all that.
Second, how accurate do you need the result to be? A handy rule of thumb for converting between binary and decimal is that 2^10 ~= 10^3.
In other words, a million is roughly equal to 2^20. So you could just right-shift by 20. The compiler won't do this for you automatically, of course, because it changes the result. But if you're willing to live with the slight inaccuracy, and the division is actually a real performance problem, this would be my suggestion.
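To put a number on that inaccuracy (my arithmetic, not the answerer's): n >> 20 divides by 2^20 = 1,048,576 rather than 1,000,000, so the result comes out about 4.6% low (1,000,000 / 1,048,576 ≈ 0.9537).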