纳秒到毫秒 - 快速除以 1000000

发布于 2024-08-01 22:27:12 字数 1588 浏览 3 评论 0原文

我想将输出从 gethrtime 转换为毫秒。

最明显的方法是除以 1000000。 然而,我经常这样做,想知道它是否会成为瓶颈。

处理 1000000 这样的数字时是否有优化的除法运算?

注意:任何代码都必须是可移植的。 我正在使用 gcc,这通常在 Sparc 硬件上

使用下面的代码进行一些快速测试...希望这是正确的。

#include <sys/time.h>
#include <iostream>

using namespace std;

const double NANOSECONDS_TO_MILLISECONDS = 1.0 / 1000000.0;

int main()
{
    hrtime_t start;
    hrtime_t tmp;
    hrtime_t fin;

    start = gethrtime();
    tmp = (hrtime_t)(start * NANOSECONDS_TO_MILLISECONDS);
    fin = gethrtime();

    cout << "Method 1"
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    start = gethrtime();
    tmp = (start / 1000000);
    fin = gethrtime();

    cout "Method 2"    
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    return 0;
}  

输出示例:

Original val: 3048161553965997
Computed: 3048161553
Time:82082
Original val: 3048161556359586
Computed: 3048161556
Time:31230

Original val: 3048239663018915
Computed: 3048239663
Time:79381
Original val: 3048239665393873
Computed: 3048239665
Time:31321

Original val: 3048249874282285
Computed: 3048249874
Time:81812
Original val: 3048249876664084
Computed: 3048249876
Time:34830

如果这是正确的,那么在这种情况下,倒数乘法实际上会更慢。 这可能是由于使用浮点数学而不是定点数学。 我将坚持使用整数除法,但这仍然几乎不需要任何时间。

I'm wanting to convert the output from gethrtime to milliseconds.

The obvious way to do this is to divide by 1000000.
However, I'm doing this quite often and wonder if it could become a bottleneck.

Is there an optimized divide operation when dealing with numbers like 1000000?

Note: Any code must be portable. I'm using gcc and this is generally on Sparc hardware

Some quick testing using the code below... hope that is right.

#include <sys/time.h>
#include <iostream>

using namespace std;

const double NANOSECONDS_TO_MILLISECONDS = 1.0 / 1000000.0;

int main()
{
    hrtime_t start;
    hrtime_t tmp;
    hrtime_t fin;

    start = gethrtime();
    tmp = (hrtime_t)(start * NANOSECONDS_TO_MILLISECONDS);
    fin = gethrtime();

    cout << "Method 1"
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    start = gethrtime();
    tmp = (start / 1000000);
    fin = gethrtime();

    cout "Method 2"    
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    return 0;
}  

Example outputs:

Original val: 3048161553965997
Computed: 3048161553
Time:82082
Original val: 3048161556359586
Computed: 3048161556
Time:31230

Original val: 3048239663018915
Computed: 3048239663
Time:79381
Original val: 3048239665393873
Computed: 3048239665
Time:31321

Original val: 3048249874282285
Computed: 3048249874
Time:81812
Original val: 3048249876664084
Computed: 3048249876
Time:34830

If this is correct, then the multiple by reciprocal is actually slower in this case. It's probably due to using floating point math instead of fixed point math. I will just stick to integer division then which still takes hardly any time at all.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

俯瞰星空 2024-08-08 22:27:12

让你的编译器来解决它!

说真的,如果您真的关心这个级别的优化(除非它出现在配置文件中,否则您不应该担心),您应该习惯于查看编译器的汇编语言输出。 您会对编译器为您所做的事情感到惊讶。

所有推荐数学技巧的人要么有糟糕的编译器,要么低估了他们的编译器。 例如,尝试编译此函数:

unsigned long div1000000(unsigned long n) {
  return n / 1000000UL;
}

Compiled with gcc 4.3.3 on x86 (-O3, -fomit-frame-pointer),我得到:

$ objdump -d div.o -M intel

test2.o:     file format elf32-i386


Disassembly of section .text:

00000000 <div1000000>:
   0:   b8 83 de 1b 43          mov    eax,0x431bde83
   5:   f7 64 24 04             mul    DWORD PTR [esp+0x4]
   9:   c1 ea 12                shr    edx,0x12
   c:   89 d0                   mov    eax,edx
   e:   c3                      ret    

换句话说,编译器采用了 n / 1000000UL 并转向将其转换为 (unsigned long long)(n * 0x431bde83) >> (0x12 + 32)。 为什么这样有效? 从我的头顶上下来,我不知道! 但编译器认为它比发出本机除法更快。

这个故事的寓意是:

  • 除非您确定它是瓶颈,否则不要优化它。
  • 不要做花哨的算术(乘以倒数、移位等),除非你已经知道你的编译器在做什么并且你认为你可以打败它。
  • 对结果进行基准测试——如果你已经证明你已经超越了你的编译器,那么只留下像花哨的位数学一样的疣。

Let your compiler figure it out!

Seriously, if you're really concerned about optimizations at this level (and you shouldn't be unless it shows up in a profile), you should get used to looking at your compiler's assembly language output. You will be amazed what the compiler is doing on your behalf.

All the people who are recommending math tricks either have bad compilers or they are underestimating their compilers. For example, try compiling this function:

unsigned long div1000000(unsigned long n) {
  return n / 1000000UL;
}

Compiled with gcc 4.3.3 on x86 (-O3, -fomit-frame-pointer), I get:

$ objdump -d div.o -M intel

test2.o:     file format elf32-i386


Disassembly of section .text:

00000000 <div1000000>:
   0:   b8 83 de 1b 43          mov    eax,0x431bde83
   5:   f7 64 24 04             mul    DWORD PTR [esp+0x4]
   9:   c1 ea 12                shr    edx,0x12
   c:   89 d0                   mov    eax,edx
   e:   c3                      ret    

In other words, the compiler took n / 1000000UL and turned it into (unsigned long long)(n * 0x431bde83) >> (0x12 + 32). Why does that work? Off the top of my head, I have no idea! But the compiler decided that it would be faster than issuing a native divide.

Moral of the story:

  • don't optimize this unless you're sure it's a bottleneck.
  • don't do fancy arithmetic (multiplying by the reciprocal, shifts, etc) unless you already know what your compiler is doing and you think you can beat it.
  • benchmark the result -- only leave in a wart like fancy bitmath if you have demonstrated that you've outdone your compiler.
空心↖ 2024-08-08 22:27:12

正如 Joshua Haberman 提到的,你的编译器可能已经将除以常数 1000000 转换为乘以“幻数”,然后进行移位(如果除法是整数运算)。 您可以在亨利·沃伦 (Henry Warren) 的《黑客之乐》一书中以及配套网站中了解更多详细信息:

他甚至有一个页面,其中有一个用于神奇数字的 Javascript 计算器:

As Joshua Haberman mentioned, your compiler will probably already convert the division by a constant 1000000 to a multiply by a 'magic number' followed by a shift (if the division is an integer operation). You can get more details of what's going on in Henry Warren's "Hacker's Delight" book and at the companion website:

He even has a page that has a Javascript calculator for the magic numbers:

一直在等你来 2024-08-08 22:27:12

除法不是一个昂贵的操作。 我非常怀疑除以 1000000 的操作是否会接近应用程序的主要瓶颈。 浮点处理器将比您能想到的任何“技巧”快得多,而不仅仅是执行单个操作。

Division is not an expensive operation. I doubt very much if a divide-by-1000000 operation will be anywhere near the main bottleneck in your application. Floating-point processors will be way faster than any sort of "tricks" you can come up with than just doing the single operation.

念﹏祤嫣 2024-08-08 22:27:12

我很惊讶还没有人明白这一点……

  • 除法与乘以一个分数乘以 2 的小数次方的速度相同
  • :只是位移位
  • 积分除法涉及向下舍入,
  • 向下舍入就像乘以一个稍小的分数(向上到了一定程度,你需要意识到你的范围)

所以,

const uint64_t numerator = (1LL<<32)/1000000;

...

millionths = ( number * numerator ) >> 32;

Supah 快!

I'm surprised nobody has gotten this yet…

  • division is the same as multiplication by a fraction
  • multiplying by a fractional power of 2 is fast: just bit-shift
  • integral division involves rounding down
  • rounding down is like multiplying by a slightly smaller fraction (up to a certain point, you need to be aware of your ranges)

So,

const uint64_t numerator = (1LL<<32)/1000000;

...

millionths = ( number * numerator ) >> 32;

Supah fast!

っ左 2024-08-08 22:27:12

乘以 1/1,000,000。 应该会更快。 我的谷歌搜索显示,要加速除法,乘以倒数。 因此,如果存在一组相对已知的可能值,我会预先计算倒数或倒数列表,然后相乘。

雅各布

Multiply by 1/1,000,000. It should be faster. My Google search was saying to speed up divisions, multiply be the reciprocal. So I would pre-calculate the reciprocal or a list of reciprocals if there is a relatively known set of possible values, and then multiply.

Jacob

堇年纸鸢 2024-08-08 22:27:12

但是,我经常这样做,并且想知道它是否会成为瓶颈。

首先要事。 如果您认为这将成为瓶颈,请分析有问题的代码并找出答案。

如果(且仅当)这是您的瓶颈,那么就努力改进它。

现在,谈谈您的改进选项:

1.您可能不需要立即转换为毫秒。 如果您只是收集数据,只需存储从 gethrtime() 返回的完整 64 位数字即可完成。 人类需要阅读的任何内容都可以在以后进行后处理,或者以不太激进的更新频率进行处理。

2. 如果您正在对某些重复事件进行计时,您可以尝试对两次调用之间的差异进行除法,如果满足以下条件,该差异应该非常您调用 gethrtime() 的频率足以产生瓶颈:

static hrtime_t oldtime;
hrtime_t newtime = gethrtime();
int milliseconds = fastDivByOneMillion((UI32)(newtime - oldtime));
oldtime = newtime;

3. 您可以将 fastDivByOneMillion() 实现为乘法和除法2 的幂:

int fastDivByOneMillion(UI32 nanoseconds)
{
    return (int)((UI64)nanoseconds * 4295 >> 32);
}

注意:

  • 您的编译器可以找出执行 >> 的最佳方法。 32 在您的硬件上。 大多数时候,这只是一两个时钟周期。
  • 我使用 UI32UI64 来表示 32 和 64 位无符号数。
  • 所有这些都需要更多的分析,以确保它确实产生了可衡量的改进。

  • However, I'm doing this quite often and wonder if it could become a bottleneck.

    First things first. If you think this will be a bottleneck, profile the code in question and find out for sure.

    If, (and only if) this is your bottleneck, then work on improving it.

    Now, on to your improvement options:

    1. You may not need to convert to milliseconds right away. If you are simply collecting data, just store the full 64-bit number returned from gethrtime() and be done with it. Anything that a human needs to read can be post-processed at a later time, or on a much less agressive update frequency.

    2. If you are timing some repetitive event, you can try performing the division on the difference between two calls, which should be very small if you are calling gethrtime() often enough to have a bottleneck:

    static hrtime_t oldtime;
    hrtime_t newtime = gethrtime();
    int milliseconds = fastDivByOneMillion((UI32)(newtime - oldtime));
    oldtime = newtime;
    

    3. You can implement fastDivByOneMillion() as a multiplication and a division by a power of 2:

    int fastDivByOneMillion(UI32 nanoseconds)
    {
        return (int)((UI64)nanoseconds * 4295 >> 32);
    }
    

    Notes:

  • Your compiler can figure out the best way to do >> 32 on your hardware. Most of the time, this will be only one or two clock cylces.
  • I used UI32 and UI64 to represent 32 and 64-bit unsigned numbers.
  • All of this will require more profiling to be sure that it is actually producing a measurable improvement.

  • 生生漫 2024-08-08 22:27:12

    如果你能解决这个问题,这就是我的解决方案。

    • 使用整数而不是浮点数(它们更快
    • 通过向右移动位来除以 1048576(这比浮点数上的任何内容都便宜

    并说服自己毫秒应该是 base2而不是base10。 ;-)

    If you can get around with that, here's my solution.

    • use integers instead of floats (they are faster)
    • divide by 1048576 by shifting bits to the right (that is cheaper than anything on floats)

    and convince yourself that milliseconds should be base2 and not base10. ;-)

    空袭的梦i 2024-08-08 22:27:12

    1/1000000 是 0.000000000000000000 0100 0011 0001 1011 1101 1110 1000 0010 1101 0111 1011 0110 0011 01 二进制 - 即 0x431BDE82 * 2^-18

    因此 n /1000000 相当于 (n*0x431BDE82)>>18

    也是 n/1000000相当于 (n*0x8637BD04)>>19

    请注意,这是一个“定点”计算,您应该意识到精度可能会丢失。

    1/1000000 is 0.000000000000000000 0100 0011 0001 1011 1101 1110 1000 0010 1101 0111 1011 0110 0011 01 binary - which is 0x431BDE82 * 2^-18

    Therefore n/1000000 is equivalent to (n*0x431BDE82)>>18

    Also n/1000000 is equivalent to (n*0x8637BD04)>>19

    Note that this is a "fixed point" calculation and you should be aware that precision could be lost.

    野の 2024-08-08 22:27:12

    可以将整数除法转换为一系列更简单的运算。 由 Terje Mathisen 推广的通用方法在《
    优化汇编语言子例程。 如果您事先知道数据类型的宽度以及要除以的内容,那么这将引导您了解如何将其转换为一系列更简单的运算,理论上可能比必须处理的更通用的除法运算更快任何除数。 如果您担心其中一些整数大小不同,仍然需要担心一些平台问题。

    除非您实际上用汇编语言进行编程,否则我打赌您不会在 SPARC 除法实现的过程中实际改进任何内容。 也许如果您使用的是非常古老的 SPARC V7 处理器,那么从之前的划分开始是 在硬件中实现,您可能会得到一些改进,但即便如此,我打赌内置除法会更快。

    无论如何,我怀疑您在这里已经开始了一些过早的优化。 在假设该部门对其运行时有任何重大影响之前,您应该从这里开始分析您所拥有的应用程序,并且您应该类似地分析对该部门的任何更改以证明它按预期工作。 考虑到 CPU 缓存之类的东西已经变得多么复杂,很容易获得您认为执行速度更快但实际上执行速度更快的代码。

    It's possible to transform integer division into a series of simpler operations. A generic method to do that popularized by Terje Mathisen is outlined on page 136 of
    Optimizing subroutines in assembly language. If you know in advance the width of your data types and what you're dividing by, that will lead you through how to turn that into a serious of simpler operations that in theory might be faster than a more generic division operation that has to handle any divisor. There will still be some platform issues to be concerned about if you're worried about differently sized integers on some of them.

    Unless you're actually programming this in assembly language, I'd bet against you actually improving anything in the process over the SPARC divide implementation. Maybe if you're using a positively ancient SPARC V7 processor, from before division was implemented in hardware, you might get some improvement, but even then I'd bet on the built-in division being faster.

    Regardless, I suspect you've launched into a bit of premature optimization here though. You should start here by profiling the app you've got before presuming this division has any significant impact on its runtime, and you should similarly profile any change to the division to prove it works as expected. It's quite easy to get code that you think will execute faster but actually doesn't nowadays, given how complicated things like CPU caches have gotten.

    π浅易 2024-08-08 22:27:12

    首先,明显的免责声明:除非您至少每秒执行几百万次除法,否则它不会成为瓶颈,您应该离开它。 过早的优化等等。

    其次,您需要结果有多准确? 二进制和十进制之间转换的一个方便的经验法则是 2^10 ~= 10^3。

    换句话说,一百万大约等于2^20。 所以你可以右移 20。当然,编译器不会自动为你做这件事,因为它改变了结果。 但如果您愿意接受轻微的准确性,并且除法实际上是一个真正的性能问题,这将是我的建议。

    First, the obvious disclaimer: Unless you perform the division a couple of million times per second at least, it won't be a bottleneck, and you should just leave it. Premature optimization and all that.

    Second, how accurate do you need the result to be? A handy rule of thumb for converting between binary and decimal is that 2^10 ~= 10^3.

    In other words, a million is roughly equal to 2^20. So you could just right shift 20. The compiler won't do this for you automatically, of course, because it changes the result. But if you're willing to live with the slight accuracy, and the division is actually a real performance problem, this would be my suggestion.

    ~没有更多了~
    我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
    原文