Proper way of logging elapsed time in C++
I'm writing an article about GPU speed-up in a cluster environment. To do that, I'm programming in CUDA, which is basically a C++ extension. But, as I'm a C# developer, I don't know the particularities of C++.

Are there any concerns about logging elapsed time? Any suggestions or blogs to read?

My initial idea is to make a big loop and run the program several times, 50 to 100, logging the elapsed time of every run, and afterwards make some speed graphs.
3 Answers
Depending on your needs, it can be as easy as a call to `time` (a sketch follows below). I guess you need to say how you plan for this to be logged (file or console) and what precision you need (seconds, ms, µs, etc.). `time` gives it in seconds.
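A minimal sketch of that approach, assuming second-level resolution from the standard `time`/`difftime` calls is enough (the work being measured is just a placeholder):

```cpp
#include <ctime>
#include <cstdio>

int main() {
    std::time_t start = std::time(nullptr);   // wall-clock time, 1-second resolution

    // ... run the CUDA kernel / work you want to measure here ...

    std::time_t end = std::time(nullptr);
    std::printf("elapsed: %.0f s\n", std::difftime(end, start));
    return 0;
}
```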
I would recommend using the Boost Timer library. It is platform agnostic, and is as simple as the sketch below. Of course, `t.elapsed()` returns a double that you can save to a variable.
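A minimal sketch of that usage, assuming the classic `boost::timer` from `<boost/timer.hpp>`:

```cpp
#include <boost/timer.hpp>   // classic Boost.Timer interface (now deprecated)
#include <iostream>

int main() {
    boost::timer t;                 // starts timing on construction

    // ... run the work you want to measure here ...

    double seconds = t.elapsed();   // elapsed time in seconds, as a double
    std::cout << "elapsed: " << seconds << " s\n";
    return 0;
}
```

Note that newer Boost releases provide `boost::timer::cpu_timer` in `<boost/timer/timer.hpp>` instead; the classic header above is the one that matches the `t.elapsed()` call the answer refers to.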
Standard functions such as `time` often have a very low resolution. And yes, a good way to get around this is to run your test many times and take an average. Note that the first few runs may be extra slow because of hidden start-up costs, especially when using complex resources like GPUs.

For platform-specific calls, take a look at `QueryPerformanceCounter` on Windows and `CFAbsoluteTimeGetCurrent` on OS X. (I've not used the POSIX call `clock_gettime`, but it might be worth checking out; see the sketch below.)

Measuring GPU performance is tricky because GPUs are remote processing units running separate instructions, often on many parallel units. You might want to visit Nvidia's CUDA Zone for a variety of resources and tools to help measure and optimize CUDA code. (Resources related to OpenCL are also highly relevant.)
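As an illustration of both points, the run-many-times-and-average idea and the higher-resolution `clock_gettime` call, here is a minimal sketch assuming a POSIX system (the kernel launch is a placeholder, and the 50-run count mirrors the question):

```cpp
#include <time.h>    // clock_gettime, CLOCK_MONOTONIC (POSIX)
#include <cstdio>

// Return a monotonic timestamp in seconds, with nanosecond resolution.
static double now_seconds() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main() {
    const int runs = 50;     // repeat the measurement and average, as suggested
    double total = 0.0;

    for (int i = 0; i < runs; ++i) {
        double start = now_seconds();

        // ... launch the CUDA kernel / work you want to measure here ...

        double end = now_seconds();
        total += end - start;   // consider discarding the first few runs (warm-up)
    }

    std::printf("average elapsed: %.6f s over %d runs\n", total / runs, runs);
    return 0;
}
```

On older glibc versions you may need to link with `-lrt` for `clock_gettime`.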
Ultimately, you want to see how fast your results make it to the screen, right? For that reason, a call to `time` might well suffice for your needs.