Measuring the running time of computational geometry algorithms
I am taking a course on computational geometry in the fall, where we will be implementing some algorithms in C or C++ and benchmarking them. Most of the students generate a few datasets and measure their programs with the time
command, but I would like to be a bit more thorough.
I am thinking about writing a program to automatically generate different datasets, run my program with them and use R to test hypotheses and estimate parameters.
So... How do you measure program running time more accurately?
What might be relevant to measure?
What hypotheses might be interesting to test (variance, effects caused by caching, etc.)?
Should I test my code on more than one machine? How should these machines differ?
My overall goals are to learn how these algorithms perform in practice, which implementation techniques are better and how the hardware actually performs.
5 Answers
Profilers are great. Valgrind is pretty popular. Also, I'd suggest trying your code out on RISC machines if you can get access to some. Their performance characteristics differ from those of CISC machines in interesting ways.
You could use the Windows API timing functions (which are not that accurate), or you can use the RDTSC inline-assembly instruction, which has sub-nanosecond resolution (don't forget that the instruction and the code around it add a small overhead of a few hundred cycles, but this is not a big issue).
In order to get better accuracy with program metrics, you will have to run your program many times, such as 100 or 1000.
For more details on metrics, search the web for metrics and profiling.
Beware that programs may differ in performance (time) measurements due to things running in the background such as virus scanners, music players, and other programs with timers in them.
You could test your program on different machines. Processor clock rates, L1 and L2 cache sizes, RAM sizes, and Disk speeds are all factors (as well as the number of other programs / tasks running concurrently). Floating point may also be a factor.
If you want, you can challenge your compiler by printing assembly listings for various optimization settings. See which settings produce the smallest or most efficient assembly code.
Since you're processing data, look at data-driven design: http://www.gamearchitect.net/Articles/DataDrivenDesign.html
You can use the Windows High Performance Counter to get nanosecond accuracy. Technically, AFAIK, the HPC can run at any speed, but you can query its counts per second, and as far as I know most CPUs count at a very high frequency.
What you should do is just get a professional profiler. That's what they're for. More realistically, however:
If you're only comparing algorithms, then as long as your machine doesn't happen to excel in one particular area (Pentium D, SSD, that sort of thing), it shouldn't matter much to do it on just one machine. If you want to look at cache effects, try running the algorithm right after the machine starts up (make sure you get a copy of Windows 7; it should be free for CS students), then leave the machine doing something cache-heavy, like image processing, for 24 hours or so, to fill the caches with other data. Then run the algorithm again and compare.
You didn't specify your platform. If you are on a POSIX system (e.g. Linux), have a look at
clock_gettime
. This lets you access different kinds of clocks, e.g. wall-clock time or CPU time. You can also find out the precision of the clocks. Since you are willing to do good statistics on your numbers, you should repeat your experiments often enough that the statistical tests give you enough confidence.
If your measurements are not too fine-grained and your variance is low, 10 or so probes is often quite good. But if you go down to a small scale, a short function or so, you might need to go much higher.
Also, you will have to ensure reproducible experimental conditions: no other load on the machine, enough memory available, etc.