Normalizing speed for tests on different multi-core processors
I want to measure the run time of some simple C programs on different multi-core processors. As technology advances, newer processors incorporate more ways to compute faster, such as higher clock speeds. How can I normalize for these speed differences, that is, filter out the effect of every processor improvement other than the core count? I only want results that depend on the number of cores.
Comments (4)
Under Linux, you can boot with the kernel command-line parameter maxcpus=N to limit the machine to only N CPUs. See Documentation/kernel-parameters.txt in the kernel source for details.
Most BIOS environments also let you turn off hyperthreading. Depending on your benchmarks, HT may speed up or slow down your tests, so being in control of HT would be ideal.
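To confirm from inside the benchmark that a CPU limit such as maxcpus=N is actually in effect, something like the following could be used. This is only a sketch: the function names are made up for illustration, and `_SC_NPROCESSORS_ONLN` is a widely available extension (glibc, the BSDs) rather than strict POSIX.

```c
#include <assert.h>
#include <unistd.h>

/* Number of CPUs the OS currently reports as online, or -1 on failure.
 * Assumption: _SC_NPROCESSORS_ONLN is available on this platform. */
long online_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Returns 1 if exactly `expected` CPUs are online, 0 otherwise.
 * `expected` would be the N you passed as maxcpus=N. */
int cpu_limit_in_effect(long expected)
{
    assert(expected >= 1);
    return online_cpu_count() == expected;
}
```

A benchmark could call `cpu_limit_in_effect(N)` at startup and refuse to run if it returns 0, so that a forgotten boot parameter does not silently skew the numbers.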
Decide on a known set of reference hardware, run some sort of repeatable reference benchmark on it, and record a known-good value to compare against. Then you can run the same benchmark on other systems to work out how to scale the values you get from your target benchmark runs.
The closer your reference benchmark is to your actual application, the more accurate the scaling will be. You could use a single deterministic run of your application (a single code path, perhaps averaged over multiple executions) as your reference benchmark.
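The scaling described above boils down to a little arithmetic. Here is a minimal sketch, assuming all timings are wall-clock seconds you have already measured; the function names are hypothetical and the simple linear scaling is an assumption, not something every workload will obey.

```c
#include <assert.h>
#include <stddef.h>

/* Average of several timed executions, in seconds. */
double mean_seconds(const double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* How much faster a machine is than the reference hardware, judged by the
 * reference benchmark: > 1 means this machine is faster. */
double speed_factor(double ref_bench_on_reference_s, double ref_bench_on_machine_s)
{
    assert(ref_bench_on_reference_s > 0.0 && ref_bench_on_machine_s > 0.0);
    return ref_bench_on_reference_s / ref_bench_on_machine_s;
}

/* Express a target-benchmark time measured on that machine in
 * "reference hardware seconds" (assumes linear scaling). */
double normalize_runtime(double measured_s, double factor)
{
    assert(measured_s >= 0.0 && factor > 0.0);
    return measured_s * factor;
}
```

For example, if the reference benchmark takes 10 s on the reference box and 5 s on a newer machine, the factor is 2, and a 3 s target run on the newer machine counts as 6 reference seconds.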
If I understand you correctly, you are trying to find a measurement approach that separates the effect of scaling the number of cores from improvements in single-core performance. I am afraid that is not easily possible. For example, if you compare a multi-core system to a single core of that same system, the relationship is non-linear, because there are shared resources such as the memory bus. If you use only one core of a multi-core system, it can use the complete memory bandwidth, whereas it has to share that bandwidth in the multi-core case. Similar arguments apply to many other shared resources: caches, buses, I/O capabilities, ALUs, etc.
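One way to see that non-linearity in your own numbers is to compute speedup and parallel efficiency from the one-core and n-core run times on the same machine. This is a sketch with hypothetical function names; these are the standard textbook definitions, not something specific to this thread.

```c
#include <assert.h>

/* Speedup of the n-core run over the 1-core run on the same machine. */
double speedup(double t_one_core_s, double t_n_cores_s)
{
    assert(t_one_core_s > 0.0 && t_n_cores_s > 0.0);
    return t_one_core_s / t_n_cores_s;
}

/* Parallel efficiency: 1.0 means perfectly linear scaling; values below
 * 1.0 indicate losses, for instance from contention on shared resources. */
double efficiency(double t_one_core_s, double t_n_cores_s, int n_cores)
{
    assert(n_cores >= 1);
    return speedup(t_one_core_s, t_n_cores_s) / (double)n_cores;
}
```

If a program takes 8 s on one core and 4 s on four cores, the speedup is 2 and the efficiency is 0.5, so half of the ideal gain was lost somewhere, quite possibly to the shared memory bus or caches.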
Your issue is with the automatic scaling of core frequency based on the number of active cores at any given time. For instance, AMD Phenom 6-core chips operate at 3.4 GHz (or something similar), and if your application creates more than 3 threads the frequency drops to 2.8 GHz (or similar). Intel, on the other hand, uses a bunch of heuristics to determine the right frequency at any given time.
However, you can always turn these settings off in the BIOS; the results will then be comparable, differing only by clock frequency. Usually, people measure GFLOPS to get comparable results.
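Once frequency scaling is off and the clock is fixed, the remaining clock difference can be divided out. Here is a rough sketch, assuming you know how many floating-point operations your program performs and the fixed clock in Hz; the function names are invented for illustration.

```c
#include <assert.h>

/* Achieved GFLOPS: floating-point operations per second, in billions. */
double gflops(double flop_count, double seconds)
{
    assert(flop_count >= 0.0 && seconds > 0.0);
    return flop_count / seconds / 1e9;
}

/* Run time expressed in clock cycles rather than seconds. */
double cycles_elapsed(double seconds, double clock_hz)
{
    assert(seconds >= 0.0 && clock_hz > 0.0);
    return seconds * clock_hz;
}

/* Floating-point operations per clock cycle: a clock-independent figure
 * that is only meaningful if the clock really was fixed during the run. */
double flops_per_cycle(double flop_count, double seconds, double clock_hz)
{
    assert(seconds > 0.0);
    return flop_count / cycles_elapsed(seconds, clock_hz);
}
```

Comparing `flops_per_cycle` across machines removes the clock frequency, but not other per-core improvements such as wider vector units or larger caches, so it is only a partial normalization.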