Time stamp counter
I am using the time stamp counter in my C++ programme by querying the register. However, one problem I encounter is that the function that acquires the time stamp may read it from different CPUs. How can I ensure that my function always acquires the timestamp from the same CPU, or is there any way to synchronize the CPUs? By the way, my programme is running on a 4-core server with Fedora 13 64-bit.
Thanks.
Comments (3)
Look at the following excerpt from the Intel manual. According to section 16.12, I think the "newer processors" below refers to any processor newer than the Pentium 4. If it is supported, you can use the RDTSCP instruction to read the TSC value and the core ID simultaneously and atomically. I haven't tried it though. Good luck.
Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3 (3A & 3B): System Programming Guide:
Chapter 16.12.1 Invariant TSC
The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8].

The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.
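To check the CPUID.80000007H:EDX[8] bit from the excerpt in C++, here is a minimal sketch. It assumes GCC or Clang on x86/x86-64, because the cpuid.h header and __get_cpuid are compiler-specific; has_invariant_tsc is a hypothetical helper name. On Linux the same bit also shows up as the constant_tsc and nonstop_tsc flags in /proc/cpuinfo.

```cpp
// Assumes GCC or Clang targeting x86/x86-64.
#include <cpuid.h>  // __get_cpuid (compiler-provided header)

// Hypothetical helper: returns true if CPUID.80000007H:EDX[8]
// (invariant TSC) is set on the current processor.
bool has_invariant_tsc() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when leaf 0x80000007 is above the highest
    // extended leaf the CPU supports, so EDX is only read if it is valid.
    if (__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx) == 0) {
        return false;
    }
    return ((edx >> 8) & 1u) != 0;
}
```

Note that invariant only means the counter ticks at a constant rate regardless of power state; it is a separate question whether the counters on different cores are synchronized with each other.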
Intel also has a guide on code execution benchmarking that discusses cpu association with rdtsc - http://download.intel.com/embedded/software/IA/324264.pdf
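A sketch of the RDTSCP approach mentioned above, assuming GCC or Clang on x86-64 and a CPU that supports RDTSCP. __rdtscp is a compiler intrinsic that returns the TSC and stores the IA32_TSC_AUX register in its argument; TscSample, read_tsc_and_aux and cpu_from_aux are hypothetical names. The decoding in cpu_from_aux is an assumption about Linux, which stores (node << 12) | cpu in IA32_TSC_AUX; other operating systems may store something different.

```cpp
// Assumes GCC or Clang on x86-64, running on Linux.
#include <x86intrin.h>  // __rdtscp
#include <cstdint>

// Hypothetical type holding one RDTSCP result.
struct TscSample {
    std::uint64_t tsc;  // time stamp counter value
    unsigned int aux;   // raw contents of IA32_TSC_AUX
};

// Reads the TSC and IA32_TSC_AUX with a single RDTSCP instruction,
// so both values come from the same core.
TscSample read_tsc_and_aux() {
    TscSample s{};
    s.tsc = __rdtscp(&s.aux);
    return s;
}

// Assumption: Linux encodes IA32_TSC_AUX as (node << 12) | cpu,
// so the low 12 bits are the logical CPU number.
unsigned int cpu_from_aux(unsigned int aux) {
    return aux & 0xfffu;
}
```

Comparing cpu_from_aux(first.aux) with cpu_from_aux(second.aux) tells you whether two samples were taken on the same core; if they differ, the measurement can be discarded or repeated.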
In my experience, it is wise to avoid TSC altogether, unless you really want to measure individual clock cycles on individual cores/CPUs.
Potential problems with TSC:
This basically boils down to the following: you can only use the TSC to measure elapsed CPU cycles (not elapsed time) on a single CPU in a single-threaded application, and only if you force the thread's affinity.
The preferred alternative is to use system functions. The most portable (on Unix/Mac) is gettimeofday(), which is usually very accurate. A more appropriate function might be clock_gettime(), but check if it is supported on your system first. Under Windows you can safely use QueryPerformanceCounter().
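A minimal sketch of the clock_gettime() alternative on Linux, assuming a POSIX system that provides CLOCK_MONOTONIC; monotonic_ns is a hypothetical helper name. Unlike gettimeofday(), CLOCK_MONOTONIC is not affected if someone changes the system time while you are measuring.

```cpp
// Assumes a POSIX system with CLOCK_MONOTONIC (e.g. Linux).
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC
#include <cstdint>

// Hypothetical helper: monotonic timestamp in nanoseconds,
// or -1 if clock_gettime fails.
std::int64_t monotonic_ns() {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1;
    }
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

Elapsed time is then the difference of two calls, which stays valid even if the thread migrates between cores. On glibc versions older than 2.17 you may need to link with -lrt for clock_gettime.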
You can use sched_setaffinity, or the cpuset feature, which lets you create a cpuset and assign tasks to it.
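A minimal sketch of the sched_setaffinity approach, assuming Linux with glibc and GCC or Clang on x86-64 (g++ defines _GNU_SOURCE by default, which sched.h needs for cpu_set_t and CPU_SET). pin_to_cpu and tsc_ticks are hypothetical helper names; __rdtsc is the compiler intrinsic for reading the counter.

```cpp
// Assumes Linux/glibc, GCC or Clang, x86-64.
#include <sched.h>      // sched_setaffinity, sched_getcpu, cpu_set_t
#include <x86intrin.h>  // __rdtsc
#include <cstdint>

// Hypothetical helper: restricts the calling thread to one CPU.
// Returns true on success.
bool pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: TSC ticks spent in fn. Only meaningful after
// pin_to_cpu, so that both RDTSC reads execute on the same core.
std::uint64_t tsc_ticks(void (*fn)()) {
    std::uint64_t start = __rdtsc();
    fn();
    std::uint64_t end = __rdtsc();
    return end - start;
}
```

For example, calling pin_to_cpu(sched_getcpu()) at the start of the thread keeps it on the core it is already running on; the same restriction can be applied from the shell with taskset -c 0 ./your_program. RDTSC is not a serializing instruction, so for very short code regions the surrounding instructions may be reordered around the reads.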