Why is the "user" time of the same executable different on two different AMD machines?

Posted on 2025-02-13 15:36:56

I ran the source code below on two different AMD machines that have the same configuration.

The "user" times differ between the two machines.

The same source code was compiled with GCC 9 on both machines.
The objdump output of the binary is identical on both machines.

Here is the source code:

#include <stdio.h>   /* printf */
#include <time.h>    /* clock, CLOCKS_PER_SEC */

int main()
{
    double time_spent = 0.0;
    clock_t begin = clock();   /* CPU time consumed by the process so far */
    int a = 0;
    for(int j = 0; j < 1000; j++)
    {
        for(int i = 0; i < 20000000; i++)
        {
            if(i%1000000 == 0)
                a = i;
        }
    }
    clock_t end = clock();
    time_spent += (double)(end - begin) / CLOCKS_PER_SEC;

    printf("The elapsed time is %f seconds\n", time_spent);
    return 0;
}

Output on Machine 1 (machine details are listed below):
The elapsed time is 21.140000 seconds
real 21.14
user 21.14
sys 0.00
mem: 520

Output on Machine 2 (machine details are listed below):
The elapsed time is 24.580000 seconds
real 24.59
user 24.58
sys 0.00
mem: 524

Machine 1 Details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7542 32-Core Processor
Stepping: 0
CPU MHz: 2894.795
BogoMIPS: 5789.59
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127

Machine 2 Details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7542 32-Core Processor
Stepping: 0
CPU MHz: 2894.705
BogoMIPS: 5789.41
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127

I have already tried the following:
(1) Used "taskset" to pin the application to a particular CPU.
(2) Used "nice" and "renice" to change the priority of the process.
(3) Executed the same program from the /tmp directory to rule out network lag.
(4) Confirmed that there is no load on the machines.
(5) Compiled with the -O2 and -O3 optimizations.
(6) Executed with sudo. I still see a consistent time difference between the two machines, even though they have the same configuration.

Could someone explain why the same source code, executed on two different AMD machines with the same configuration, reports different "user" times?
