Linux application profiling

Posted on 2024-08-20 23:23:36

How can I record the performance of an application on a Linux machine? I won't have an IDE.

Ideally, I need an application that will attach to a process and log periodic snapshots of:

  • memory usage
  • number of threads
  • CPU usage

Comments (6)

別甾虛僞 2024-08-27 23:23:36

Ideally, I need an application that will attach to a process and log periodic snapshots of:

  • memory usage
  • number of threads
  • CPU usage

Well, in order to collect this type of information about your process, you don't actually need a profiler on Linux.

  1. You can use top in batch mode. It runs until it is killed or until N iterations are done:

    top -b -p `pidof a.out`
    

    or

    top -b -p `pidof a.out` -n 100
    

    and you will get this:

    $ top -b -p `pidof a.out`
    
    top - 10:31:50 up 12 days, 19:08,  5 users,  load average: 0.02, 0.01, 0.02
    Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:  16330584k total,  2335024k used, 13995560k free,   241348k buffers
    Swap:  4194296k total,        0k used,  4194296k free,  1631880k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    24402 SK        20   0 98.7m 1056  860 S 43.9  0.0   0:11.87 a.out
    
    
    top - 10:31:53 up 12 days, 19:08,  5 users,  load average: 0.02, 0.01, 0.02
    Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.9%us,  3.7%sy,  0.0%ni, 95.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:  16330584k total,  2335148k used, 13995436k free,   241348k buffers
    Swap:  4194296k total,        0k used,  4194296k free,  1631880k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    24402 SK        20   0 98.7m 1072  860 S 19.0  0.0   0:12.44 a.out
    
  2. You can use ps (for instance, in a shell script):

    ps --format pid,pcpu,cputime,etime,size,vsz,cmd -p `pidof a.out`
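
Neither command above reports the thread count, and the %CPU that ps prints is the CPU time used divided by the time the process has been running, not a per-interval figure. As a minimal sketch, assuming a Linux /proc filesystem, the shell functions below log memory, threads, and CPU time periodically. The names snapshot and log_snapshots are made up for this example; VmRSS, Threads, utime, and stime are the fields described in proc(5).

```shell
#!/bin/sh
# Print one snapshot line for a PID:
#   <epoch seconds> <RSS in kB> <thread count> <CPU time in clock ticks>
snapshot() {
    pid=$1
    # VmRSS and Threads are fields of /proc/<pid>/status.
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    threads=$(awk '/^Threads:/ {print $2}' "/proc/$pid/status")
    # In /proc/<pid>/stat, utime and stime are fields 14 and 15. The command
    # name (field 2) may contain spaces, so strip everything up to the last
    # ")" first; the two values are then fields 12 and 13 of what remains.
    ticks=$(sed 's/^.*) //' "/proc/$pid/stat" | awk '{print $12 + $13}')
    echo "$(date +%s) $rss $threads $ticks"
}

# Log up to COUNT snapshots of PID, INTERVAL seconds apart.
# Stops early once the process has exited.
log_snapshots() {
    pid=$1; interval=$2; count=$3
    i=0
    while [ "$i" -lt "$count" ] && [ -r "/proc/$pid/status" ]; do
        snapshot "$pid"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, log_snapshots "$(pidof a.out)" 5 100 >> a.out.log writes up to 100 lines, 5 seconds apart. The CPU usage over one interval is the difference between two consecutive tick values, divided by the interval length times the value of getconf CLK_TCK.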
    

    I need some means of recording the performance of an application on a Linux machine

    To do this, use perf if your Linux kernel is newer than 2.6.32, or OProfile if it is older. Neither program requires you to instrument your program (as Gprof does). However, to get correct call graphs in perf you need to build your program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp.

As for Linux perf:

  1. To record performance data:

    perf record -p `pidof a.out`
    

    or to record for 10 seconds:

    perf record -p `pidof a.out` sleep 10
    

    or to record with a call graph:

    perf record -g -p `pidof a.out`
    
  2. To analyze the recorded data:

    perf report --stdio
    perf report --stdio --sort=dso -g none
    perf report --stdio -g none
    perf report --stdio -g
    

    On RHEL 6.3, reading /boot/System.map-2.6.32-279.el6.x86_64 is allowed, so I usually add --kallsyms=/boot/System.map-2.6.32-279.el6.x86_64 when running perf report:

    perf report --stdio -g --kallsyms=/boot/System.map-2.6.32-279.el6.x86_64
    

    Here I wrote some more information on using Linux `perf`:

    First of all, this is a tutorial about Linux profiling with perf.

    You can use perf if your Linux kernel is newer than 2.6.32, or OProfile if it is older. Neither program requires you to instrument your program (as Gprof does). However, to get correct call graphs in perf you need to build your program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp.

    You can see a "live" analysis of your application with perf top:

    sudo perf top -p `pidof a.out` -K
    

Or you can record performance data of a running application and analyze it afterwards:

  1. To record performance data:

    perf record -p `pidof a.out`
    

    or to record for 10 seconds:

    perf record -p `pidof a.out` sleep 10
    

    or to record with a call graph:

    perf record -g -p `pidof a.out`
    
  2. To analyze the recorded data:

perf report --stdio
perf report --stdio --sort=dso -g none
perf report --stdio -g none
perf report --stdio -g

Or you can record performance data for a whole run by launching the application under perf and waiting for it to exit, then analyze the data afterwards:

perf record ./a.out

This is an example of profiling a test program.

The test program is in file main.cpp (main.cpp is at the bottom of the answer):

I compile it in this way:

g++ -m64 -fno-omit-frame-pointer -g main.cpp -L.  -ltcmalloc_minimal -o my_test

I use libtcmalloc_minimal.so since it is compiled with -fno-omit-frame-pointer, while libc malloc seems to be compiled without this option. Then I run my test program:

./my_test 100000000

Then I record performance data of a running process:

perf record -g  -p `pidof my_test` -o ./my_test.perf.data sleep 30

Then I analyze the load per module:

perf report --stdio -g none --sort comm,dso -i ./my_test.perf.data

# Overhead  Command                 Shared Object
# ........  .......  ............................
#
    70.06%  my_test  my_test
    28.33%  my_test  libtcmalloc_minimal.so.0.1.0
     1.61%  my_test  [kernel.kallsyms]

Then I analyze the load per function:

perf report --stdio -g none -i ./my_test.perf.data | c++filt

# Overhead  Command                 Shared Object                       Symbol
# ........  .......  ............................  ...........................
#
    29.30%  my_test  my_test                       [.] f2(long)
    29.14%  my_test  my_test                       [.] f1(long)
    15.17%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator new(unsigned long)
    13.16%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator delete(void*)
     9.44%  my_test  my_test                       [.] process_request(long)
     1.01%  my_test  my_test                       [.] operator delete(void*)@plt
     0.97%  my_test  my_test                       [.] operator new(unsigned long)@plt
     0.20%  my_test  my_test                       [.] main
     0.19%  my_test  [kernel.kallsyms]             [k] apic_timer_interrupt
     0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock
     0.13%  my_test  [kernel.kallsyms]             [k] native_write_msr_safe

     and so on ...
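
The per-function rows are a finer breakdown of the per-module figures shown earlier: summing the Overhead column per shared object should give the per-module percentages back. A rough sketch of that check, assuming the report text is piped in on stdin (sum_by_dso is a name invented for this example):

```shell
#!/bin/sh
# Sum the Overhead column of `perf report --stdio -g none` output per
# shared object. Header lines starting with "#" and any other line whose
# first field is not a percentage are skipped.
sum_by_dso() {
    awk '$1 ~ /^[0-9.]+%$/ {
             sub(/%$/, "", $1)      # "29.30%" -> "29.30"
             total[$3] += $1        # column 3 is the Shared Object
         }
         END { for (dso in total) printf "%.2f%% %s\n", total[dso], dso }' |
    sort -rn
}
```

It would be used as perf report --stdio -g none -i ./my_test.perf.data | sum_by_dso. For the rows listed above, the my_test functions add up to 70.06% and the libtcmalloc_minimal.so.0.1.0 functions to 28.33%, which matches the per-module report.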

Then I analyze the call chains:

perf report --stdio -g graph -i ./my_test.perf.data | c++filt

# Overhead  Command                 Shared Object                       Symbol
# ........  .......  ............................  ...........................
#
    29.30%  my_test  my_test                       [.] f2(long)
            |
            --- f2(long)
               |
                --29.01%-- process_request(long)
                          main
                          __libc_start_main

    29.14%  my_test  my_test                       [.] f1(long)
            |
            --- f1(long)
               |
               |--15.05%-- process_request(long)
               |          main
               |          __libc_start_main
               |
                --13.79%-- f2(long)
                          process_request(long)
                          main
                          __libc_start_main

    15.17%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator new(unsigned long)
            |
            --- operator new(unsigned long)
               |
               |--11.44%-- f1(long)
               |          |
               |          |--5.75%-- process_request(long)
               |          |          main
               |          |          __libc_start_main
               |          |
               |           --5.69%-- f2(long)
               |                     process_request(long)
               |                     main
               |                     __libc_start_main
               |
                --3.01%-- process_request(long)
                          main
                          __libc_start_main

    13.16%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator delete(void*)
            |
            --- operator delete(void*)
               |
               |--9.13%-- f1(long)
               |          |
               |          |--4.63%-- f2(long)
               |          |          process_request(long)
               |          |          main
               |          |          __libc_start_main
               |          |
               |           --4.51%-- process_request(long)
               |                     main
               |                     __libc_start_main
               |
               |--3.05%-- process_request(long)
               |          main
               |          __libc_start_main
               |
                --0.80%-- f2(long)
                          process_request(long)
                          main
                          __libc_start_main

     9.44%  my_test  my_test                       [.] process_request(long)
            |
            --- process_request(long)
               |
                --9.39%-- main
                          __libc_start_main

     1.01%  my_test  my_test                       [.] operator delete(void*)@plt
            |
            --- operator delete(void*)@plt

     0.97%  my_test  my_test                       [.] operator new(unsigned long)@plt
            |
            --- operator new(unsigned long)@plt

     0.20%  my_test  my_test                       [.] main
     0.19%  my_test  [kernel.kallsyms]             [k] apic_timer_interrupt
     0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock
     and so on ...

So at this point you know where your program spends time.

And this is the main.cpp file for the test:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

time_t f1(time_t time_value)
{
  for (int j = 0; j < 10; ++j) {
    ++time_value;
    if (j%5 == 0) {
      double *p = new double;
      delete p;
    }
  }
  return time_value;
}

time_t f2(time_t time_value)
{
  for (int j = 0; j < 40; ++j) {
    ++time_value;
  }
  time_value = f1(time_value);
  return time_value;
}

time_t process_request(time_t time_value)
{
  for (int j = 0; j < 10; ++j) {
    int *p = new int;
    delete p;
    for (int m = 0; m < 10; ++m) {
      ++time_value;
    }
  }
  for (int i = 0; i < 10; ++i) {
    time_value = f1(time_value);
    time_value = f2(time_value);
  }
  return time_value;
}

int main(int argc, char* argv2[])
{
  int number_loops = argc > 1 ? atoi(argv2[1]) : 1;
  time_t time_value = time(0);
  printf("number loops %d\n", number_loops);
  printf("time_value: %ld\n", (long)time_value);

  for (int i = 0; i < number_loops; ++i) {
    time_value = process_request(time_value);
  }
  printf("time_value: %ld\n", (long)time_value);
  return 0;
}

美人骨 2024-08-27 23:23:36

Quoting Linus Torvalds himself:

Don't use gprof. You're much better off using the newish Linux 'perf' tool.

And later ...

I can pretty much guarantee that once you start using it, you'll never use gprof or oprofile again.

See Re: [PATCH] grep: do not do external grep on skip-worktree entries (2010-01-04)

余厌 2024-08-27 23:23:36

If you are looking for things to do to possibly speed up the program, you need stackshots. A simple way to do this is to use the pstack utility, or lsstack if you can get it.
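
As a minimal sketch of collecting a handful of stackshots, assuming pstack is installed: the function below dumps the stack of a process several times with a pause in between. The function name stackshots and the STACK_CMD variable are hypothetical, introduced only for this example; STACK_CMD lets you substitute another stack dumper.

```shell
#!/bin/sh
# Take COUNT stack dumps of PID, INTERVAL seconds apart.
# Each dump is preceded by a "=== sample N ===" marker line.
# STACK_CMD (hypothetical) overrides the dump command; default is pstack.
stackshots() {
    pid=$1; count=${2:-10}; interval=${3:-1}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i ==="
        ${STACK_CMD:-pstack} "$pid" || return 1
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, stackshots "$(pidof a.out)" 10 1 > stacks.txt takes ten samples. Functions that show up in most of the samples are the ones worth looking at first.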

You can do better than Gprof. If you want to use an official profiling tool, you want something that samples the call stack on wall-clock time and presents line-level cost, such as OProfile or RotateRight Zoom.

你与昨日 2024-08-27 23:23:36

You can use Valgrind. It records data in a file which you can analyse later using a proper GUI, like KCacheGrind.

A usage example would be:

valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes your_program

It'll generate a file called callgrind.out.xxx where xxx is the PID of the program.
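
Since the question rules out an IDE, the output file can also be read in a terminal with callgrind_annotate, which ships with Valgrind. A small sketch for picking up the most recent output file; latest_callgrind_out is a helper name made up for this example:

```shell
#!/bin/sh
# Print the most recently modified callgrind.out.* file in DIR
# (default: current directory). Prints nothing if there is none.
latest_callgrind_out() {
    dir=${1:-.}
    ls -t "$dir"/callgrind.out.* 2>/dev/null | head -n 1
}
```

It could then be used as callgrind_annotate "$(latest_callgrind_out)" | head -n 40 to see the most expensive functions as plain text.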

Unlike Gprof, Valgrind works with many different languages, including Java, with some limitations.

梦幻的心爱 2024-08-27 23:23:36

Look into Gprof. You need to compile the code with the -pg option, which instruments the code. After that, you can run the program and use Gprof to see the results.

昔日梦未散 2024-08-27 23:23:36

You can also try out cpuprofiler.com. It collects the information you would normally get from the top command, and the CPU usage data can even be viewed remotely from a web browser.
