Linux profiling

Linux application profiling

Posted 2024-08-20 23:23:36

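How can I record the performance of an application on a Linux machine? I won't have an IDE.

Ideally, I need an application that will attach to a process and log periodic snapshots of:

memory usage
number of threads
CPU usage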

別甾虛僞 2024-08-27 23:23:36

Ideally, I need an application that will attach to a process and log periodic snapshots of:

memory usage

number of threads

CPU usage

Well, in order to collect this type of information about your process, you don't actually need a profiler on Linux.

You can use top in batch mode. It runs in batch mode either until it is killed or until N iterations are done:

top -b -p `pidof a.out`

or

top -b -p `pidof a.out` -n 100

You will get output like this:

$ top -b -p `pidof a.out`

top - 10:31:50 up 12 days, 19:08,  5 users,  load average: 0.02, 0.01, 0.02
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16330584k total,  2335024k used, 13995560k free,   241348k buffers
Swap:  4194296k total,        0k used,  4194296k free,  1631880k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24402 SK        20   0 98.7m 1056  860 S 43.9  0.0   0:11.87 a.out


top - 10:31:53 up 12 days, 19:08,  5 users,  load average: 0.02, 0.01, 0.02
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  3.7%sy,  0.0%ni, 95.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16330584k total,  2335148k used, 13995436k free,   241348k buffers
Swap:  4194296k total,        0k used,  4194296k free,  1631880k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24402 SK        20   0 98.7m 1072  860 S 19.0  0.0   0:12.44 a.out
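
The batch output above repeats the full summary header for every snapshot, which is awkward to log. As a rough sketch, an awk filter can reduce it to one line per snapshot. The function name `top_to_csv` is hypothetical, and the column positions assume top's default layout as shown in the sample output (RES in column 6, %CPU in column 9, %MEM in column 10); a customized top configuration would need different field numbers.

```shell
#!/bin/sh
# Hypothetical helper: reduce `top -b` output on stdin to
# "time,res,cpu,mem" rows for a single PID.
# Assumes top's default column order (RES=$6, %CPU=$9, %MEM=$10).
top_to_csv() {
  awk -v pid="$1" '
    $1 == "top" && $2 == "-" { ts = $3 }           # remember the snapshot time
    $1 == pid { print ts "," $6 "," $9 "," $10 }   # one row per snapshot
  '
}

# Usage (a.out is the example process name used in the text):
#   top -b -d 5 -p "$(pidof a.out)" | top_to_csv "$(pidof a.out)" >> a.out.csv
```

With the first sample snapshot above as input, this prints `10:31:50,1056,43.9,0.0`.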

You can also use ps (for instance, in a shell script):
```
ps --format pid,pcpu,cputime,etime,size,vsz,cmd -p `pidof a.out`
```
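
To turn the ps call into the periodic snapshots the question asks for, a loop along the following lines might work. This is only a sketch: the function names `snapshot` and `log_process` are made up for illustration, and it assumes the procps version of ps, whose `nlwp` column reports the number of threads. Note that ps reports `pcpu` as CPU time divided by the process's total running time, so it is a lifetime average rather than the per-interval figure top shows.

```shell
#!/bin/sh
# Hypothetical helper: print one "epoch,pcpu,rss_kb,vsz_kb,threads" line for a PID.
# Returns non-zero when the process no longer exists.
snapshot() {
  # A trailing "=" after each column name suppresses the header line.
  stats=$(LC_ALL=C ps -o pcpu=,rss=,vsz=,nlwp= -p "$1") || return 1
  set -- $stats                      # split the four columns into $1..$4
  echo "$(date +%s),$1,$2,$3,$4"
}

# Hypothetical helper: log a snapshot every INTERVAL seconds (default 5)
# until the process exits.
log_process() {
  pid=$1
  interval=${2:-5}
  while snapshot "$pid"; do
    sleep "$interval"
  done
}

# Usage (a.out is the example process name used in the text):
#   log_process "$(pidof a.out)" 5 >> a.out.log
```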
I need some means of recording the performance of an application on a Linux machine

To do this, use perf if your Linux kernel is newer than 2.6.32, or OProfile if it is older. Neither program requires you to instrument your program (as Gprof does). However, to get correct call graphs in perf you need to build your program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp.

As for Linux perf:

To record performance data:

perf record -p `pidof a.out`

or to record for 10 seconds:

perf record -p `pidof a.out` sleep 10

or to record with a call graph:

perf record -g -p `pidof a.out`

To analyze the recorded data:
```
perf report --stdio
perf report --stdio --sort=dso -g none
perf report --stdio -g none
perf report --stdio -g
```
On RHEL 6.3, /boot/System.map-2.6.32-279.el6.x86_64 is readable, so I usually add --kallsyms=/boot/System.map-2.6.32-279.el6.x86_64 when running perf report:
```
perf report --stdio -g --kallsyms=/boot/System.map-2.6.32-279.el6.x86_64
```
Here is some more information on using Linux `perf`:

First of all, this is a tutorial about Linux profiling with perf.

You can use perf if your Linux kernel is newer than 2.6.32, or OProfile if it is older. Neither program requires you to instrument your program (as Gprof does). However, to get correct call graphs in perf you need to build your program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp.

You can see a "live" analysis of your application with perf top:
```
 sudo perf top -p `pidof a.out` -K
```

Or you can record performance data of a running application and analyze it afterwards:

To record performance data:

perf record -p `pidof a.out`

or to record for 10 seconds:

perf record -p `pidof a.out` sleep 10

or to record with a call graph:

perf record -g -p `pidof a.out`

To analyze the recorded data:

perf report --stdio
perf report --stdio --sort=dso -g none
perf report --stdio -g none
perf report --stdio -g

Or you can record performance data of an application and analyze it afterwards, simply by launching the application this way and waiting for it to exit:

perf record ./a.out

This is an example of profiling a test program.

The test program is in file main.cpp (main.cpp is at the bottom of the answer):

I compile it in this way:

g++ -m64 -fno-omit-frame-pointer -g main.cpp -L.  -ltcmalloc_minimal -o my_test

I use libtcmalloc_minimal.so since it is compiled with -fno-omit-frame-pointer, while libc malloc seems to be compiled without this option. Then I run my test program:

./my_test 100000000

Then I record performance data of a running process:

perf record -g  -p `pidof my_test` -o ./my_test.perf.data sleep 30

Then I analyze the load per module:

perf report --stdio -g none --sort comm,dso -i ./my_test.perf.data

# Overhead  Command                 Shared Object
# ........  .......  ............................
#
    70.06%  my_test  my_test
    28.33%  my_test  libtcmalloc_minimal.so.0.1.0
     1.61%  my_test  [kernel.kallsyms]

Then I analyze the load per function:

perf report --stdio -g none -i ./my_test.perf.data | c++filt

# Overhead  Command                 Shared Object                       Symbol
# ........  .......  ............................  ...........................
#
    29.30%  my_test  my_test                       [.] f2(long)
    29.14%  my_test  my_test                       [.] f1(long)
    15.17%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator new(unsigned long)
    13.16%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator delete(void*)
     9.44%  my_test  my_test                       [.] process_request(long)
     1.01%  my_test  my_test                       [.] operator delete(void*)@plt
     0.97%  my_test  my_test                       [.] operator new(unsigned long)@plt
     0.20%  my_test  my_test                       [.] main
     0.19%  my_test  [kernel.kallsyms]             [k] apic_timer_interrupt
     0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock
     0.13%  my_test  [kernel.kallsyms]             [k] native_write_msr_safe

     and so on ...

Then I analyze the call chains:

perf report --stdio -g graph -i ./my_test.perf.data | c++filt

# Overhead  Command                 Shared Object                       Symbol
# ........  .......  ............................  ...........................
#
    29.30%  my_test  my_test                       [.] f2(long)
            |
            --- f2(long)
               |
                --29.01%-- process_request(long)
                          main
                          __libc_start_main

    29.14%  my_test  my_test                       [.] f1(long)
            |
            --- f1(long)
               |
               |--15.05%-- process_request(long)
               |          main
               |          __libc_start_main
               |
                --13.79%-- f2(long)
                          process_request(long)
                          main
                          __libc_start_main

    15.17%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator new(unsigned long)
            |
            --- operator new(unsigned long)
               |
               |--11.44%-- f1(long)
               |          |
               |          |--5.75%-- process_request(long)
               |          |          main
               |          |          __libc_start_main
               |          |
               |           --5.69%-- f2(long)
               |                     process_request(long)
               |                     main
               |                     __libc_start_main
               |
                --3.01%-- process_request(long)
                          main
                          __libc_start_main

    13.16%  my_test  libtcmalloc_minimal.so.0.1.0  [.] operator delete(void*)
            |
            --- operator delete(void*)
               |
               |--9.13%-- f1(long)
               |          |
               |          |--4.63%-- f2(long)
               |          |          process_request(long)
               |          |          main
               |          |          __libc_start_main
               |          |
               |           --4.51%-- process_request(long)
               |                     main
               |                     __libc_start_main
               |
               |--3.05%-- process_request(long)
               |          main
               |          __libc_start_main
               |
                --0.80%-- f2(long)
                          process_request(long)
                          main
                          __libc_start_main

     9.44%  my_test  my_test                       [.] process_request(long)
            |
            --- process_request(long)
               |
                --9.39%-- main
                          __libc_start_main

     1.01%  my_test  my_test                       [.] operator delete(void*)@plt
            |
            --- operator delete(void*)@plt

     0.97%  my_test  my_test                       [.] operator new(unsigned long)@plt
            |
            --- operator new(unsigned long)@plt

     0.20%  my_test  my_test                       [.] main
     0.19%  my_test  [kernel.kallsyms]             [k] apic_timer_interrupt
     0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock
     and so on ...

So at this point you know where your program spends time.

And this is the main.cpp file for the test:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

time_t f1(time_t time_value)
{
  for (int j = 0; j < 10; ++j) {
    ++time_value;
    if (j%5 == 0) {
      double *p = new double;
      delete p;
    }
  }
  return time_value;
}

time_t f2(time_t time_value)
{
  for (int j = 0; j < 40; ++j) {
    ++time_value;
  }
  time_value = f1(time_value);
  return time_value;
}

time_t process_request(time_t time_value)
{
  for (int j = 0; j < 10; ++j) {
    int *p = new int;
    delete p;
    for (int m = 0; m < 10; ++m) {
      ++time_value;
    }
  }
  for (int i = 0; i < 10; ++i) {
    time_value = f1(time_value);
    time_value = f2(time_value);
  }
  return time_value;
}

int main(int argc, char* argv2[])
{
  int number_loops = argc > 1 ? atoi(argv2[1]) : 1;
  time_t time_value = time(0);
  printf("number loops %d\n", number_loops);
  printf("time_value: %ld\n", (long)time_value);

  for (int i = 0; i < number_loops; ++i) {
    time_value = process_request(time_value);
  }
  printf("time_value: %ld\n", (long)time_value);
  return 0;
}

美人骨 2024-08-27 23:23:36

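Quoting Linus Torvalds himself:

Don't use gprof. You're much better off using the newish Linux "perf" tool.

And later...

I can pretty much guarantee that once you start using it, you'll never use gprof or oprofile again.

See: Re: [PATCH] grep: do not do external grep on skip-worktree entries (2010-01-04)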

余厌 2024-08-27 23:23:36

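If you're looking for ways to speed up the program, you need stackshots. A simple way to get them is the pstack utility, or lsstack if you can get it.

You can do better than Gprof. If you want to use an official profiling tool, you want something that samples the call stack on wall-clock time and presents line-level cost, such as OProfile or RotateRight Zoom.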

你与昨日 2024-08-27 23:23:36

You can use Valgrind. It records data in a file which you can analyse later using a proper GUI, like KCacheGrind.

A usage example would be:

valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes your_program

It'll generate a file called callgrind.out.xxx where xxx is the PID of the program.
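
Since the question rules out an IDE, the output file can also be inspected in text mode with `callgrind_annotate`, which ships with Valgrind. Because the file name depends on the PID, a small helper to pick the newest output file may be handy. The name `latest_callgrind_out` is hypothetical, and the sketch assumes the output files are in the given directory and have ordinary names without whitespace.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified callgrind.out.* file
# in the given directory (default: current directory).
latest_callgrind_out() {
  ls -t "${1:-.}"/callgrind.out.* 2>/dev/null | head -n 1
}

# Usage: show the top of the text report for the latest run.
#   callgrind_annotate "$(latest_callgrind_out)" | head -n 40
```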

Unlike Gprof, Valgrind works with many different languages, including Java, with some limitations.

梦幻的心爱 2024-08-27 23:23:36

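Take a look at Gprof. You need to compile your code with the -pg option, which instruments the code. After that, you can run the program and use gprof to view the results.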

昔日梦未散 2024-08-27 23:23:36

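You can also try cpuprofiler.com. It collects the information you would normally get from the top command, and the CPU usage data can even be viewed remotely from a web browser.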
