How to profile the number of instructions executed during a CUDA kernel launch

Posted on 2025-01-24 00:05:02, 81 words, 0 views, 0 comments

I want to know how many FP32 and INT32 instructions are executed in a CUDA kernel during a launch. Is there any way to profile it via Nvidia Nsight Compute?

Comments (1)

后来的我们 2025-01-31 00:05:02

Is there any way to profile it via Nvidia Nsight Compute?

For Nsight Compute, the relevant metrics are as follows:

fp32 instructions executed:    smsp__sass_thread_inst_executed_op_fp32_pred_on.sum
integer instructions executed: smsp__sass_thread_inst_executed_op_integer_pred_on.sum

Example:

$ ncu --metrics smsp__sass_thread_inst_executed_op_fp32_pred_on.sum ./t2003
...
==PROF== Disconnected from process 27520
[27520] t2003@127.0.0.1
  kernel_1(const float *, float *), 2022-Apr-23 23:24:34, Context 1, Stream 16
    Section: Command line profiler metrics
    ---------------------------------------------------------------------- --------------- ------------------------------
    smsp__sass_thread_inst_executed_op_fp32_pred_on.sum                               inst                         10,240
    ---------------------------------------------------------------------- --------------- ------------------------------

  kernel_2(const float *, float *), 2022-Apr-23 23:24:34, Context 1, Stream 17
    Section: Command line profiler metrics
    ---------------------------------------------------------------------- --------------- ------------------------------
    smsp__sass_thread_inst_executed_op_fp32_pred_on.sum                               inst                         10,240
    ---------------------------------------------------------------------- --------------- ------------------------------

$

Note that recent versions of Nsight Compute are intended to be used on Volta and newer (compute capability 7.0 and higher) GPUs only.
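
Both metrics can be requested in one run by passing them to --metrics as a comma-separated list, and adding --csv makes the report easier to post-process. The following is a minimal sketch in Python that tallies the two counts per kernel from such a CSV report. It assumes the report has "Kernel Name", "Metric Name" and "Metric Value" columns and a header row beginning with "ID"; check this against the output of your ncu version. The function name tally_metrics and the SAMPLE text are hypothetical, and the integer count in SAMPLE is made up for illustration.

```python
import csv
from collections import defaultdict

FP32 = "smsp__sass_thread_inst_executed_op_fp32_pred_on.sum"
INT = "smsp__sass_thread_inst_executed_op_integer_pred_on.sum"

# Hypothetical CSV text, shaped like the output of something such as:
#   ncu --csv --metrics <FP32>,<INT> ./t2003
# The column layout is an assumption; the integer value is invented.
SAMPLE = '''==PROF== Disconnected from process 27520
"ID","Process ID","Process Name","Host Name","Kernel Name","Kernel Time","Context","Stream","Section Name","Metric Name","Metric Unit","Metric Value"
"0","27520","t2003","127.0.0.1","kernel_1(const float *, float *)","2022-Apr-23 23:24:34","1","16","Command line profiler metrics","smsp__sass_thread_inst_executed_op_fp32_pred_on.sum","inst","10,240"
"0","27520","t2003","127.0.0.1","kernel_1(const float *, float *)","2022-Apr-23 23:24:34","1","16","Command line profiler metrics","smsp__sass_thread_inst_executed_op_integer_pred_on.sum","inst","4,096"
'''


def tally_metrics(csv_text):
    """Return {kernel name: {metric name: summed value}} from ncu CSV text."""
    lines = csv_text.splitlines()
    # ncu prints ==PROF== status lines before the CSV header; skip them.
    header = next(
        (i for i, line in enumerate(lines) if line.startswith('"ID"')), None
    )
    if header is None:
        raise ValueError("no CSV header row found")
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(lines[header:]):
        # Values may carry thousands separators, e.g. "10,240".
        value = int(float(row["Metric Value"].replace(",", "")))
        totals[row["Kernel Name"]][row["Metric Name"]] += value
    return totals


if __name__ == "__main__":
    for kernel, metrics in tally_metrics(SAMPLE).items():
        print(kernel, "fp32:", metrics[FP32], "integer:", metrics[INT])
```

Summing per kernel name means repeated launches of the same kernel are added together; key on the "ID" column instead if you want one count per launch.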
