What are the default values for the arch and code options when using nvcc?
When compiling CUDA code, you have to select the architecture for which the code is generated. nvcc provides two options to specify this architecture:
arch specifies the virtual architecture, which can be compute_10, compute_11, etc.
code specifies the real architecture, which can be sm_10, sm_11, etc.
So a command like this:
nvcc x.cu -arch=compute_13 -code=sm_13
will generate 'cubin' code for devices with compute capability 1.3. Please correct me if I'm wrong. What I would like to know is: what are the default values for these two options? Which architecture does nvcc use by default when no value for arch or code is specified?
Comments (2)
Ok, I finally managed to find the default values. My fault for not reading the whole chapter on GPU compilation in the NVCC documentation from beginning to end. So,
nvcc x.cu
is equivalent to
nvcc x.cu -arch=compute_10 -code=sm_10,compute_10
Those are the default values. By default, compilation targets the virtual architecture compute_10, and the resulting a.out includes the CUBIN code for the real architecture sm_10 plus the PTX assembly code for the compute_10 architecture. The CUDA driver recompiles that PTX 'just in time' if your device's architecture is newer than sm_10.
I believe the default is compute_10, as you cannot use any compute_13 features unless you explicitly specify that that's what you want. (Presumably the NVCC documentation that comes with the CUDA toolkit specifies this, but I can't find a link online.)
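If you would rather check the default empirically than rely on the documentation, something like the following might help. It is only a minimal sketch: the kernel reads the `__CUDA_ARCH__` macro, which is defined during device compilation (for example 130 for compute_13), and copies it back to the host. The names `report_arch` and `format_virtual_arch` are made up for this illustration and do not come from the toolkit. Note also that the default has changed across toolkit releases, so the value you see may not be compute_10.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stores the __CUDA_ARCH__ value that the device code was compiled with.
// The macro is only defined during device compilation, hence the guard.
__global__ void report_arch(int *arch)
{
#ifdef __CUDA_ARCH__
    *arch = __CUDA_ARCH__;
#else
    *arch = 0;
#endif
}

// Hypothetical helper: turns a __CUDA_ARCH__ value such as 130
// into the matching virtual architecture name, "compute_13".
static void format_virtual_arch(int cuda_arch, char *buf, size_t len)
{
    snprintf(buf, len, "compute_%d", cuda_arch / 10);
}

int main()
{
    int *d_arch = NULL;
    int h_arch = 0;

    cudaError_t err = cudaMalloc(&d_arch, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    report_arch<<<1, 1>>>(d_arch);

    // A launch failure (e.g. no device code usable on this GPU) shows up here.
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&h_arch, d_arch, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_arch);

    if (err != cudaSuccess) {
        fprintf(stderr, "kernel run failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    char name[32];
    format_virtual_arch(h_arch, name, sizeof(name));
    printf("device code was compiled for %s\n", name);
    return 0;
}
```

Build it once with plain `nvcc x.cu` and once with explicit -arch and -code values, then compare what each a.out prints. If the driver JIT-compiles the embedded PTX for a newer GPU, the program should still report the virtual architecture the PTX was generated for, which is exactly the default being asked about.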