CUDA trace emulation - need some expert insights

Posted on 2024-12-09 23:17:26

I am working on a GPU trace emulation tool on Windows as part of my graduate research. Specifically, I am working on CUDA runtime trace emulation.

I use simple DLL injection with MS Detours to intercept the CUDA runtime APIs, and I store the API calls and their parameters in a trace file. I run into problems when I try to emulate the API calls from my trace file (I call this step "playback").

A typical trace file begins with calls to __cudaRegisterFatBinary and __cudaRegisterFunction, followed by a call to cudaMalloc.

What I have done so far:

1) I came across GPUOcelot and found there the cubin structure that Nvidia currently uses. In intercept mode I use it to save what the address parameter of __cudaRegisterFatBinary points to. In playback I repopulate the structure in memory and pass a pointer to it to __cudaRegisterFatBinary.

2) In __cudaRegisterFunction, I am not sure what the parameters hostFunction, deviceFunction and deviceName refer to, so I don't understand how to populate them when playing back from my trace file. For now I just save the pointers from the original execution and reuse them to imitate the call. There is no way of knowing whether the call succeeds, since the function has no return value.

3) The cudaMalloc call that follows these two entry-point functions returns CUDA error code 11, which is cudaErrorInvalidValue according to the Nvidia documentation. I have no idea why. I assume something is wrong with the previous two function calls. I also suspect that something goes wrong with the implicit primary context creation by the CUDA runtime. Can someone give me some insight into CUDA runtime execution and point out what I might be missing?

I know it's a ton of information without any useful code. I don't know which part of the code to post here; I will post it when people take an interest in my question and ask me specific things about my project. For now I am just hoping that I am missing something big and high-level that one of you can spot.

I greatly appreciate your time and interest!

情绪操控生活 2024-12-16 23:17:26

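The whole thing sounds interesting. Your "error: CUDA invalid value" may be related to the arguments of __cudaRegisterFunction. The parameter "DeviceName" sounds like it identifies which GPU (card?) to use. Check the CUDA SDK: it has plenty of demos that enumerate the GPUs on a system, and perhaps those values are valid for "DeviceName". As for "hostFunction" and "deviceFunction", these sound like function IDs, or possibly function pointers. Also, you can call cudaGetLastError() to test whether a function call succeeded (it returns cudaSuccess if everything is fine; see the error-logging macros in the SDK). Good luck!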
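Whatever these parameters turn out to mean, point 2 of the question raises a general record/replay concern: a pointer value saved from the original execution is just a number in the playback process and will usually not point at valid memory there. Below is a minimal sketch of one common way to handle that, assuming the trace stores string arguments by value and treats recorded pointer values only as lookup keys. The names TraceRecord and HandleMap are hypothetical and not part of CUDA, Detours, or GPUOcelot.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// One recorded API call (hypothetical layout, for illustration only).
// Pointer arguments are stored as opaque 64-bit IDs: their numeric value
// in the ORIGINAL process. String arguments are deep-copied.
struct TraceRecord {
    std::string api;                        // e.g. "__cudaRegisterFunction"
    std::vector<std::uint64_t> handleArgs;  // recorded pointer values, used only as keys
    std::vector<std::string> stringArgs;    // copies of the characters, not the addresses
};

// Maps pointer values seen while recording to the live pointers
// obtained while replaying.
class HandleMap {
public:
    // Call this when a replayed function returns a fresh pointer that
    // corresponds to a pointer value stored in the trace.
    void bind(std::uint64_t recorded, void* live) { map_[recorded] = live; }

    // Returns nullptr when the recorded value was never bound, meaning the
    // trace refers to a pointer the replayer has not created yet.
    void* resolve(std::uint64_t recorded) const {
        auto it = map_.find(recorded);
        return it == map_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::uint64_t, void*> map_;
};
```

With this approach, the replayer would bind the value returned by a replayed registration call to the value the trace recorded for it, then pass the resolved live pointer (not the recorded number) to later calls that take that handle. A nullptr from resolve is an early sign that the trace is being replayed out of order or that an argument was recorded as a raw address when its contents should have been saved.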
