
Review of GPU Architecture - A Simplification


Memory

GPUs, or GPGPUs, are complex devices, but to get started one really only needs a simplified view.

GPUs and CPUs

The most important thing to understand about memory is that the CPU can access both main memory (the host) and GPU memory (the device). The device sees only its own memory and cannot access host memory.
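Below is a minimal sketch of this separation using the CUDA runtime API; the buffer names are made up for illustration. Data must be copied explicitly between the two memory spaces, because neither side can dereference the other's pointers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int N = 4;
    float host_data[N] = {1.0f, 2.0f, 3.0f, 4.0f};  // lives in host (main) memory

    // Allocate a buffer in device memory; the GPU cannot reach host_data directly.
    float *device_data = nullptr;
    cudaMalloc(&device_data, N * sizeof(float));

    // Explicit copies are the only way data crosses the host/device boundary.
    cudaMemcpy(device_data, host_data, N * sizeof(float), cudaMemcpyHostToDevice);
    // ... kernels would operate on device_data here ...
    cudaMemcpy(host_data, device_data, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(device_data);
    printf("round trip complete: %f\n", host_data[0]);
    return 0;
}
```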

Kernels, Threads and Blocks

Recall that GPUs are SIMD devices. This means that each CUDA core runs the same code, called a ‘kernel’. A kernel is programmed to execute one ‘thread’ (an execution unit or task). The ‘trick’ is that each thread ‘knows’ its own identity, in the form of a grid location, and is usually coded to access an array of data at a location unique to that thread.
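Here is a minimal sketch of that pattern; the kernel name scale and the pointer d_data are hypothetical names for illustration. Each thread derives a unique index from its grid location and touches only that element:

```cuda
#include <cuda_runtime.h>

// Every thread runs this same kernel; its grid location picks the array slot.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // the thread's unique identity
    if (i < n) {                                    // guard against extra threads
        data[i] *= factor;                          // each thread's own location
    }
}
```

With one thread per block, as assumed below, blockDim.x is 1 and the index reduces to blockIdx.x, so a launch might look like scale<<<n, 1>>>(d_data, 2.0f, n).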

We will concentrate on a one-dimensional grid with each thread in a block by itself, but let's first understand when we might want to organize threads into blocks.

GPU memory can be divided (roughly) into 3 types:

  • local - memory seen only by the thread. This is the fastest type.
  • shared - memory that may be seen by all threads in a block. Fast, but not as fast as local.
  • global - memory seen by all threads in all blocks. This is the slowest to access.

So, if multiple threads need to use the very same data (not unique chunks of an array), then those threads should be grouped into a common block, and that data should be stored in shared memory, as in the sketch below.
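A minimal sketch of that idea, assuming a launch with 256 threads per block; the kernel name blockSum is hypothetical. Each block stages its slice of global memory into a shared array once, and every thread in the block then works on that same shared data:

```cuda
#include <cuda_runtime.h>

// Assumes blockDim.x == 256 (a power of two), matching the tile size below.
__global__ void blockSum(const float *input, float *blockSums, int n) {
    __shared__ float tile[256];          // shared: visible to all threads in this block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? input[i] : 0.0f;  // one global read per thread
    __syncthreads();                     // wait until every slot is written

    // Tree reduction: all threads in the block cooperate on the same shared data.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        blockSums[blockIdx.x] = tile[0]; // one global write per block
    }
}
```

Keeping the intermediate sums in shared memory means each value crosses the slow global-memory path only twice: once on the way in and once on the way out.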
