Programmatically get the cache line size?

Posted on 2024-07-18 06:30:28

All platforms welcome, please specify the platform for your answer.

A similar question: How to programmatically get the CPU cache page size in C++?

Comments (9)

鸢与 2024-07-25 06:30:28

On Linux (with a reasonably recent kernel), you can get this information out of /sys:

/sys/devices/system/cpu/cpu0/cache/

This directory has a subdirectory for each level of cache. Each of those directories contains the following files:

coherency_line_size
level
number_of_sets
physical_line_partition
shared_cpu_list
shared_cpu_map
size
type
ways_of_associativity

This gives you more information about the cache than you'd ever hope to know, including the cache line size (coherency_line_size) as well as which CPUs share this cache. This is very useful if you are doing multithreaded programming with shared data (you'll get better results if the threads sharing data are also sharing a cache).
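
As a rough illustration of reading that file from C, here is a minimal sketch. The helper names parse_line_size and sysfs_cache_line_size are made up for this example, and the index<N> subdirectory naming is assumed from the usual sysfs layout; on systems without this sysfs tree the function simply returns 0.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a coherency_line_size file (for example "64\n").
 * Returns the value in bytes, or 0 if no digits could be parsed. */
static unsigned long parse_line_size(const char *text) {
    char *end = NULL;
    unsigned long value = strtoul(text, &end, 10);
    return (end == text) ? 0 : value;
}

/* Read coherency_line_size for one cache index of one CPU.
 * Assumes the index<N> subdirectory naming; returns 0 if the file is
 * missing or unreadable (for example on a non-Linux system). */
static unsigned long sysfs_cache_line_size(int cpu, int index) {
    char path[128];
    char text[32];
    FILE *f;
    unsigned long result = 0;

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cache/index%d/coherency_line_size",
             cpu, index);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;
    if (fgets(text, sizeof text, f) != NULL)
        result = parse_line_size(text);
    fclose(f);
    return result;
}
```

Calling sysfs_cache_line_size(0, 0) would read the first cache entry of cpu0; looping over increasing index values until the function returns 0 is one way to visit every cache level.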

辞慾 2024-07-25 06:30:28

On Linux look at sysconf(3).

sysconf (_SC_LEVEL1_DCACHE_LINESIZE)

You can also get it from the command line using getconf:

$ getconf LEVEL1_DCACHE_LINESIZE
64
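
A minimal sketch of the sysconf call wrapped in C follows. The function name l1d_line_size_or and the fallback parameter are assumptions for this example; _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension, so the code guards against it being undefined and against a zero or negative result, which some systems report when the value is unknown.

```c
#include <unistd.h>

/* Returns the L1 data cache line size in bytes as reported by sysconf(),
 * or `fallback` when the constant is unavailable or the reported value
 * is not positive. */
static long l1d_line_size_or(long fallback) {
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return n;
#endif
    return fallback;
}
```

For example, l1d_line_size_or(64) yields the reported size where available and 64 otherwise.
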
一口甜 2024-07-25 06:30:28

I have been working on some cache line stuff and needed to write a cross-platform function. I committed it to a GitHub repository at https://github.com/NickStrupat/CacheLineSize, or you can just use the source below. Feel free to do whatever you want with it.

#ifndef GET_CACHE_LINE_SIZE_H_INCLUDED
#define GET_CACHE_LINE_SIZE_H_INCLUDED

// Author: Nick Strupat
// Date: October 29, 2010
// Returns the cache line size (in bytes) of the processor, or 0 on failure

#include <stddef.h>
size_t cache_line_size();

#if defined(__APPLE__)

#include <sys/sysctl.h>
size_t cache_line_size() {
    size_t line_size = 0;
    size_t sizeof_line_size = sizeof(line_size);
    sysctlbyname("hw.cachelinesize", &line_size, &sizeof_line_size, 0, 0);
    return line_size;
}

#elif defined(_WIN32)

#include <stdlib.h>
#include <windows.h>
size_t cache_line_size() {
    size_t line_size = 0;
    DWORD buffer_size = 0;
    DWORD i = 0;
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION * buffer = 0;

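    // First call only reports the required buffer size
    GetLogicalProcessorInformation(0, &buffer_size);
    buffer = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION *)malloc(buffer_size);
    if (!buffer)
        return 0; // allocation failed
    if (!GetLogicalProcessorInformation(&buffer[0], &buffer_size)) {
        free(buffer);
        return 0; // query failed
    }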

    for (i = 0; i != buffer_size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i) {
        if (buffer[i].Relationship == RelationCache && buffer[i].Cache.Level == 1) {
            line_size = buffer[i].Cache.LineSize;
            break;
        }
    }

    free(buffer);
    return line_size;
}

#elif defined(__linux__)

#include <stdio.h>
size_t cache_line_size() {
    FILE * p = 0;
    p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
    unsigned int i = 0;
    if (p) {
        // %u matches unsigned int; report 0 if nothing could be parsed
        if (fscanf(p, "%u", &i) != 1)
            i = 0;
        fclose(p);
    }
    return i;
}

#else
#error Unrecognized platform
#endif

#endif
坦然微笑 2024-07-25 06:30:28

On x86, you can use the CPUID instruction with function 2 to determine various properties of the cache and the TLB. Parsing the output of function 2 is somewhat complicated, so I'll refer you to section 3.1.3 of the Intel Processor Identification and the CPUID Instruction (PDF).

To get this data from C/C++ code, you'll need to use inline assembly, compiler intrinsics, or call an external assembly function to perform the CPUID instruction.
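
As a sketch of the compiler-intrinsic route, the code below uses the __get_cpuid helper from <cpuid.h> (available with GCC and Clang on x86). Note that it swaps in a simpler query than the one described above: instead of parsing function 2 descriptors, it reads function 1, where EBX bits 15:8 give the CLFLUSH line size in units of 8 bytes. On common x86 processors this matches the cache line size, but treat that as an assumption. The function names are made up for this example.

```c
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* CPUID function 1 reports the CLFLUSH line size in EBX bits 15:8,
 * expressed in units of 8 bytes. */
static unsigned clflush_bytes_from_ebx(unsigned ebx) {
    return ((ebx >> 8) & 0xFFu) * 8u;
}

/* Returns the CLFLUSH line size in bytes, or 0 when CPUID is not
 * available through <cpuid.h> or the CLFSH feature bit is clear. */
static unsigned cpuid_clflush_line_size(void) {
#ifdef HAVE_GNU_CPUID
    unsigned eax, ebx, ecx, edx;
    /* EDX bit 19 (CLFSH) says whether the EBX field is valid. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 19)))
        return clflush_bytes_from_ebx(ebx);
#endif
    return 0;
}
```

With MSVC, the __cpuid intrinsic from <intrin.h> plays the same role; the bit extraction stays the same.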

蹲墙角沉默 2024-07-25 06:30:28

If you're using SDL2 you can use this function:

int SDL_GetCPUCacheLineSize(void);

It returns the L1 cache line size, in bytes.

On my x86_64 machine, running this code snippet:

printf("CacheLineSize = %d",SDL_GetCPUCacheLineSize());

Produces CacheLineSize = 64

I know I'm a little late, but just adding information for future visitors.
The SDL documentation currently says the number returned is in KB, but it is actually in bytes.

囚你心 2024-07-25 06:30:28

You can use std::hardware_destructive_interference_size since C++17.
It is defined as:

Minimum offset between two objects to avoid false sharing. Guaranteed
to be at least alignof(std::max_align_t)

满意归宿 2024-07-25 06:30:28

On the Windows platform:

from https://devblogs.microsoft.com/oldnewthing/20091208-01/?p=15733

The GetLogicalProcessorInformation function will give you characteristics of the logical processors in use by the system. You can walk the SYSTEM_LOGICAL_PROCESSOR_INFORMATION returned by the function looking for entries of type RelationCache. Each such entry contains a ProcessorMask which tells you which processor(s) the entry applies to, and in the CACHE_DESCRIPTOR, it tells you what type of cache is being described and how big the cache line is for that cache.

枕梦 2024-07-25 06:30:28

ARMv6 and above have C0, the Cache Type Register. However, it is only available in privileged mode.

For example, from Cortex™-A8 Technical Reference Manual:

The purpose of the Cache Type Register is to determine the instruction
and data cache minimum line length in bytes to enable a range of
addresses to be invalidated.

The Cache Type Register is:

  • a read-only register
  • accessible in privileged modes only.

The contents of the Cache Type Register depend on the specific
implementation. Figure 3-2 shows the bit arrangement of the Cache
Type Register...


Don't assume the ARM processor has a cache (apparently, some can be configured without one). The standard way to determine it is via C0. From the ARM ARM, page B6-6:

From ARMv6, the System Control Coprocessor Cache Type register is the
mandated method to define the L1 caches, see Cache Type register on
page B6-14. It is also the recommended method for earlier variants of
the architecture. In addition, Considerations for additional levels of
cache on page B6-12 describes architecture guidelines for level 2
cache support.
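
Once privileged code has read the register (on ARMv7 this is typically done with mrc p15, 0, <Rt>, c0, c0, 1), the line lengths still have to be decoded from the raw value. The sketch below assumes the ARMv7 register layout used by cores such as the Cortex-A8, where the IminLine and DminLine fields hold log2 of the line length in 4-byte words; ARMv6 cores use a different layout, so this decoder does not apply to them. The function names are made up for this example.

```c
#include <stdint.h>

/* Smallest data/unified cache line in bytes, decoded from a Cache Type
 * Register value. Assumes the ARMv7 layout: DminLine in bits 19:16 is
 * log2 of the line length in 4-byte words. */
static unsigned ctr_dminline_bytes(uint32_t ctr) {
    unsigned log2_words = (ctr >> 16) & 0xFu;
    return 4u << log2_words;
}

/* Smallest instruction cache line in bytes. Assumes the ARMv7 layout:
 * IminLine in bits 3:0 is log2 of the line length in 4-byte words. */
static unsigned ctr_iminline_bytes(uint32_t ctr) {
    unsigned log2_words = ctr & 0xFu;
    return 4u << log2_words;
}
```

A DminLine field of 4 therefore decodes to 16 words, that is 64 bytes.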

夏末 2024-07-25 06:30:28

You can also try to do it programmatically by measuring some timing. Obviously, it won't always be as precise as cpuid and the like, but it is more portable. ATLAS does it at its configuration stage, so you may want to look at it:

http://math-atlas.sourceforge.net/
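
To give an idea of what a timing-based probe can look like, here is a minimal sketch of a strided memory walk. It is not ATLAS's actual method, and the function name ns_per_access and its parameters are made up for this example. The idea is that, for a buffer much larger than the last-level cache, the cost per access tends to grow with the stride until the stride reaches the line size and then levels off.

```c
#include <stdlib.h>
#include <time.h>

/* Average CPU time, in nanoseconds, of one byte update when walking
 * `bytes` bytes of memory with the given stride, repeated `passes` times.
 * Returns -1.0 on bad arguments or allocation failure. */
static double ns_per_access(size_t bytes, size_t stride, int passes) {
    volatile unsigned char *buf;
    size_t i, touched = 0;
    int p;
    clock_t start, stop;

    if (stride == 0 || bytes < stride || passes <= 0)
        return -1.0;
    buf = (volatile unsigned char *)calloc(bytes, 1);
    if (buf == NULL)
        return -1.0;

    start = clock();
    for (p = 0; p < passes; ++p) {
        for (i = 0; i < bytes; i += stride) {
            buf[i]++;   /* volatile keeps the access from being optimized away */
            ++touched;
        }
    }
    stop = clock();

    free((void *)buf);
    return 1e9 * (double)(stop - start) / CLOCKS_PER_SEC / (double)touched;
}
```

One would call it for strides 8, 16, 32, ... 512 and look for the stride at which the per-access time stops increasing. Hardware prefetching, page faults on the first pass and coarse clock resolution all add noise, so expect to repeat the measurement and treat the result as an estimate.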
