A class template for cache-aligned memory usage in C++

Posted 2024-10-12 11:51:25

(A lot of information is needed to understand my question; this is already the compressed version.)

I am trying to implement a class template that allocates and accesses cache-aligned data. This works very well for a single element, but adding support for arrays is a problem.

Semantically, the code should produce this memory layout for a single element:

cache_aligned<element_type>* my_el = 
          new(cache_line_size) cache_aligned<element_type>();
| element | buffer |

Access (so far) looks like this:

*my_el; // returns cache_aligned<element_type>
**my_el; //returns element_type
(*my_el)->member_of_element(); // calls a member of element_type

However, for an array I would like to have this:

 cache_aligned<element_type>* my_el_array = 
         new(cache_line_size)  cache_aligned<element_type>[N];
 | element 0 | buffer | element 1 | buffer | ... | element (N-1) | buffer |

So far I have the following code:

template <typename T>
class cache_aligned {
    private:
        T instance;
    public:
        cache_aligned()
        {}
        // 'other' is already a T, so it is copied directly
        cache_aligned(const T& other)
        :instance(other)
        {}
        static void* operator new (size_t size, uint c_line_size) {
             return c_a_malloc(size, c_line_size);
        }
        static void* operator new[] (size_t size, uint c_line_size) {
             // subtract the assumed array bookkeeping overhead, then divide
             int num_el = (size - sizeof(cache_aligned<T>*))
                              / sizeof(cache_aligned<T>);
             return c_a_array(sizeof(cache_aligned<T>), num_el, c_line_size);
        }
        static void operator delete (void* ptr) {
             free_c_a(ptr);
        }
        // arrays created with new[] are released through delete[]
        static void operator delete[] (void* ptr) {
             free_c_a(ptr);
        }
        T* operator-> () {
             return &instance;
        }
        T& operator * () {
             return instance;
        }
};

The allocation helper functions (c_a_malloc, used for a single element, is not shown):

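// Reserves room for num_el elements of 'size' bytes, each followed by
// c_line_size bytes of buffer, plus one slot for the raw malloc pointer.
void* c_a_array(uint size, ulong num_el, uint c_line_size) {
    void* mem = malloc((size + c_line_size) * num_el + sizeof(void*));
    if (mem == NULL) return NULL;
    void** ptr = (void**)((char*)mem + sizeof(void*));
    ptr[-1] = mem;  // remembered so that free_c_a can release the block
    return ptr;
}

void free_c_a(void* ptr) {
    free(((void**)ptr)[-1]);
}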
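
Note that c_a_array as posted stores the raw pointer but never rounds the returned address up to a cache-line boundary. A minimal sketch of the usual over-allocate-and-round-up technique follows. The names aligned_alloc_sketch and aligned_free_sketch are hypothetical, and the sketch assumes the line size is a power of two no smaller than sizeof(void*):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns a block of 'bytes' bytes whose address is a
// multiple of 'line'. Assumes 'line' is a power of two >= sizeof(void*).
void* aligned_alloc_sketch(std::size_t bytes, std::size_t line) {
    // Extra room: worst-case alignment shift plus one slot for the raw pointer.
    void* raw = std::malloc(bytes + line - 1 + sizeof(void*));
    if (raw == NULL) return NULL;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    // Round up to the next multiple of 'line'.
    std::uintptr_t aligned = (start + line - 1) / line * line;
    void** user = reinterpret_cast<void**>(aligned);
    user[-1] = raw;  // stored just below the aligned block for the free call
    return user;
}

// Hypothetical counterpart: recovers the raw pointer and frees it.
void aligned_free_sketch(void* ptr) {
    if (ptr != NULL) std::free(static_cast<void**>(ptr)[-1]);
}
```

This only aligns the start of the block. The spacing between array elements is still sizeof(cache_aligned<T>), which is the problem described next.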

Here is the problem. Access to the array data should work like this:

my_el_array[i]; // returns cache_aligned<element_type>
*(my_el_array[i]); // returns element_type
my_el_array[i]->member_of_element();

My ideas for solving it are:

(1) Something like this, overloading the sizeof operator:

static size_t operator sizeof () {
   return sizeof(cache_aligned<T>) + c_line_size;
}

--> not possible, since overloading the sizeof operator is illegal

(2) Something like this, overloading operator [] for the pointer type:

static T& operator [] (uint index, cache_aligned<T>* ptr) {
    return ptr + ((sizeof(cache_aligned<T>) + c_line_size) * index);
}

--> not possible in C++ either, since operators cannot be overloaded for built-in pointer types

(3) A totally trivial solution:

template <typename T> class cache_aligned {
    private:
          T instance;
          bool buffer[CACHE_LINE_SIZE]; 
          // CACHE_LINE_SIZE defined as macro
    public:
          // trivial operators and methods ;)
};

--> I don't know whether this is reliable; I am using gcc-4.5.1 on Linux ...

(4) Replacing T instance; with T* instance_ptr; in the class template and using operator [] to calculate the position of the element, like this:

| pointer-to-instance | ----> | element 0 | buffer | ... | element (N-1) | buffer |

This is not the intended semantics, since the instance of the class template becomes the bottleneck when calculating the addresses of the elements.

Thanks for reading! I don't know how to shorten the problem further. It would be great if you could help. Any workaround would help a lot.

I know alignment support is part of C++0x. However, it is not yet available in gcc.

Greetz, sema

Comments (1)

清风疏影 2024-10-19 11:51:25

When c_line_size is a compile-time integral constant, it is of course better to pad cache_aligned with a char array whose size depends on sizeof(T).
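
A minimal sketch of that padding idea, assuming a 64-byte cache line and C++11 for static_assert. The names cache_padded and point are hypothetical and chosen only for illustration:

```cpp
#include <cstddef>

// Assumed line size; 64 bytes is common but not universal.
const std::size_t CACHE_LINE_SIZE = 64;

template <typename T, std::size_t Line = CACHE_LINE_SIZE>
class cache_padded {
    T instance;
    // Always at least one byte, so the array is never zero-length. The price:
    // a T whose size is already a multiple of Line gets one extra full line.
    char pad[Line - sizeof(T) % Line];
public:
    cache_padded() : instance() {}
    cache_padded(const T& other) : instance(other) {}
    T* operator->() { return &instance; }
    T& operator*()  { return instance; }
};

struct point { double x, y, z; };  // hypothetical 24-byte element

// Holds as long as Line is a multiple of the alignment of T.
static_assert(sizeof(cache_padded<point>) % CACHE_LINE_SIZE == 0,
              "each array element should span whole cache lines");
```

Because the padding is part of the type, ordinary pointer arithmetic and operator [] on a cache_padded<T>* step in whole cache lines, so my_el_array[i]->member_of_element() works without overloading anything. The start address of the array still has to be aligned by the allocator.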

You can also check whether two T objects fit into one cache line and lower the alignment requirement accordingly.
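
One possible reading of that suggestion, sketched under the assumption that the line size is a power of two: align small elements to the next power of two at or above their size instead of to a full line. The helper names next_pow2 and relaxed_alignment are hypothetical:

```cpp
#include <cstddef>

// Smallest power of two that is >= n (C++11-style recursive constexpr).
constexpr std::size_t next_pow2(std::size_t n, std::size_t p = 1) {
    return p >= n ? p : next_pow2(n, p * 2);
}

// Hypothetical helper: the alignment (and element stride) to use for an
// element of 'size' bytes when 'line' is a power-of-two cache line size.
// Elements at least one line long keep the full line alignment; smaller
// ones use a power-of-two slot so that no element straddles a boundary.
constexpr std::size_t relaxed_alignment(std::size_t size, std::size_t line) {
    return size >= line ? line : next_pow2(size);
}

static_assert(relaxed_alignment(24, 64) == 32, "two 24-byte elements per line");
static_assert(relaxed_alignment(40, 64) == 64, "only one 40-byte element fits");
```

With this scheme each element would be padded to the returned value and the array start aligned to a full line, so several small elements share a line without any of them being split across two lines.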

Do not expect miracles from such an optimization. I think a 2x performance improvement for some algorithms is the ceiling of what you can squeeze out of avoiding cache line splits.
