Help templatizing a byte-swap function: is there a performance hit?

Posted on 2024-10-15 13:57:58

template<int size>
inline void* byteswap(void* __x);

template<>
inline void* byteswap<2>(void* __x)
{
    return (*(uint16*)__x >> 8) | (*(uint16*)__x << 8);
}

template<>
inline void* byteswap<4>(void* __x)
{
    return (byteswap<4>(__x & 0xffff) << 16) | (bswap_16 (__x >> 16));
}

template<typename T>
inline T byteswap(T& swapIt)
{
    return (T*)byteswap<sizeof(T)>(swapIt);
}    

int main() {
    uint32 i32 = 0x01020304;
    uint16 i16 = 0x0102;

    byteswap(i32);
    byteswap(i16);

    return 0;
}

The above obviously doesn't even compile. I'm confused, because it seems that I need void* as the function's parameter, and things get ugly in byteswap<4>, where I need to call byteswap<2> but with a reference.

Any idea how to make this look pretty? Is it possible (using inlining or other tricks) to make it perform as well as doing the bit operations directly?

沙沙粒小 2024-10-22 13:57:58

This is how I'd code it:

#include <iostream>

typedef unsigned short uint16;
typedef unsigned int uint32;

template<typename T> T byteswap(T value);

template<>
uint16 byteswap<uint16>(uint16 value)
{
    return (value >> 8)|(value << 8);
}

template<>
uint32 byteswap<uint32>(uint32 value)
{
    // Widen to uint32 before shifting so the shift is not done on a promoted (signed) int.
    return (uint32(byteswap<uint16>(value)) << 16) | byteswap<uint16>(value >> 16);
}

int main() {
    uint32 i32 = 0x11223344;
    uint16 i16 = 0x2142;

    std::cout << std::hex << byteswap(i32) << std::endl; // prints 44332211
    std::cout << std::hex << byteswap(i16) << std::endl; // prints 4221
}

In other words, I wouldn't use the size as the template parameter, as you were doing.

EDIT
Sorry, my first version was plain wrong with respect to uint32 swapping.
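
For comparison, the same type-based idea can be sketched with plain overloads instead of specializations. This is only an illustration, assuming a C++11 compiler and the fixed-width types from `<cstdint>`. Marking the functions `constexpr` lets the compiler check the result at build time, which hints that nothing here has to cost more than hand-written shifts.

```cpp
#include <cstdint>

// Illustrative overload set: one constexpr function per fixed-width type.
constexpr std::uint16_t byteswap(std::uint16_t value)
{
    // The operands are promoted to int before shifting, so cast back to 16 bits.
    return static_cast<std::uint16_t>((value >> 8) | (value << 8));
}

constexpr std::uint32_t byteswap(std::uint32_t value)
{
    // Swap each 16-bit half, then exchange the halves.
    return (static_cast<std::uint32_t>(
                byteswap(static_cast<std::uint16_t>(value & 0xffffu))) << 16)
         | byteswap(static_cast<std::uint16_t>(value >> 16));
}

// Evaluated by the compiler, so a wrong swap is a build error.
static_assert(byteswap(std::uint16_t{0x2142}) == 0x4221, "16-bit swap");
static_assert(byteswap(std::uint32_t{0x11223344}) == 0x44332211, "32-bit swap");
```

Overload resolution picks the function from the argument type, so a call such as `byteswap(i32)` needs no explicit template argument.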

孤芳又自赏 2024-10-22 13:57:58

Borrowing from some code:

template<int N>
void byteswap_array(char (&bytes)[N]) {
  // Optimize this with a platform-specific API as desired.
  for (char *p = bytes, *end = bytes + N - 1; p < end; ++p, --end) {
    char tmp = *p;
    *p = *end;
    *end = tmp;
  }
}

template<typename T>
T byteswap(T value) {
  byteswap_array(*reinterpret_cast<char (*)[sizeof(value)]>(&value));
  return value;
}
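
A variant of the same byte-reversal idea, sketched with `std::memcpy` and `std::reverse` so that no pointer to the value is reinterpreted as an array. The name `byteswap_generic` is made up for this example. Optimizing compilers can often reduce code like this to a single byte-swap instruction, but that is worth confirming in the generated assembly for your target. Where it is not, GCC and Clang offer `__builtin_bswap16/32/64`, MSVC offers `_byteswap_ushort/_byteswap_ulong/_byteswap_uint64`, and C++23 adds `std::byteswap` in `<bit>`.

```cpp
#include <algorithm>
#include <cstring>
#include <type_traits>

// Hypothetical name: reverses the object representation of a trivially copyable value.
template <typename T>
T byteswap_generic(T value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying must be valid for T");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // copy out the object representation
    std::reverse(bytes, bytes + sizeof(T));  // reverse the byte order
    std::memcpy(&value, bytes, sizeof(T));   // copy the reversed bytes back
    return value;
}
```

Because the size is a compile-time constant, the loop inside `std::reverse` has a fixed trip count that the optimizer can unroll.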

音盲 2024-10-22 13:57:58

I'd rewrite it like this:

#include <cstddef>
#include <cstdint>

// Only declared: using an unsupported size is a link error.
template < std::size_t size >
inline void sized_byteswap(char* data);

template <>
inline void sized_byteswap< 2 >(char* data)
{
    uint16_t* ptr = reinterpret_cast<uint16_t*>(data);
    *ptr = (*ptr >> 8)|(*ptr << 8);
}

template <>
inline void sized_byteswap< 4 >(char* data)
{
    uint32_t* ptr = reinterpret_cast<uint32_t*>(data);
    *ptr = (*ptr >> 24)|((*ptr & 0x00ff0000) >> 8)|((*ptr & 0x0000ff00) << 8)|(*ptr << 24);
}

// Dispatches on sizeof(T); works on a copy and returns the swapped value.
template < typename T >
T byteswap(T value)
{
    sized_byteswap< sizeof(T) >(reinterpret_cast<char*>(&value));
    return value;
}

int main()
{
    uint32_t i32 = byteswap(uint32_t(0x01020304));
    uint16_t i16 = byteswap(uint16_t(0x0102));

    return 0;
}
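
The size-dispatch approach can also be sketched without casting the value's address to an integer pointer, by copying through `std::memcpy` into an unsigned word of the matching size. The names `sized_swapper` and `byteswap_sized` are invented for this sketch, and the 8-byte case is an addition built on the 4-byte one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper; the primary template is left undefined so that an
// unsupported size fails to compile.
template <std::size_t Size> struct sized_swapper;

template <> struct sized_swapper<2> {
    using word = std::uint16_t;
    static word swap(word v) { return static_cast<word>((v >> 8) | (v << 8)); }
};

template <> struct sized_swapper<4> {
    using word = std::uint32_t;
    static word swap(word v) {
        return (v >> 24) | ((v & 0x00ff0000u) >> 8)
             | ((v & 0x0000ff00u) << 8) | (v << 24);
    }
};

template <> struct sized_swapper<8> {
    using word = std::uint64_t;
    static word swap(word v) {
        // Byte-swap each 32-bit half, then exchange the halves.
        const std::uint64_t lo = sized_swapper<4>::swap(static_cast<std::uint32_t>(v));
        const std::uint64_t hi = sized_swapper<4>::swap(static_cast<std::uint32_t>(v >> 32));
        return (lo << 32) | hi;
    }
};

template <typename T>
T byteswap_sized(T value)
{
    typename sized_swapper<sizeof(T)>::word w;
    std::memcpy(&w, &value, sizeof(T));     // copy T's bytes into the word
    w = sized_swapper<sizeof(T)>::swap(w);  // swap with plain bit operations
    std::memcpy(&value, &w, sizeof(T));     // copy the swapped bytes back
    return value;
}
```

Since the bytes travel through `std::memcpy`, the same template also accepts types such as `float`, as long as a specialization exists for their size.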
