Template specialization with macros

Posted on 2025-02-01 02:59:26

I'm looking at the following function, reproduced below:

#define FBGEMM_SPECIALIZED_REQUANTIZE(T)                            \
  template <>                                                       \
  FBGEMM_API void Requantize<T>(                                    \
      const int32_t* src,                                           \
      T* dst,                                                       \
      const int64_t len,                                            \
      const RequantizationParams& params,                           \
      int thread_id,                                                \
      int num_threads) {                                            \
    int64_t i_begin, i_end;                                         \
    fbgemmPartition1D(thread_id, num_threads, len, i_begin, i_end); \
    for (int64_t i = i_begin; i < i_end; ++i) {                     \
      dst[i] = Requantize<T>(src[i], params);                       \
    }                                                               \
  }
FBGEMM_SPECIALIZED_REQUANTIZE(uint16_t)
FBGEMM_SPECIALIZED_REQUANTIZE(int32_t)
#undef FBGEMM_SPECIALIZED_REQUANTIZE

It appears to be using a macro to specialize the functions.

I'm wondering what is the difference between doing that vs. no macros and just specializing everything like usual in C++?

倚栏听风 2025-02-08 02:59:26

As mentioned in the comments, macros are merely text replacement (more precisely, token replacement). A macro cannot do anything that more typing could not do just as well. Instead of

FBGEMM_SPECIALIZED_REQUANTIZE(uint16_t)
FBGEMM_SPECIALIZED_REQUANTIZE(int32_t)

the author could have spelled out the two specializations without any macro. However, that would duplicate code. The duplication, in turn, can be avoided with a helper template, as fabian mentioned:

template <typename T>                                                    
void Helper(const int32_t* src,                                           
  T* dst,                                                       
  const int64_t len,                                            
  const RequantizationParams& params,                           
  int thread_id,                                                
  int num_threads) {
   // code here
}

and then

template <>                                                       
FBGEMM_API void Requantize<uint16_t>(const int32_t* src,                                           
  uint16_t* dst,                                                       
  const int64_t len,                                            
  const RequantizationParams& params,                           
  int thread_id,                                                
  int num_threads) { Helper<uint16_t>(src,dst,len,params,thread_id,num_threads); }

And the same specialization again for int32_t. Note how the function's argument list alone already causes a lot of repetition. Macros are usually avoided because they obfuscate the code; code duplication is usually avoided because it makes the code hard to maintain. It's a trade-off to be made.
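
To make the helper approach concrete, here is a minimal self-contained sketch. It is not the FBGEMM code: `Params`, `QuantizeOne` and `RequantizeRange` are hypothetical stand-ins invented for illustration, and the threading parameters are left out to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for RequantizationParams.
struct Params {
  std::int32_t zero_point;
};

// Hypothetical scalar conversion: add the zero point, then clamp to T's range.
template <typename T>
T QuantizeOne(std::int32_t v, const Params& p) {
  const std::int64_t shifted = static_cast<std::int64_t>(v) + p.zero_point;
  const std::int64_t lo = std::numeric_limits<T>::min();
  const std::int64_t hi = std::numeric_limits<T>::max();
  return static_cast<T>(shifted < lo ? lo : (shifted > hi ? hi : shifted));
}

// Primary template: declared only, as in the question.
template <typename T>
void Requantize(const std::int32_t* src, T* dst, std::int64_t len,
                const Params& p);

// The shared loop is written once, in an ordinary function template.
template <typename T>
void RequantizeRange(const std::int32_t* src, T* dst, std::int64_t len,
                     const Params& p) {
  assert(len >= 0);
  for (std::int64_t i = 0; i < len; ++i) {
    dst[i] = QuantizeOne<T>(src[i], p);
  }
}

// Each explicit specialization is now a one-line forwarder. The signature
// is still repeated per type, but the body is not.
template <>
void Requantize<std::uint16_t>(const std::int32_t* src, std::uint16_t* dst,
                               std::int64_t len, const Params& p) {
  RequantizeRange(src, dst, len, p);
}

template <>
void Requantize<std::int32_t>(const std::int32_t* src, std::int32_t* dst,
                              std::int64_t len, const Params& p) {
  RequantizeRange(src, dst, len, p);
}
```

Compared with the macro, the loop exists once as real C++ that a debugger and tooling can see, at the cost of repeating the parameter list for every supported type.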

The other alternative for handling two different types at once is SFINAE, but that requires modifying the primary template, which may not be desirable. Or concepts, but they are only available since C++20.
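
As a rough illustration of the SFINAE route, the sketch below constrains a single function template to the two destination types. It assumes C++17 and uses a plain cast instead of the real requantization; `kIsSupportedDst` and `CanRequantize` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical trait: true only for the two supported destination types.
template <typename T>
inline constexpr bool kIsSupportedDst =
    std::is_same_v<T, std::uint16_t> || std::is_same_v<T, std::int32_t>;

// SFINAE: for any other T the enable_if_t is ill-formed, so this template
// silently drops out of overload resolution.
template <typename T, std::enable_if_t<kIsSupportedDst<T>, int> = 0>
void Requantize(const std::int32_t* src, T* dst, std::int64_t len) {
  assert(len >= 0);
  for (std::int64_t i = 0; i < len; ++i) {
    dst[i] = static_cast<T>(src[i]);  // placeholder for the real conversion
  }
}

// Detection helper, used here only to demonstrate the constraint.
template <typename T, typename = void>
struct CanRequantize : std::false_type {};

template <typename T>
struct CanRequantize<
    T,
    std::void_t<decltype(Requantize(std::declval<const std::int32_t*>(),
                                    std::declval<T*>(),
                                    std::declval<std::int64_t>()))>>
    : std::true_type {};

static_assert(CanRequantize<std::uint16_t>::value);
static_assert(CanRequantize<std::int32_t>::value);
static_assert(!CanRequantize<float>::value);
```

With C++20 the `enable_if_t` parameter could be replaced by a `requires kIsSupportedDst<T>` clause. Either way the definition has to be visible to callers, which is the change to the primary template mentioned above.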

Anyhow...

"I'm wondering what is the difference between doing that vs. no macros and just specializing everything like usual in C++?"

The amount of typing.


惜醉颜 2025-02-08 02:59:26

First, we should start from the original template:

template <typename T>
FBGEMM_API void Requantize(
    const std::int32_t* src,
    T* dst,
    std::int64_t len,
    const RequantizationParams& params,
    int thread_id = 0,
    int num_threads = 1);

Note that this template is only declared; I can't find a generic implementation of it. Note also that it lives in a header file.

There is also a separate specialization for uint8_t.

So the author intended it to work for only three types: uint8_t, uint16_t and int32_t.

You can see this because the macro FBGEMM_SPECIALIZED_REQUANTIZE is defined, used twice, and then immediately undefined.

So the question is: how does this behave in its current state?

If a user of the library calls this template with a pointer to an unsupported type (anything other than uint8_t, uint16_t or int32_t), they get a linker error: "undefined reference to function template .....".
If a user of the library calls this template with a pointer to uint8_t, one implementation is used; a different implementation is used for uint16_t and int32_t.

Is this good? IMO no: it is better to get a compilation error than a linker error. Note that while new code is being developed, the build can pass as long as that code is not yet used (for example, before a test is written); the linker error only appears once you add a test or start using the new functionality.

Can this be done better? Yes!

But it depends on the requirements. One way to solve it is the following.
In the header file, declare plain old overloaded functions:

FBGEMM_API void Requantize(
    const std::int32_t* src,
    uint8_t* dst,
    std::int64_t len,
    const RequantizationParams& params,
    int thread_id = 0,
    int num_threads = 1);

FBGEMM_API void Requantize(
    const std::int32_t* src,
    uint16_t* dst,
    std::int64_t len,
    const RequantizationParams& params,
    int thread_id = 0,
    int num_threads = 1);

FBGEMM_API void Requantize(
    const std::int32_t* src,
    int32_t* dst,
    std::int64_t len,
    const RequantizationParams& params,
    int thread_id = 0,
    int num_threads = 1);

Then, in the cpp file, define a template and simply call it from these functions:

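// uint8_t keeps its own implementation with an AVX2 fast path.
// Default arguments belong on the header declaration only, so they are
// not repeated in the definition.
FBGEMM_API void Requantize(
    const std::int32_t* src,
    uint8_t* dst,
    std::int64_t len,
    const RequantizationParams& params,
    int thread_id,
    int num_threads) {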
  int64_t i_begin, i_end;
  fbgemmPartition1D(thread_id, num_threads, len, i_begin, i_end);
  if (params.target_qparams.precision == 8 && cpuinfo_initialize() &&
      fbgemmHasAvx2Support()) {
    RequantizeAvx2(&src[i_begin], &dst[i_begin], i_end - i_begin, params);
  } else {
    for (int64_t i = i_begin; i < i_end; ++i) {
      dst[i] = Requantize<uint8_t>(src[i], params);
    }
  }
}

namespace detail {
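  // Generic implementation shared by the uint16_t and int32_t overloads.
  // A primary template is declared with a plain name, without <T>.
  template <typename T>
  FBGEMM_API void Requantize(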
      const int32_t* src,
      T* dst,
      const int64_t len,
      const RequantizationParams& params,
      int thread_id,
      int num_threads) {
    int64_t i_begin, i_end;
    fbgemmPartition1D(thread_id, num_threads, len, i_begin, i_end);
    for (int64_t i = i_begin; i < i_end; ++i) {
      dst[i] = Requantize<T>(src[i], params);
    }
  }
} // namespace detail

FBGEMM_API void Requantize(
    const std::int32_t* src,
    uint16_t* dst,
    std::int64_t len,
    const RequantizationParams& params,
    int thread_id,
    int num_threads) {
    detail::Requantize(src, dst, len, params, thread_id, num_threads);
}

FBGEMM_API void Requantize(
    const std::int32_t* src,
    int32_t* dst,
    std::int64_t len,
    const RequantizationParams& params,
    int thread_id,
    int num_threads) {
    detail::Requantize(src, dst, len, params, thread_id, num_threads);
}

Now linker errors are replaced with compilation errors, and macros are not needed.
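
The effect of switching to overloads can be checked in a small self-contained sketch. It assumes a simplified signature and a plain cast in place of the real requantization; `detail::CopyCast` and `Accepts` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

namespace detail {
// Shared implementation, invisible to users of the public overloads.
template <typename T>
void CopyCast(const std::int32_t* src, T* dst, std::int64_t len) {
  assert(len >= 0);
  for (std::int64_t i = 0; i < len; ++i) {
    dst[i] = static_cast<T>(src[i]);  // placeholder for the real conversion
  }
}
}  // namespace detail

// The public interface is a closed set of ordinary overloads.
inline void Requantize(const std::int32_t* src, std::uint16_t* dst,
                       std::int64_t len) {
  detail::CopyCast(src, dst, len);
}

inline void Requantize(const std::int32_t* src, std::int32_t* dst,
                       std::int64_t len) {
  detail::CopyCast(src, dst, len);
}

// Detection helper: does a call with a Dst* destination compile?
template <typename Dst, typename = void>
struct Accepts : std::false_type {};

template <typename Dst>
struct Accepts<
    Dst,
    std::void_t<decltype(Requantize(std::declval<const std::int32_t*>(),
                                    std::declval<Dst*>(),
                                    std::declval<std::int64_t>()))>>
    : std::true_type {};

// A float* destination matches no overload, so the compiler rejects the
// call; nothing is left for the linker to discover.
static_assert(Accepts<std::uint16_t>::value);
static_assert(Accepts<std::int32_t>::value);
static_assert(!Accepts<float>::value);
```

The trade-off is that callers can no longer write `Requantize<uint16_t>(...)` with an explicit template argument, so this is an interface change for existing users.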

Another way is to keep the template declaration in the header file and define it in the cpp file, providing a specialization for uint8_t and explicit instantiations for the other types.
So the header file is unchanged, and in the cpp file:

  template <typename T>
  FBGEMM_API void Requantize(
      const int32_t* src,
      T* dst,
      const int64_t len,
      const RequantizationParams& params,
      int thread_id,
      int num_threads) {
    int64_t i_begin, i_end;
    fbgemmPartition1D(thread_id, num_threads, len, i_begin, i_end);
    for (int64_t i = i_begin; i < i_end; ++i) {
      dst[i] = Requantize<T>(src[i], params);
    }
  }

template<>
FBGEMM_API void Requantize<uint8_t>(
    const std::int32_t* src,
    uint8_t* dst,
    std::int64_t len,
    const RequantizationParams& params,
    int thread_id,
    int num_threads) {
  int64_t i_begin, i_end;
  fbgemmPartition1D(thread_id, num_threads, len, i_begin, i_end);
  if (params.target_qparams.precision == 8 && cpuinfo_initialize() &&
      fbgemmHasAvx2Support()) {
    RequantizeAvx2(&src[i_begin], &dst[i_begin], i_end - i_begin, params);
  } else {
    for (int64_t i = i_begin; i < i_end; ++i) {
      dst[i] = Requantize<uint8_t>(src[i], params);
    }
  }
}

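// Explicit instantiation definitions for the remaining supported types.
template FBGEMM_API void Requantize<uint16_t>(
    const std::int32_t*, uint16_t*, std::int64_t, const RequantizationParams&, int, int);
template FBGEMM_API void Requantize<int32_t>(
    const std::int32_t*, int32_t*, std::int64_t, const RequantizationParams&, int, int);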

However, I do not know how this interacts with the other overloads of Requantize.
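
For reference, the syntax of this second variant can be tried in a single file. This is only a sketch with a simplified signature and placeholder conversions, not the library's code; the comments mark which part would sit in the header and which in the cpp file.

```cpp
#include <cassert>
#include <cstdint>

// "Header" part: the primary template is only declared.
template <typename T>
void Requantize(const std::int32_t* src, T* dst, std::int64_t len);

// "cpp file" part: the generic definition.
template <typename T>
void Requantize(const std::int32_t* src, T* dst, std::int64_t len) {
  assert(len >= 0);
  for (std::int64_t i = 0; i < len; ++i) {
    dst[i] = static_cast<T>(src[i]);  // placeholder for the real conversion
  }
}

// Explicit specialization: uint8_t gets its own body (here it saturates).
template <>
void Requantize<std::uint8_t>(const std::int32_t* src, std::uint8_t* dst,
                              std::int64_t len) {
  assert(len >= 0);
  for (std::int64_t i = 0; i < len; ++i) {
    const std::int32_t v = src[i];
    dst[i] = static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
  }
}

// Explicit instantiation definitions: they force the generic body to be
// compiled for exactly these types in this translation unit.
template void Requantize<std::uint16_t>(const std::int32_t*, std::uint16_t*,
                                        std::int64_t);
template void Requantize<std::int32_t>(const std::int32_t*, std::int32_t*,
                                       std::int64_t);
```

In a single file the generic definition is visible everywhere, so other types would still instantiate. The restriction to the listed types only takes effect when the definition really lives in a separate cpp file, and an unsupported type then still fails at link time rather than at compile time.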
