Strange performance problem

Posted 2024-08-09 06:06:51

I have a container similar to this one.

template <typename Nat, typename Elt>
class NatMap {
 public:
  Elt& operator[] (Nat nat) { 
    return tab [nat.GetRaw()];
  }
 private:
  Elt tab [Nat::kBound];
};

I wanted to drop the requirement for Elt to have a default constructor:

template <typename Nat, typename Elt>
class NatMap {
 public:
  Elt& operator[] (Nat nat) { 
    return ((Elt*)tab) [nat.GetRaw()];
  }
 private:
  char tab [Nat::kBound * sizeof(Elt)];
};

I use g++-4.3, and this code runs 25% slower in my application than the previous version. Unfortunately, the slowdown does not show up in a synthetic benchmark.
I guess it has something to do with compiler optimizations, aliasing, alignment, or something similar.

What should I do to get the performance back without reintroducing the default-constructor requirement?

Update:

I just tried the new g++-4.4, and it gives the following warning for the second version:

dereferencing pointer '<anonymous>' does break strict-aliasing rules
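
For comparison, a common way to drop the default-constructor requirement without giving up alignment is placement new into suitably aligned raw storage. The sketch below assumes C++17 (`alignas` is C++11, `std::launder` is C++17), so it would not build with the g++-4.3/4.4 in the question; there, GCC's `__attribute__((aligned(...)))` extension would be the nearest substitute for `alignas`. `SmallNat`, the `init` constructor parameter, and filling every slot up front are illustrative assumptions, not the asker's design. It addresses the alignment and aliasing concerns raised here; whether it recovers the 25% in the asker's application is untested.

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-in for the question's Nat type; NatMap only uses
// the static kBound and GetRaw().
struct SmallNat {
  static constexpr std::size_t kBound = 4;
  explicit SmallNat(std::size_t raw) : raw_(raw) {}
  std::size_t GetRaw() const { return raw_; }

 private:
  std::size_t raw_;
};

template <typename Nat, typename Elt>
class NatMap {
 public:
  // Copy-constructs every slot from `init`: Elt needs a copy constructor,
  // but no default constructor. (Not exception-safe if a copy throws.)
  explicit NatMap(const Elt& init) {
    for (std::size_t i = 0; i < Nat::kBound; ++i)
      ::new (static_cast<void*>(tab_ + i * sizeof(Elt))) Elt(init);
  }

  ~NatMap() {
    for (std::size_t i = 0; i < Nat::kBound; ++i) slot(i)->~Elt();
  }

  // Copying would need placement-new copies of each slot; omitted here.
  NatMap(const NatMap&) = delete;
  NatMap& operator=(const NatMap&) = delete;

  Elt& operator[](Nat nat) { return *slot(nat.GetRaw()); }

 private:
  // std::launder (C++17) yields a usable pointer to the Elt that placement
  // new created in slot i, rather than a bare reinterpret_cast of the bytes.
  Elt* slot(std::size_t i) {
    return std::launder(reinterpret_cast<Elt*>(tab_ + i * sizeof(Elt)));
  }

  // alignas(Elt) gives the buffer the alignment a real Elt array would
  // have; a plain char array is only guaranteed alignment 1.
  alignas(Elt) unsigned char tab_[Nat::kBound * sizeof(Elt)];
};
```

Usage, with a hypothetical type `T` that has only an `int` constructor: `NatMap<SmallNat, T> m(T(7)); m[SmallNat(2)] = T(42);`. Constructing every slot in the constructor keeps `operator[]` a plain indexed access; a map that left slots empty would also have to track which slots hold live objects.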

Comments (2)

眉目亦如画i 2024-08-16 06:06:51

You may be running into alignment problems. If Elt needs stricter alignment than a character array guarantees, then placing it into a character array may involve a lot of unaligned reads that you don't get when the compiler aligns a real Elt array for you. Or you may be running into a problem called a load-hit-store, which some processors exhibit when they write a value to memory and then immediately read it back; on those processors, the stall can be as long as the pipeline.
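
To make the alignment point concrete: a class whose only member is a `char` array, like the second NatMap, has alignment 1, so it may legally sit at any byte offset inside an enclosing object, and the Elt values read through the cast can then be misaligned. The sketch below uses a hypothetical element type `Elt8` (one `double`) and a hypothetical `Holder` that puts a one-byte field in front of the storage; the `static_assert`s reflect typical GCC layout on x86-64. On x86, misaligned loads still work but can be slower, especially when they straddle a cache line; on some other architectures they can trap.

```cpp
#include <cstddef>

// Hypothetical element type: alignment of double (8 on x86-64).
struct Elt8 { double d; };

// Like the question's second NatMap: only a char array, so alignment 1.
struct CharStorage {
  char tab[4 * sizeof(Elt8)];
};

// Same size, but the buffer is declared with Elt8's alignment.
struct AlignedStorage {
  alignas(Elt8) char tab[4 * sizeof(Elt8)];
};

// Hypothetical enclosing object that places the storage after one byte.
template <typename Storage>
struct Holder {
  char flag;
  Storage map;
};

static_assert(alignof(CharStorage) == 1,
              "a bare char array only requires alignment 1");
static_assert(alignof(AlignedStorage) == alignof(Elt8),
              "alignas on the member raises the class's alignment");
static_assert(offsetof(Holder<AlignedStorage>, map) % alignof(Elt8) == 0,
              "aligned storage always starts on an Elt8 boundary");
```

With `CharStorage`, `Holder::map` typically lands at offset 1, so every `Elt8` read through the cast is misaligned; with `AlignedStorage`, the compiler inserts padding after `flag` instead.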

Or it may be something else entirely, some kind of pathological code generation by GCC.

Unfortunately, stack traces don't help track down either of these issues, as they'd just look like a load operation (lw, lb, etc.) that took forty cycles instead of one. The stall is in the microcode inside the CPU, not in the x86 code you've written. But looking at the assembly with the -S command-line option can help you figure out what the compiler is really emitting and how it differs between your two implementations. Maybe some bad operation is cropping up in one version.
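
A minimal harness for the `-S` comparison, assuming a hypothetical file `access.cc` and element type `Elt`: it reproduces the two storage layouts from the question as plain structs so their loads can be compared side by side in the generated assembly. In isolation the two functions may well compile to the same instructions, which would be consistent with the question's note that a synthetic benchmark showed no slowdown; the same comparison on the application's hot code is likely more informative.

```cpp
// Hypothetical file access.cc, for inspecting code generation.
#include <cstddef>

struct Elt { double d; int tag; };  // hypothetical element type

// Layout of the question's first NatMap: a real Elt array.
struct TypedTab { Elt tab[16]; };

// Layout of the second NatMap: raw char storage viewed through a cast.
struct RawTab { char tab[16 * sizeof(Elt)]; };

double read_typed(TypedTab& t, std::size_t i) { return t.tab[i].d; }

// Same cast as the question's operator[], the kind of access g++-4.4's
// strict-aliasing warning is about. Meant for inspecting the assembly;
// don't call it on storage that holds no Elt.
double read_raw(RawTab& t, std::size_t i) {
  return reinterpret_cast<Elt*>(t.tab)[i].d;
}
```

Compiling with `g++ -O2 -S access.cc` writes `access.s`; the function names in it are mangled, and running `c++filt < access.s` makes them readable.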

独孤求败 2024-08-16 06:06:51

Small suggestion: rather than making educated guesses (for example, about whether the compiler optimizations differ), you could either single-step through the code or find out with this unorthodox method.
