矢量<布尔>使用权

发布于 2024-11-17 04:46:55 字数 3132 浏览 5 评论 0原文

我使用 gprof 分析了我的代码,从报告中可以看出,大多数(如果不是全部)前 20 个左右的内容都与向量有关

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 14.71      0.05     0.05  3870399     0.00     0.00  std::vector<bool, std::allocator<bool> >::size() const
 11.76      0.09     0.04 10552897     0.00     0.00  std::_Bit_reference::_Bit_reference(unsigned long*, unsigned long)
 11.76      0.13     0.04  7890323     0.00     0.00  std::_Bit_const_iterator::_Bit_const_iterator(std::_Bit_iterator const&)
  5.88      0.15     0.02 10089215     0.00     0.00  std::_Bit_iterator::operator*() const
  5.88      0.17     0.02  6083600     0.00     0.00  std::vector<bool, std::allocator<bool> >::operator[](unsigned int)
  5.88      0.19     0.02  3912611     0.00     0.00  std::vector<bool, std::allocator<bool> >::end() const
  5.88      0.21     0.02                             std::istreambuf_iterator<char, std::char_traits<char> > std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::_M_extract_int<unsigned long long>(std::istreambuf_iterator<char, std::char_traits<char> >, std::istreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, std::_Ios_Iostate&, unsigned long long&) const
  2.94      0.22     0.01  6523499     0.00     0.00  std::_Bit_reference::operator bool() const
  2.94      0.23     0.01  3940406     0.00     0.00  std::vector<bool, std::allocator<bool> >::begin() const
  2.94      0.24     0.01  2807828     0.00     0.00  std::_Bit_iterator::operator++()
  2.94      0.25     0.01   146917     0.00     0.00  std::_Bit_iterator_base::_M_incr(int)
  2.94      0.26     0.01   121706     0.00     0.00  std::__miter_base<unsigned long*, false>::__b(unsigned long*)
  2.94      0.27     0.01    46008     0.00     0.00  std::_Bvector_base<std::allocator<bool> >::~_Bvector_base()
  2.94      0.28     0.01    22596     0.00     0.00  std::_Bit_iterator std::__copy_move<false, false, std::random_access_iterator_tag>::__copy_m<std::_Bit_iterator, std::_Bit_iterator>(std::_Bit_iterator, std::_Bit_iterator, std::_Bit_iterator)
  2.94      0.29     0.01     4525     0.00     0.05  integer::operator+(integer)
  2.94      0.30     0.01     1382     0.01     0.01  void std::_Destroy<unsigned int*, unsigned int>(unsigned int*, unsigned int*, std::allocator<unsigned int>&)
  2.94      0.31     0.01                             std::string::size() const
  2.94      0.32     0.01                             std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
  2.94      0.33     0.01                             std::locale::locale()
  2.94      0.34     0.01                             __dynamic_cast

,这是一个好兆头,因为这意味着我的其余函数非常高效,或者从向量<布尔>真的很慢吗?

我用 gcc -std=c++0x 编译

I profiled my code using gprof and from the report, most, if not all of the top 20 or so things are about vector

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 14.71      0.05     0.05  3870399     0.00     0.00  std::vector<bool, std::allocator<bool> >::size() const
 11.76      0.09     0.04 10552897     0.00     0.00  std::_Bit_reference::_Bit_reference(unsigned long*, unsigned long)
 11.76      0.13     0.04  7890323     0.00     0.00  std::_Bit_const_iterator::_Bit_const_iterator(std::_Bit_iterator const&)
  5.88      0.15     0.02 10089215     0.00     0.00  std::_Bit_iterator::operator*() const
  5.88      0.17     0.02  6083600     0.00     0.00  std::vector<bool, std::allocator<bool> >::operator[](unsigned int)
  5.88      0.19     0.02  3912611     0.00     0.00  std::vector<bool, std::allocator<bool> >::end() const
  5.88      0.21     0.02                             std::istreambuf_iterator<char, std::char_traits<char> > std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::_M_extract_int<unsigned long long>(std::istreambuf_iterator<char, std::char_traits<char> >, std::istreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, std::_Ios_Iostate&, unsigned long long&) const
  2.94      0.22     0.01  6523499     0.00     0.00  std::_Bit_reference::operator bool() const
  2.94      0.23     0.01  3940406     0.00     0.00  std::vector<bool, std::allocator<bool> >::begin() const
  2.94      0.24     0.01  2807828     0.00     0.00  std::_Bit_iterator::operator++()
  2.94      0.25     0.01   146917     0.00     0.00  std::_Bit_iterator_base::_M_incr(int)
  2.94      0.26     0.01   121706     0.00     0.00  std::__miter_base<unsigned long*, false>::__b(unsigned long*)
  2.94      0.27     0.01    46008     0.00     0.00  std::_Bvector_base<std::allocator<bool> >::~_Bvector_base()
  2.94      0.28     0.01    22596     0.00     0.00  std::_Bit_iterator std::__copy_move<false, false, std::random_access_iterator_tag>::__copy_m<std::_Bit_iterator, std::_Bit_iterator>(std::_Bit_iterator, std::_Bit_iterator, std::_Bit_iterator)
  2.94      0.29     0.01     4525     0.00     0.05  integer::operator+(integer)
  2.94      0.30     0.01     1382     0.01     0.01  void std::_Destroy<unsigned int*, unsigned int>(unsigned int*, unsigned int*, std::allocator<unsigned int>&)
  2.94      0.31     0.01                             std::string::size() const
  2.94      0.32     0.01                             std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
  2.94      0.33     0.01                             std::locale::locale()
  2.94      0.34     0.01                             __dynamic_cast

is that a good sign, since it means that the rest of my functions are pretty efficeint, or that accessing values from a vector< bool > is really slow?

im compiling with gcc -std=c++0x

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

宁愿没拥抱 2024-11-24 04:46:55

vector 不存储 bool。它基本上是一个位字段。您要为修改单个值所需的一点操作付出代价。

如果需要考虑运行时性能,请考虑使用 vectordeque

vector<bool> does not store bools. It's basically a bitfield. You're paying for the bit twiddling it takes to modify a single value.

If runtime performance is a concern, consider vector<char> or deque<bool> instead.

违心° 2024-11-24 04:46:55

因为这意味着我的其余函数非常高效,或者从向量访问值非常慢?

由于“慢”和“高效”是相对值,因此这本质上是毫无意义的区别。
解释该报告的最客观方法是:

由于 std::vector 的操作消耗了最大量的时间,因此这应该是使代码更快的起点

请注意,std::vector 通常比 std::vector 慢一点,因为它不存储真正的 bool > 而是一组位掩码(即理想情况下每个条目只需要一个位)。这节省了空间,但速度较慢。如果您需要更快,请尝试使用 std::vector (或 char, ..,具体取决于您的需要)。

我怀疑 std::vector可能会因调试构建而受到很大影响,因此如果您还没有这样做,请尝试一些优化标志(您总是应该进行分析)。

since it means that the rest of my functions are pretty efficeint, or that accessing values from a vector is really slow?

As 'slow' and 'efficient' are relative values, this is essentially a senseless distinction.
The most objective way to interpret the report is:

As std::vector' operation eat up the most significant amount of time, this should be the point to start with to make the code even faster.

Note that std::vector<bool> is generally a bit slower than a std::vector<int> because it does not store real bools but rather a set of bitmasks (i.e. ideally it needs only one bit per entry). This saves space, but is slower. If you need it to be faster, try using std::vector<int> (or char, .., depending on your needs) instead.

I would suspect that std::vector<bool> may suffer greatly from debug builds, so try some optimization flags if you didn't do that already (you always should for profiling).

以歌曲疗慰 2024-11-24 04:46:55

vector 实际上是一个模板专门化,其中每个 bool 值都存储为单个位。但是,不可能像使用 int 或仅使用“普通”bool 那样直接使用各个位。因此,向量中使用的算法与“正常”向量<>非常不同,并且为了保留向量尽可能多地接口,当您调用诸如operator[]之类的函数时,它可能会返回操作这些位的代理对象。这可能会影响 gprof 报告中的结果,具体取决于编译器的配置方式和相关代码。

vector<bool> is actually a template specialization where each bool value is stored as a single bit. However, it's not possible to directly work with individual bits the same way you could with int or with just "normal" bool. So the algorithms used in vector<bool> is very different from a "normal" vector<>, and in order to perserve the vector interface as much as possible it may return proxy objects that manipulates the bits when you call functions like operator[]. That may contribute to the results in your gprof report, depending on how your compiler is configured and the code in question.

凉风有信 2024-11-24 04:46:55

我想说这很糟糕,因为你 14.71% 的时间都花在了 vector::size() 上!?!大小可能是给定的。

如果您预先知道大小,请尝试减少对 size() 的调用次数或使用固定大小的向量: bitset

编辑 阅读问题更新后:

强制更改:g++ --std=c ++0x -g -O3 (这是两个拼写错误和优化标志,重新分析!);模板类大量使用内联,这反过来又实现了无数其他优化。加速比轻松提升10倍

I'd say it stinks because 14.71% of your time is spent doing vector<bool>::size() !?! Size is probably a given.

Try to reduce the number of calls to size() or use a fixed size vector if you know the size up front: bitset

Edit after reading the update to the question:

Mandatory change: g++ --std=c++0x -g -O3 (that's two typo's and the optimization flag, reprofile!); Template classes use inlining heavily and this in turn enables a gazillion other optimizations. The order of speedup is easily 10-fold

べ繥欢鉨o。 2024-11-24 04:46:55

它对你的计划说了很多吗?
除了 vector 业务之外,它基本上没有告诉您任何信息。

您将亲眼目睹gprof 的问题

假设您知道某个函数具有很高的“自时间”,这意味着程序计数器在其中被采样了很多次,但它不是您编写或可以修改的函数。

你唯一能做的就是尝试更少地调用它,或者尝试更少地调用调用它的例程,或者尝试更少地调用那个例程,
你只能猜测那是哪里。

gprof 还尝试通过猜测例程的包含时间、调用次数以及调用图来帮助您。
如果没有递归,并且只有十几个左右的函数,并且不执行任何 I/O,这可能会有所帮助。

有一种稍微不同的方法,体现在像 Zoom 这样的分析器中。
不要仅对程序计数器进行采样,而是对整个调用堆栈进行采样。
为什么?因为负责所花费时间的代码行在此期间在堆栈上,只是要求引起注意。

分析器会根据挂钟时间对调用堆栈进行采样,并告诉您大多数时间在堆栈上找到哪些代码行,这是最有效的。
更有效的是,如果您可以查看堆栈的各个样本,因为这还可以告诉您为什么调用这些行,而不仅仅是调用多少,因此很容易判断您是否真的不需要它们。

Does it say much about your program?
Other than the vector<bool> business, it's telling you basically nothing.

You're seeing first hand the problems with gprof.

Suppose you know some function has high "self time", meaning the program counter was sampled a good number of times in it, but it's not a function you wrote or can modify.

The only thing you can do about it is try to call it less, or try to call less the routine that calls it, or try to call that routine less,
and you're left trying to guess where that is.

gprof tries to help you by also guessing what a routine's inclusive time is, how many times it is called, and a call graph.
If there's no recursion, and you've only got a dozen or so functions, and you're not doing any I/O, this may be helpful.

There's a slightly different approach, embodied in profilers like Zoom.
Instead of sampling just the program counter, sample the whole call stack.
Why? Because the lines of code responsible for the time being spent are on the stack during that time, just asking to be noticed.

Profilers that sample the call stack, on wall clock time, and tell you which lines of code are found on the stack most of the time, are the most effective.
Even more effective is if you can look over individual samples of the stack, because that also tells you why those lines are being invoked, not just how much, so it's easy to tell if you don't really need them.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文