C++ 中的引用速度

发布于 2024-07-20 12:52:20 字数 3814 浏览 5 评论 0原文

我一直在从事一个项目,并试图找到执行时间大幅减慢的根源,并将其缩小到我已设法优化逻辑的单一方法。 问题是我的解决方案涉及使用引用,这使得代码的另一部分运行速度相当慢...我想回答的问题是为什么当地图是引用而不是引用时,内部循环需要花费更长的时间来评估局部变量?

这是优化之前的旧方法:

// old method: create an empty map, populate it
// and then assign it back to the path object later
map<int,float> screenline_usage; 

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here. 
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[it->first] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~12 seconds
}

// This function call is evaluated 400k times for an overall execution time of ~126 seconds
path->set_zone_screenline_usage(access_mode, zone_id, screenline_usage); 

// TOTAL EXECUTION TIME: 138 seconds. 

优化之后的新方法:

// new method: get a reference to internal path mapping and populate it
map<int, float>& screenline_usage =
  path->get_zone_screenline_usage(access_mode, zone_id);
screenline_usage.clear();

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[it->first] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~76 seconds
}

// New method... no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: 76 seconds (62 second time saving) ... but should be able to do it in just 12 seconds if the use of reference didn't add so much time :(

以下是从该代码调用的相关子例程:

// This is the really slow routine, due to the copy assignment used. 
void set_zone_screenline_usage(int access_mode, int zone_id,
  map<int,float>& screenline_usage)
{
  m_container[access_mode][zone_id] = screenline_usage; 
}

map<int,float>& get_zone_screenline_usage(int access_mode, int zone_id)
{
  return m_container[access_mode][zone_id]; 
}

注意:计时信息针对单次运行,其中上述代码被评估大约 400k 次。 计时是使用我构建的一些类来访问 RDTSC 时间戳计数器(是的,我知道 TSC 意味着时间戳计数器),numCandidates 的平均值为 10,放入 screenline_usage 映射的元素的平均数量为 25。


更新: 首先感谢所有参与其中的人。 我认为最终这与 c++ 引用完全无关,而与缓存一致性有更多关系。 我已将上面的优化代码替换为向量& 和作为成员变量映射实现的哈希函数

// newest method: get a reference to internal path mapping (as vector) and populate it 
// map<int,int> m_linkNum_to_SlNum declared in header and populated in constructor. 
vector<float>& screenline_usage = path->get_zone_screenline_usage(access_mode, zone_id);

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[m_linkNum_to_SlNum[it->first]] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~9 seconds
}

// Newest method... again no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: just 9 seconds (129 second time saving) ... this is even better than using a locally constructed map which took 12 seconds in the inner loop :)

在我看来,鉴于向量不是本地的,而是一个连续的内存块,并且哈希函数(m_linkNum_to_SlNum)是本地成员变量,这种方法会导致代码/数据能够放入缓存中,而无需到主内存中获取数据,从而显着提高速度。 根据这些发现得出的其他结论受到高度赞赏。

I have been working on a project and trying to find the source of a large slowdown in execution time and have narrowed it down to a single method which I have managed to optimise out of the logic. The problem is that my solution involves using a reference which makes another section of the code run quite slowly... The question I'd like answered is why the inner loop takes so much longer to evaluate when the map is a reference as opposed to a local variable?

Here's the old way prior to optimisation:

// old method: create an empty map, populate it
// and then assign it back to the path object later
map<int,float> screenline_usage; 

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here. 
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[it->first] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~12 seconds
}

// This function call is evaluated 400k times for an overall execution time of ~126 seconds
path->set_zone_screenline_usage(access_mode, zone_id, screenline_usage); 

// TOTAL EXECUTION TIME: 138 seconds. 

New way after optimisation:

// new method: get a reference to internal path mapping and populate it
map<int, float>& screenline_usage =
  path->get_zone_screenline_usage(access_mode, zone_id);
screenline_usage.clear();

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[it->first] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~76 seconds
}

// New method... no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: 76 seconds (62 second time saving) ... but should be able to do it in just 12 seconds if the use of reference didn't add so much time :(

Here are the pertinent subroutines called from that code:

// This is the really slow routine, due to the copy assignment used. 
void set_zone_screenline_usage(int access_mode, int zone_id,
  map<int,float>& screenline_usage)
{
  m_container[access_mode][zone_id] = screenline_usage; 
}

map<int,float>& get_zone_screenline_usage(int access_mode, int zone_id)
{
  return m_container[access_mode][zone_id]; 
}

NOTES: Timing information is for a single run in which the above code is evaluated approximately 400k times. The timing is done using some classes that I built to access the RDTSC time stamp counter (yes i know TSC means time stamp counter), the average value of numCandidates is 10 and the average number of elements put into the screenline_usage map is 25.


UPDATE: Firstly thanks to everyone who has gotten involved here. I think that in the end this had nothing to do c++ references at all and had more to do with cache consistency. I have replaced the optimised code above with a vector& and a hash function implemented as a member variable map

// newest method: get a reference to internal path mapping (as vector) and populate it 
// map<int,int> m_linkNum_to_SlNum declared in header and populated in constructor. 
vector<float>& screenline_usage = path->get_zone_screenline_usage(access_mode, zone_id);

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[m_linkNum_to_SlNum[it->first]] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~9 seconds
}

// Newest method... again no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: just 9 seconds (129 second time saving) ... this is even better than using a locally constructed map which took 12 seconds in the inner loop :)

It seems to me here that, given that the vector isn't local but is a contiguous block of memory and that the hashing function (m_linkNum_to_SlNum) is a local member variable, this approach leads to code/data that is able to fit into cache without having to go out to main memory for data resulting in the significant speed up. Other conclusions given these findings are greatly appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

反话 2024-07-27 12:52:20

也许您的 C++ 编译器能够为本地映射内联一些代码,但当映射是引用时则不能。

Maybe you C++ compiler is able to inline some code for the local map, but not when the map is a reference.

等待我真够勒 2024-07-27 12:52:20

你的问题不清楚。 您的意思是问,为什么通过引用传递映射比通过值传递更快? 如果是这样,答案很简单:按值返回地图意味着将其完整复制,而复制大型地图是一项非常昂贵的操作。

另一方面,如果您的问题是:为什么获取对现有地图的引用并填充该地图比制作新地图更快,那么一个假设是它与

 screenline_usage[it->first] += usage * it->second; 

since key [it-> ;first]已经存在于path->get_zone_screenline_usage内的地图中,那么这只是一个更新操作,不需要内存分配。 但是,如果 screenline_usage 是一个空映射,那么每次访问新的 [it->first] 都意味着它首先必须从堆中分配一个新的映射节点,这是昂贵的。

Your question is unclear. Do you mean to ask, why is passing a map by reference faster than passing by value? If so, the answer is simple: returning a map by value means copying it in its entirety, and copying a large map is a very expensive operation.

On the other hand, if your question is: why is faster to get a reference to an already existing map and populate that than it is to make a new map, then one hypothesis is that it has to do with

 screenline_usage[it->first] += usage * it->second; 

Since key [it->first] already exists in the map inside path->get_zone_screenline_usage, then this is is simply an update operation and requires no memory allocation. However, if screenline_usage is an empty map, then each access to a new [it->first] means that it first has to allocate a new map node from the heap, and that's expensive.

非要怀念 2024-07-27 12:52:20

如果我正确理解你的问题,你意味着插入到本地堆栈分配的映射<> 比插入现有的堆分配映射快得多<> 您通过引用检索的。

有多种可能性可能会影响性能。 不过,我怀疑它与 C++ 参考性能有什么关系。

第一种可能性是,通过将 screenline_usage 更改为引用,您“实质上”检索了指向现有对象的指针。 对象的实际实例可能不是 map,它可以是任何继承自 map 的对象。 例如,它可以是一个为其定义了自定义比较器函数的地图。 或者具有线程同步逻辑的子类。 由于您不知道从调用 m_container[access_mode][zone_id] 中获得的类型,因此您很可能会获得在插入时表现不佳的子类。 (顺便说一下,您可以在调试器中看到返回的类型。)然而,通过创建一个空的 map 您可以保证该类型是实际的 map<> ;,而不是子类。

假设您正在获得一张实际的地图<> 实例返回。

第二种可能性是,在您使用的特定 STL 风格中,map::clear() 函数无法有效地清除用于维护关联索引的内部数据结构。 我记得, stl::map<> 使用一些复杂的内部散列和桶结构来提供高效的插入和检索能力。 但是,尚不清楚当您调用clear() 时会发生什么。 因此,与从空地图开始相比,screenline_usage.clear() 可能会导致插入效率较低的地图。

最后,我们假设我错了,这两种可能性都不是。 有一种简单的方法可以确定差异是否是使用参考变量的结果。 您可以尝试在堆上分配一个新映射并将其分配给代码中的引用,如下所示:

map<int, float>& screenline_usage = new map<int,float>();

这将帮助您确定插入现有映射与新映射是否存在差异,或者是否确实存在差异事实上 screenline_usage 是影响性能的参考。 顺便说一句,不要忘记释放这个堆分配的映射,否则最终会出现内存泄漏。

If I understand your question correctly, you imply that inserting to a local stack-allocated map<> is much faster than inserting to an existing heap-allocated map<> that you retrieve by reference.

There are several possibilities that may be affecting performance. I doubt it has anything to do with C++ reference performance, though.

The first possibility, is that by changing screenline_usage to a reference, you are "essentially" retrieving a pointer to an existing object. The actual instance of the object may not be map<int,float>, it could by anything that inherits from map. For instance, it could be a map with a custom comparator function defined for it. Or a subclass with thread-syncronization logic. Since you don't know what type you are getting from the call to m_container[access_mode][zone_id], you may very well be getting a subclass that does not perform well on insert. (You can see what type you get back in the debugger, by the way.) Whereas, by creating an empty map<int,float> you have a guarantee that the type is an actual map<>, and not a subclass.

Let's assume that you are getting an actual map<> instance back.

The second possibility, is that in the particular flavor of STL that you are using, the map::clear() function does not efficiently clear out internal data structures used to maintain the associative indexes. As I recall, stl::map<> uses some complex internal hashing and bucket structures to provide efficient insert and retrieval capabilities. However, it's unclear what happens to those when you call clear(). So it's possible that screenline_usage.clear() results in a less efficient map to insert against than if you started with an empty map.

Finally, let's assume that I'm wrong, and it's neither of these possibilities. There is a simple way to determine whether the difference is the result of using a reference variable. You can try allocating a new map on the heap and assigning it to a reference in your code, like this:

map<int, float>& screenline_usage = new map<int,float>();

This will help you determine if there is some difference in inserting to an existing map vs. a new map, or if it is indeed the fact that screenline_usage is a reference that is affecting performance. BTW, Don't forget to free this heap-allocated map, or you'll end up with a memory leak.

予囚 2024-07-27 12:52:20

引用通常在幕后以指针的形式实现。 一个例外是,如果为它们分配了临时值,则该值的生命周期将延长到引用的生命周期; 本质上,引用就是对象。 是否

因此,这取决于get_combined_screenline_usage

返回按引用或按值。 如果是通过引用,引用方式可能会更快。 如果是按值计算,那么假设编译器进行了返回值优化,那么旧方法和新方法本质上是做同样的事情。

在任何一种情况下,编译器所做的其他优化(例如内联)可能会掩盖这两种选择的效果; 实际上,如果不知道你的慢速部分到底是哪条线,这会让这有点像猜谜游戏。

我建议尝试获取更精细的信息,并将其缩小到花费很长时间的行,这使得优化变得更加容易。

(注:

map<int, float>::iterator it = my_screenline_usage.begin(); 
for (; it != my_screenline_usage.end(); ++it) 

如果写成 可能会更有效

for (map<int, float>::const_iterator it(my_screenline_usage.begin()), end(my_screenline_usage.begin()); it != end; ++it)

References are most often behind the scenes implemented as pointers. An exception to this would be if a temporary is assigned to them , then the lifetime of the value is extended to the lifetime of the reference; in essence, the reference is the object. So, it depends if:

get_combined_screenline_usage

returns by-reference or by-value. If it is by reference, the reference way might be faster. If it is by value, then the old way and new way are doing essentially the same thing, assuming the compiler does a return value optimization.

In either case, other optimizations the compiler does (for instance inlining) might mask the effects of the two choices; effectively, without knowing which exact line your slow part is, it makes this a bit of a guessing game.

I'd suggest trying to get finer grain information, and narrow it down to exactly the line that is taking so long, it makes optimizing a lot easier.

(as a note:

map<int, float>::iterator it = my_screenline_usage.begin(); 
for (; it != my_screenline_usage.end(); ++it) 

might be more efficient if written as

for (map<int, float>::const_iterator it(my_screenline_usage.begin()), end(my_screenline_usage.begin()); it != end; ++it)

)

雨轻弹 2024-07-27 12:52:20

我试图弄清楚的问题是,当地图作为参考时,为什么内部循环需要更长的时间来评估?

我猜想花费很长时间的是对 get_zone_screenline_usage 的调用:而花费很长时间的原因是 m_container 中尚不存在特定元素因此必须先创建并插入,然后才能返回。

The problem I'm trying to figure out is why the inner loop takes so much longer to evaluate when the map is a reference?

I guess that what's taking a long time is the call to get_zone_screenline_usage: and that the reason why that's taking a long time is that the specific element doesn't already exist in m_container and must therefore be created and inserted before it can be returned.

讽刺将军 2024-07-27 12:52:20

根据我的更新,我认为这很可能是缓存一致性问题,而不是 C++ 参考问题。

As per my updates, I think this is most likely a cache-consistency problem rather than a c++ reference issue.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文