Optimizing away unnecessary string copies into a vector

Posted on 2024-12-07 01:49:26

Here is minimal code that illustrates the problem:

struct A {
  vector<string> v;
  // ... other data and methods
};
A obj;
ifstream file("some_file.txt");
char buffer[BIG_SIZE];
while( <big loop> ) {
  file.getline(buffer, BIG_SIZE-1);
  // process buffer; which may change its size
  obj.v.push_back(buffer);  // <------- can be optimized ??
}
...

Here the string is created twice: first when the actual string object is constructed from buffer, and a second time when it is copy-constructed into the vector. Demo

The push_back() operation happens millions of times, so I pay for one extra allocation each time, and that allocation is of no use to me.

Is there a way to optimize this? I am open to any suitable change. (I am not treating this as premature optimization, because push_back() is called so many times throughout the code.)

Comments (3)

半衬遮猫 2024-12-14 01:49:26

Well, you do get two allocations, but not both of them are for the string: one creates the string, while the other creates just a pointer inside the vector (note that this depends on the compiler: some compilers/settings might indeed create two strings, but most won't). See this code for a demo.

One way to optimize it would be to use char* instead of string as the template parameter (don't forget to manually delete each element before destroying the vector!). This way you get rid of one of the allocations (the biggest one). Alternatively, use your own implementation of vector: you will then be able to control every aspect of memory allocation.
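A rough sketch of the char* variant suggested above, under stated assumptions: the helper names append_raw and destroy_all are hypothetical and do not come from the answer. It shows one allocation per line and the manual cleanup that the answer warns about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the C string `line` into a heap block
// and stores the raw pointer in `v`. One allocation per line.
void append_raw(std::vector<char*>& v, const char* line) {
    assert(line != 0);
    const std::size_t n = std::strlen(line) + 1;  // include the terminating '\0'
    char* copy = new char[n];
    std::memcpy(copy, line, n);
    try {
        v.push_back(copy);   // may throw if the vector has to grow
    } catch (...) {
        delete[] copy;       // do not leak the block if growth fails
        throw;
    }
}

// Hypothetical helper: the vector does not own its elements, so every
// block must be released by hand before the vector is destroyed.
void destroy_all(std::vector<char*>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        delete[] v[i];
    }
    v.clear();
}
```

In the question's loop, the call would be append_raw(obj.v, buffer) with v declared as vector<char*>, and destroy_all(obj.v) would have to run before obj goes out of scope. The price of saving an allocation is that ownership is no longer automatic.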

眼眸印温柔 2024-12-14 01:49:26

You can try a couple of things. The first, obviously, is to enable compiler optimizations. If you can declare it as a vector<const string>, that may help.

Otherwise you might try something like:

obj.v.resize(obj.v.size()+1);    // append an empty string
string(buffer).swap(obj.v.back());  // swap the temporary's contents into it
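The resize-then-swap idea can be sketched as a small helper. This is an illustration under assumptions, not the answerer's code: the names append_by_swap and append_in_place are hypothetical. In standard C++ a temporary cannot bind to the non-const reference parameter of swap, so the sketch calls swap on the temporary instead. The second helper uses emplace_back, a different technique that is not mentioned in the thread and assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends `line` using the resize-and-swap idiom.
// The characters are allocated once, by the temporary string.
void append_by_swap(std::vector<std::string>& v, const char* line) {
    assert(line != nullptr);
    v.resize(v.size() + 1);            // new empty element (the vector itself may reallocate)
    std::string(line).swap(v.back());  // the temporary hands its contents to that element
}

// Hypothetical helper, C++11 or later: constructs the string directly
// inside the vector's storage from the C string.
void append_in_place(std::vector<std::string>& v, const char* line) {
    assert(line != nullptr);
    v.emplace_back(line);
}
```

Inside the question's loop either call would replace obj.v.push_back(buffer), for example append_by_swap(obj.v, buffer). Whether this is measurably faster depends on the compiler and standard library, so it is worth profiling before and after.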

当爱已成负担 2024-12-14 01:49:26

Instead of keeping buffer on the stack, put it on the heap. Then use a vector of pointers. Only one