从向量切换到双端队列的大小限制通常是多少?

发布于 2024-09-19 11:28:26 字数 441 浏览 2 评论 0原文

我最近写了这篇文章:
如何最好地存储C++ 中非常大的二维浮点数列表?错误处理?

有些人建议我将浮点的二维列表式结构实现为向量,其他人则说是双端队列。

据我所知,向量需要连续的内存,但因此效率更高。显然,如果可能的话,这将是可取的。

因此,我的问题是,关于基本结构可以有多长的好规则是什么......

1. 浮动
2. int

...在你应该从向量切换到双端队列以避免内存问题之前?

例如,我正在寻找诸如“在大约 400 万个浮点数或 800 万个整数时,您应该切换...”之类的答案...如果可能的话。

I recent wrote this post:
How best to store VERY large 2D list of floats in c++? Error-handling?

Some suggested that I implemented my 2D list-like structure of floats as a vector, others said a deque.

From what I gather vector requires continuous memory, but is hence more efficient. Obviously this would be desirable if possible.

Thus my question is what's a good rule of how long a basic structure can be in terms of...

1. float
2. int

...before you should switch from a vector to a deque to avoid memory problems?

e.g. I'm looking for answer like "At around 4 million floats or 8 million ints, you should switch..." ...if possible.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

情感失落者 2024-09-26 11:28:27

需要考虑的因素太多,不可能给出明确的答案。机器上的内存量、碎片程度、可能变得碎片程度等。我的建议是只选择一个并使用它。如果它导致问题切换。无论如何,您很可能不会遇到这些边缘情况。

如果您真的担心,那么您可能可以实现一种伪 PIMPL:

template<typename T>
class VectorDeque
{
private:
  enum TYPE { NONE, DEQUE, VECTOR };
  std::deque<T> m_d;
  std::vector<T> m_v;
  TYPE m_type;
  ...
public:
  void resize(size_t n)
  {
    switch(m_type)
    {
      case NONE:
      try
      {
        m_v.resize(n);
        m_type = VECTOR;
      }
      catch(std::bad_alloc &ba)
      {
        m_d.resize(n);
        m_type = DEQUE;
      }
      break;
    }
  }
};

但这似乎完全是矫枉过正。您是否做过任何基准测试或测试?您是否有任何理由相信内存分配会失败或双端队列会太慢?

There are so many factors to consider that it's impossible to give a clear answer. The amount of memory on the machine, how fragmented it is, how fragmented it may become, etc. My suggestion is to just choose one and use it. If it causes problems switch. Chances are you aren't going to hit those edge cases anyway.

If you are truly worried, then you could probably implement a sort of pseudo PIMPL:

template<typename T>
class VectorDeque
{
private:
  enum TYPE { NONE, DEQUE, VECTOR };
  std::deque<T> m_d;
  std::vector<T> m_v;
  TYPE m_type;
  ...
public:
  void resize(size_t n)
  {
    switch(m_type)
    {
      case NONE:
      try
      {
        m_v.resize(n);
        m_type = VECTOR;
      }
      catch(std::bad_alloc &ba)
      {
        m_d.resize(n);
        m_type = DEQUE;
      }
      break;
    }
  }
};

But this seems like total overkill. Have you done any benchmarking or tests? Do you have any reason to believe that memory allocations will fail or that deques will be too slow?

朕就是辣么酷 2024-09-26 11:28:27

您在测试和分析表明其中一个比另一个更适合您的应用程序后进行切换。不存在“大约 N 个浮点数或 M 个整数”的通用答案。

You switch after testing and profiling indicate that one is worse than the other for your application. There is no "around N floats or M ints" universal answer.

烟花易冷人易散 2024-09-26 11:28:27

好吧,关于内存,我可以分享一些经验,可以帮助您决定连续内存块(malloc 或 std::vector)何时可能变得太大:

我使用的应用程序确实记录测量数据,大部分是 4 字节浮点型,为此它分配内部缓冲区来存储数据。这些缓冲区的大小差异很大,但典型范围可以是几十个1-10MB,少数>100MB。缓冲区总是用calloc 分配,即一大块内存。如果缓冲区分配失败,则会记录错误,并且用户可以选择重试。

缓冲区大小:假设您想以 100Hz 录制 1000 个通道 10 分钟:4byte x 1000 x 100 x 60x10 == 228 MB(大约)...或以 10Hz 录制 100 个通道 12 小时 == 41 MB

我们(几乎)分配 40MB 缓冲区(大约有 1000 万个浮点数)从来没有遇到过任何问题,而 200-300MB 缓冲区有时会失败——所有这些都发生在具有 4GB RAM 的普通 WinXP/32 位机器上。

Well, regarding memory, I can share some experience that may help you decide when contiguous memory blocks (malloc or std::vector) may become too large:

The application I work with does record measurement data, mostly 4byte float, and for this it allocates internal buffers to store the data. These buffers heavily vary in size, but the typical range may be say, several dozen of 1-10MB and a very few of >100MB. The buffers are allways allocated with calloc, i.e. one big chunk of memory. If a buffer-allocation fails, an error is logged and the user has the choice to try again.

Buffer sizes: Say you want to record 1000 channels at 100Hz for 10 Minutes: 4byte x 1000 x 100 x 60x10 == 228 MB (approx.) ... or 100 channels at 10Hz for 12 hours == 41 MB

We (nearly) never had any problems allocating 40MB buffers (and that's about 10 millon floats) and the 200-300 MB buffers fail from time to time -- all this on normal WinXP/32bit boxes with 4GB RAM.

野味少女 2024-09-26 11:28:27

鉴于您在创建后不插入,您可能应该使用普通的旧 std::vector,或者如果碎片确实成为问题,则使用类似自定义矢量的 序列 实现为向量或指向固定大小数组的指针数组。

Given that you don't insert after creation, you should probably either use plain old std::vector, or if fragmentation really does become an issue, a custom vector-like Sequence implemented as a vector or array of pointers to fixed-size arrays.

垂暮老矣 2024-09-26 11:28:26

嗯,这里有两种意见。 C++ 标准说 (23.1.1/2):

向量是默认情况下应使用的序列类型。

list适用于序列中间频繁插入和删除的情况。

deque 是大多数插入和删除发生在序列开头或结尾时选择的数据结构。

Herb Sutter 提出以下观点(本文包含他的基本原理和性能分析):

我想提出一个友好的反对观点:我建议您考虑默认使用双端队列而不是向量,特别是当包含的类型是类或结构而不是内置类型时,除非您确实需要容器的内存是连续的。

Well, here are two opinions. The C++ standard says (23.1.1/2):

vector is the type of sequence that should be used by default.

list should be used when there are frequent insertions and deletions from the middle of the sequence.

deque is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence.

Herb Sutter argues the following (the article contains his rationale and a performance analysis):

I'd like to present an amiably dissenting point of view: I recommend that you consider preferring deque by default instead of vector, especially when the contained type is a class or struct and not a builtin type, unless you really need the container's memory to be contiguous.

勿忘心安 2024-09-26 11:28:26

同样,没有大小限制,超过该限制双端队列是否优于向量。在这两种情况下,内存碎片的影响几乎是相同的,除非您已经完成了大量的分配/解除分配,并且没有足够的连续空间留给大向量。但这种情况非常罕见。请记住,内存空间是每个进程(谷歌搜索虚拟内存)。您可以通过在混乱发生之前为向量分配内存(通过 reserve 方法)来解决这个问题。

权衡取决于你想用它做什么。如果结构基本上是不可变的,并且您只想通过索引访问来访问它/覆盖它,请选择向量。

双端队列是当你需要在末尾、开头或中间进行插入时,向量无法自然处理的事情(除了在末尾插入)。

Herb Sutter 的文章总体来说质量很高,但是您会注意到,当您在 C++ 中进行“数字运算”时,您在“通用 C++”书中学到的大部分内容都必须格外小心。您在双端队列中遇到的较差的索引性能对于您的应用程序来说可能很重要。在这种情况下,不要使用双端队列。

Again, there is no size limit above which deque is or not better than vector. Memory fragmentation implications are pretty much the same in either case, except when you have already done a huge load of allocations/deallocations and there is not enough contiguous space left for a big vector. But this case is very rare. Remember that memory space is per process (google for virtual memory). And you can remedy it by allocating the memory for the vector (by the reserve method) before the cluttering takes place.

The tradeoff is in term of what you want to do with it. If the structure is basically immutable and you only want to access it / overwrite it by index access, go for vector.

Deque is when you need to do insertions either at the end, the beginning or in the middle, something vector cannot handle naturally (except for inserting at the end).

Herb Sutter's articles are in general of great quality, but you'll notice that when you do "number crunching" in C++, most of the stuff you're taught in "general C++" books must be taken with an extra word of caution. The poor indexing performance you experience with deques is perhaps important for your application. In this case, don't use deque.

叫嚣ゝ 2024-09-26 11:28:26

如果您需要在开头插入,请使用双端队列。

否则,我总是喜欢指出这篇关于 向量与双端队列 的文章(除了 James McNellis 链接的内容之外)。 假设双端队列的实现使用基于页的分配,本文对向量的分配时间(和释放时间)与 & 进行了很好的比较。没有 Reserve() 与 deque。基本上,使用 Reserve() 使向量分配时间与双端队列非常相似。如果您能提前猜出要预订的正确价格,那么信息丰富且有用。

If you need insertions at the beginning, then go with deque.

Otherwise, I always like to point to this article on vector vs. deque (in addition to those linked by James McNellis here). Assuming an implementation of deque that uses page-based allocation, this article has good comparisons of allocation time (& deallocation time) for vector with & without reserve() vs. deque. Basically, using reserve() makes vector allocation time very similar to deque. Informative, and useful if you can guess the right value to reserve ahead of time.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文