Dynamic buffer type in C++?

Posted on 2024-08-13 20:21:08

I'm not exactly a C++ newbie, but I have had little serious dealings with it in the past, so my knowledge of its facilities is rather sketchy.

I'm writing a quick proof-of-concept program in C++ and I need a dynamically sizeable buffer of binary data. That is, I'm going to receive data from a network socket and I don't know how much there will be (although not more than a few MB). I could write such a buffer myself, but why bother if the standard library probably has something already? I'm using VS2008, so some Microsoft-specific extension is just fine by me. I only need four operations:

  • Create the buffer
  • Write data to the buffer (binary junk, not zero-terminated)
  • Get the written data as a char array (together with its length)
  • Free the buffer

What is the name of the class/function set/whatever that I need?

Added: Several votes go to std::vector. All nice and fine, but I don't want to push several MB of data byte-by-byte. The socket will give me data in chunks of a few KB, so I'd like to write each chunk all at once. Also, at the end I will need to get the data as a simple char*, because I will need to pass the whole blob along to some Win32 API functions unmodified.

Comments (10)

娇妻 2024-08-20 20:21:09

You want a std::vector:

```cpp
std::vector<char> myData;
```

The vector allocates and deallocates its memory automatically. Use push_back to add new data (the vector resizes itself if required) and the indexing operator [] to retrieve data.

If at any point you can guess how much memory you'll need, I suggest calling reserve so that subsequent push_back calls won't have to reallocate as often.
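A minimal sketch of the reserve-then-push_back pattern described above. The function name build_buffer and its size-hint parameter are illustrative only; the point is that reserve() is a hint that avoids repeated reallocation, while push_back still grows the vector if the guess was too small.

```cpp
#include <cstddef>
#include <vector>

// Builds a buffer one byte at a time with push_back. reserve() is called first
// with an estimated size so the vector does not reallocate repeatedly; the
// estimate is only a hint, and push_back still grows the vector if needed.
std::vector<char> build_buffer(const char* src, std::size_t len,
                               std::size_t expectedSize) {
    std::vector<char> data;
    data.reserve(expectedSize);      // one allocation if the guess is large enough
    for (std::size_t i = 0; i < len; ++i)
        data.push_back(src[i]);      // appends at the end; '\0' bytes are kept as-is
    return data;
}
```

For multi-KB chunks, appending byte by byte like this is exactly what the asker wants to avoid; the chunked approaches in this answer are better suited to that.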

If you want to read in a chunk of memory and append it to your buffer, the easiest way would probably be something like:

```cpp
std::vector<char> myData;
for (;;) {
    const std::size_t BufferSize = 1024;
    char rawBuffer[BufferSize];

    // get_network_data is a placeholder for the real read call, e.g. recv();
    // it returns the number of bytes read, 0 at end of stream, < 0 on error.
    const int bytesRead = get_network_data(rawBuffer, sizeof(rawBuffer));
    if (bytesRead <= 0) {
        break;
    }

    myData.insert(myData.end(), rawBuffer, rawBuffer + bytesRead);
}
```

myData now holds all the data that was read, chunk by chunk. However, every byte is copied twice: first into rawBuffer, then into the vector.

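Instead, we can try reading straight into the vector:

```cpp
std::vector<char> myData;
for (;;) {
    const std::size_t BufferSize = 1024;

    // Grow the vector so the next read can go straight into its storage.
    const std::size_t oldSize = myData.size();
    myData.resize(oldSize + BufferSize);

    const int bytesRead = get_network_data(&myData[oldSize], BufferSize);
    if (bytesRead <= 0) {
        myData.resize(oldSize);              // drop the unused space and stop
        break;
    }

    myData.resize(oldSize + bytesRead);      // keep only what was actually read
}
```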

This reads directly into the vector's storage, at the cost of occasionally over-allocating.

This can be made smarter by, for example, doubling the vector's size on each resize to amortize the reallocations, as the first solution does implicitly. And of course, if you have a priori knowledge of the probable size of the final buffer, you can reserve() a much larger buffer up front to minimize resizes.

Both are left as an exercise for the reader. :)
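One possible take on the doubling exercise, as a sketch rather than the answer author's code. ChunkReader is a hypothetical stand-in for get_network_data()/recv(); read_all reads straight into the vector and doubles its size whenever the free space runs out.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical chunk reader standing in for recv() or get_network_data():
// writes up to `len` bytes into `dst` and returns the number written,
// or a value <= 0 when there is no more data.
typedef std::function<int(char*, std::size_t)> ChunkReader;

// Reads everything from `read` directly into the vector's storage. When the
// free space runs out, the vector's size is doubled, so the number of
// reallocations grows only logarithmically with the total amount of data.
std::vector<char> read_all(const ChunkReader& read,
                           std::size_t initialSize = 4096) {
    std::vector<char> data(initialSize > 0 ? initialSize : 1);
    std::size_t used = 0;                        // bytes filled so far
    for (;;) {
        if (used == data.size())
            data.resize(data.size() * 2);        // double the buffer
        const int n = read(&data[used], data.size() - used);
        if (n <= 0)
            break;                               // end of data or error
        used += static_cast<std::size_t>(n);
    }
    data.resize(used);                           // trim the unused tail
    return data;
}
```

If you know the probable final size, passing it as initialSize covers the reserve() variant too. Note that resize() zero-fills the newly added bytes, which is the price of reading straight into a std::vector.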

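Finally, if you need to treat your data as a raw array:

```cpp
some_c_function(myData.data(), myData.size());
```

std::vector is guaranteed to be contiguous. Note that data() is a C++11 addition; with an older standard library such as VS2008's you may need &myData[0] instead, which is only valid when the vector is not empty.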
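A small sketch of handing the whole vector to a C-style (pointer, length) API. count_zero_bytes is a hypothetical stand-in for a Win32 function; raw_ptr shows the pre-C++11 &buf[0] idiom with a guard for the empty case.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-style API standing in for a Win32 function that takes a
// (pointer, length) pair and only reads the bytes: here it counts zero bytes.
unsigned long count_zero_bytes(const char* p, std::size_t n) {
    unsigned long zeros = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] == '\0')
            ++zeros;
    return zeros;
}

// Pre-C++11 libraries may lack vector::data(), so &buf[0] is the classic way
// to get a char*. Indexing an empty vector is undefined behaviour, so this
// helper returns a null pointer in that case instead.
char* raw_ptr(std::vector<char>& buf) {
    return buf.empty() ? 0 : &buf[0];
}

// Passes the whole blob to the C-style function in one call.
unsigned long count_zeros_in(std::vector<char>& buf) {
    return count_zero_bytes(raw_ptr(buf), buf.size());
}
```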

清眉祭 2024-08-20 20:21:09

std::string would work for this:

  • It supports embedded nulls.
  • You can append multi-byte chunks of data to it by calling append() on it with a pointer and a length.
  • You can get its contents as a char array by calling data() on it, and the current length by calling size() or length() on it.
  • Freeing the buffer is handled automatically by the destructor, but you can also call clear() on it to erase its contents without destroying it.
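A minimal sketch of the four operations with std::string, following the points above. append_chunk and contents are illustrative helper names, not part of the standard library.

```cpp
#include <cstddef>
#include <string>

// Appends one received chunk of arbitrary bytes to the buffer. The
// (pointer, length) overload of append() copies exactly `len` bytes, so
// embedded '\0' characters do not cut the chunk short.
void append_chunk(std::string& buf, const char* chunk, std::size_t len) {
    buf.append(chunk, len);
}

// Read-only view of the contents for a C-style (pointer, length) API.
// Before C++17, data() returns const char*; for a writable char* use
// &buf[0] on a non-empty string (contiguous storage is guaranteed since C++11).
const char* contents(const std::string& buf, std::size_t& length) {
    length = buf.size();
    return buf.data();
}
```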
帥小哥 2024-08-20 20:21:09
```cpp
std::vector<unsigned char> buffer;
```

Every push_back adds a new char at the end (reallocating if needed). If you roughly know how much data to expect, you can call reserve to minimize the number of allocations:

```cpp
buffer.reserve(1000000);
```

If you already have the data in a plain array, you can construct a vector from it directly:

```cpp
unsigned char buffer[1000];
std::vector<unsigned char> vec(buffer, buffer + 1000);
```
寄意 2024-08-20 20:21:09

One more vote for std::vector. This is minimal code, and it skips the extra copy that GMan's code does:

```cpp
std::vector<char> buffer;
static const size_t MaxBytesPerRecv = 1024;
int bytesRead;
do
{
    const size_t oldSize = buffer.size();

    buffer.resize(oldSize + MaxBytesPerRecv);
    // pseudo: like Winsock's recv(), receive() gets a buffer and the maximum
    // number of bytes to write to it, and returns the count (<= 0 on end/error)
    bytesRead = receive(&buffer[oldSize], MaxBytesPerRecv);

    // shrink the vector; this is practically a no-op: it only modifies the
    // internal size, no data is moved or freed
    buffer.resize(oldSize + (bytesRead > 0 ? bytesRead : 0));
} while (bytesRead > 0);
```

As for calling WinAPI functions: pass &buffer[0] for the char* arguments (yes, it's a little clumsy, but that's the way it is; the buffer must not be empty) and buffer.size() as the length.

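A final note: you can use std::string instead of std::vector, and there shouldn't be much difference, except that you can write buffer.data() instead of &buffer[0] if your buffer is a string. Be aware, though, that before C++17 std::string::data() returns a const char*, so it only suits read-only arguments.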

盛夏尉蓝 2024-08-20 20:21:09

I'd take a look at Boost's basic_streambuf (from Boost.Asio), which is designed for this kind of purpose. If you can't (or don't want to) use Boost, I'd consider std::basic_streambuf, which is quite similar but a little more work to use. Either way, you basically derive from that base class and override underflow() to read data from the socket into the buffer. You'll normally attach a std::istream to the buffer, so other code can read from it in much the same way as it would read user input from the keyboard (or whatever).
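A minimal sketch of the std::streambuf route, assuming a hypothetical ChunkReader callback in place of a real socket read. It overrides underflow(), the virtual that refills the get area when the stream runs dry, and is meant to be wrapped in a std::istream. It is not Boost.Asio's basic_streambuf.

```cpp
#include <cstddef>
#include <functional>
#include <istream>     // std::istream, for reading through the buffer
#include <streambuf>
#include <string>      // std::string / std::getline, for reading through it
#include <vector>

// Hypothetical chunk source standing in for a socket read such as recv():
// fills `dst` with up to `len` bytes, returns the count, or <= 0 at the end.
typedef std::function<int(char*, std::size_t)> ChunkReader;

// Minimal input streambuf: when the stream has consumed all buffered
// characters it calls underflow(), which refills the get area from the source.
class ChunkStreamBuf : public std::streambuf {
public:
    explicit ChunkStreamBuf(ChunkReader reader, std::size_t chunkSize = 4096)
        : reader_(reader), chunk_(chunkSize > 0 ? chunkSize : 1) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())                       // data still buffered
            return traits_type::to_int_type(*gptr());
        const int n = reader_(&chunk_[0], chunk_.size());
        if (n <= 0)
            return traits_type::eof();              // source exhausted
        char* base = &chunk_[0];
        setg(base, base, base + n);                 // expose the new chunk
        return traits_type::to_int_type(*gptr());
    }

private:
    ChunkReader reader_;
    std::vector<char> chunk_;
};
```

Usage is `ChunkStreamBuf sb(reader); std::istream in(&sb);`, after which the usual stream operations (std::getline, read(), operator>>) pull data from the source on demand.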

流星番茄 2024-08-20 20:21:09

An alternative that is not from the STL but might be of use: Boost.Circular Buffer.

惜醉颜 2024-08-20 20:21:09

Use std::vector, a growing array that guarantees the storage is contiguous (your third point).

秋风の叶未落 2024-08-20 20:21:09

Regarding your comment "I don't see an append()": inserting at the end is the same thing.

vec.insert(vec.end,
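A small sketch of "append as insert at end()", assuming chunks arrive as (pointer, length) pairs; append_bytes is an illustrative name. Inserting a pointer range copies the whole chunk in one call, and because pointers are random-access iterators the vector knows the length up front, so implementations typically reallocate at most once per call.

```cpp
#include <cstddef>
#include <vector>

// Appends `len` raw bytes to the end of the vector in a single call,
// instead of pushing them one at a time.
void append_bytes(std::vector<char>& vec, const char* chunk, std::size_t len) {
    vec.insert(vec.end(), chunk, chunk + len);
}
```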

撕心裂肺的伤痛 2024-08-20 20:21:09

If you do use std::vector, you're just using it to manage the raw memory for you.
You could just malloc the biggest buffer you think you'll need, and keep track of the write offset/total bytes read so far (they're the same thing).
If you get to the end ... either realloc or choose a way to fail.

I know, it isn't very C++y, but this is a simple problem and the other proposals seem like heavyweight ways to introduce an unnecessary copy.
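A minimal sketch of the malloc/realloc approach, assuming a hypothetical RawBuffer struct and helper functions. The write offset and the number of bytes read are the same field (size); when the block is full it is doubled with realloc, and returning false is one way to "choose a way to fail".

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// One malloc'd block plus a write offset.
struct RawBuffer {
    char* data;
    std::size_t size;       // bytes written so far (= write offset)
    std::size_t capacity;   // bytes allocated
};

// Allocates the initial block: ideally the biggest size you expect to need.
bool raw_init(RawBuffer* b, std::size_t initialCapacity) {
    b->size = 0;
    b->capacity = initialCapacity > 0 ? initialCapacity : 1;
    b->data = static_cast<char*>(std::malloc(b->capacity));
    if (!b->data)
        b->capacity = 0;
    return b->data != 0;
}

// Copies a chunk to the write offset, growing the block with realloc if needed.
bool raw_append(RawBuffer* b, const char* src, std::size_t len) {
    if (len == 0)
        return true;
    if (len > b->capacity - b->size) {              // not enough room left
        std::size_t newCap = b->capacity > 0 ? b->capacity : 1;
        while (newCap - b->size < len)
            newCap *= 2;                            // overflow checks omitted in this sketch
        char* p = static_cast<char*>(std::realloc(b->data, newCap));
        if (!p)
            return false;                           // the old block is still valid
        b->data = p;
        b->capacity = newCap;
    }
    std::memcpy(b->data + b->size, src, len);
    b->size += len;
    return true;
}

void raw_free(RawBuffer* b) {
    std::free(b->data);
    b->data = 0;
    b->size = b->capacity = 0;
}
```

At the end, b.data and b.size are the char* and length to hand to the Win32 call.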

幻想少年梦 2024-08-20 20:21:09

The point here is what you want to use the buffer for.
If you want to keep structures with pointers into the buffer, the buffer has to stay fixed at the memory address where it was first allocated.
To circumvent this, you have to use relative pointers and a fixup list to update the pointers after the final allocation. This would be worth a class of its own. (I didn't find such a thing.)
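A minimal sketch of the relative-pointer half of this idea only (no fixup list), assuming the structures refer to data stored in the same growable buffer. Record, store_name and resolve are hypothetical names: each record keeps a byte offset instead of a raw pointer and is turned into a pointer only when needed, so it stays valid after the vector reallocates.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Refers to a '\0'-terminated name inside the buffer by offset, not address.
struct Record {
    std::size_t nameOffset;
    std::size_t nameLength;
};

// Appends a name (including its terminator) and returns an offset-based record.
Record store_name(std::vector<char>& buf, const char* name) {
    const std::size_t len = std::strlen(name);
    Record r = { buf.size(), len };
    buf.insert(buf.end(), name, name + len + 1);
    return r;
}

// Converts the offset into a pointer against the buffer's current address.
// The pointer itself must not be kept across later appends.
const char* resolve(const std::vector<char>& buf, const Record& r) {
    return &buf[r.nameOffset];
}
```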
