存储一系列大小未知的值的最有效方法是什么?
我有一个函数(例如,名为 next_entity
),可以生成 size_t 值。该函数充当生成器,即每次调用时都会生成一个新值,最后返回 0 作为标记。
在另一个调用 next_entity
的函数中,我需要将值存储在某处。在收到哨兵之前我不知道值的数量,因此我无法在这些值到来之前进行 malloc 或静态分配数组来包含这些值。
问题是,在哨兵到来之后,我唯一需要对这些值做的事情就是将它们保存到文件中,但不能重复,也就是说,每个值只能出现一次。
之后,不再需要这些值。
我尝试在迭代期间使用 glib.h
中的 GHashTable
将值存储为键,但 GHashTable
的问题是传递给函数 g_hash_table_insert 的键的指针必须在哈希表的生命周期内保持活动状态,因此我必须为每个新值执行 malloc(sizeof(size_t)) 。
它可以工作,但似乎效率很低,因为 malloc
很耗时。
有更好的方法吗?
如果需要的话我可以发布真实的代码,但我不认为问题出在代码上。
任何帮助将不胜感激,预先感谢您!
I have a function (say, named next_entity
), that generates size_t values. The function acts as a generator, that is, it produces a new value on each call, and, finally, returns 0 as a sentinel.
In another function, that calls next_entity
, I need to store the values somewhere. I don't know the amount of the values before I receive the sentinel, so I cannot malloc
or statically allocate an array to contain these values before they come.
The thing is, that after the sentinel comes, the only thing I need to do with the values is to save them to a file, but with no repetitions, that is, each value must occur only once.
After that, the values are no more needed.
I tried to use GHashTable
from glib.h
during the iteration, to store the values as the keys, but the problem with GHashTable
is that the pointers to the keys passed to the function g_hash_table_insert must remain alive during the lifecycle of the hash-table, so I have to do kind of malloc(sizeof(size_t))
for each new value.
It works, but it seems to be quite inefficient, because malloc
is time-consuming.
Is there any better way to do that?
I can post the real code if it is needed, but I don't think the problem is about the code.
Any help would be appreciated, thank you in advance!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
如果数据大小不是千兆字节,则可以使用动态数组,每次空间不足时,都可以通过realloc()将其大小加倍。使用此策略,只需进行 log(N) 次重新分配。
例如,在 C++ 中,许多 std::vector 实现通常就是这样做的。
If you data size is not gigabytes, you can do with a dynamically array, which you double in size with
realloc()
ing every time you run out of space. With this strategy only log(N) reallocations will have to happen.In C++, for example, a lot of
std::vector
implementations usually do just that.其他人建议重新分配。相反,请考虑使用 malloced 块的链接列表,其中每个块包含一个较大的数字数组;当你填满一个块时,分配另一个块并将前一个块链接到它......或者将它链接到前一个块,然后在输出它们之前反转列表。关键是您不需要将值存储在连续内存中,因此不需要使用 realloc。 realloc 复制前一个数组,您可以避免这种情况,对于非常大的数组来说,这会很慢,如果找不到足够大的连续块,它甚至可能会耗尽内存,而分配单个块更具弹性。缺点是需要更多的工作来管理链表。
====
编辑 GHashTable 用法:
将您的值存储到数组中并将该元素的地址传递给哈希例程...如果它尚不存在,则前进数组指针。要输出值,只需枚举哈希表中的键即可。数组的链表仅在释放它们时才需要。如果这就是程序所做的全部工作,那么您甚至不需要维护链表;您可以根据需要分配数组,当程序退出时它们都会被释放。
Others have suggested realloc. Instead of that, consider using a linked list of malloced blocks, where each block contains a largish array of numbers; when you fill up a block, allocate another one and link the previous one to it ... or link it to the previous one and then reverse the list before outputing them. The point is that you don't need to store your values in contiguous memory, so you don't need to use realloc. realloc copies the previous array, which you can avoid and which will be slow for very large arrays, and it could even run out of memory if it can't find a large enough contiguous block, whereas allocating individual blocks is more resilient. The downside is that it takes more work to manage the linked list.
====
Edit for GHashTable usage:
Store your value into the array and pass the address of that element to the hash routine ... if it's not already present, advance the array pointer. To output the values, just enumerate the keys from the hash table. The linked list of arrays is only needed for deallocating them. If this is all the program does, then you don't even need to maintain a linked list; you can just allocate arrays as you need them, and they will all get deallocated when your program exits.
哈希表中的键是 void*,并且 void* 始终至少与 size_t 一样大。
您需要做的就是,使用 g_hash_table_new(NULL,NULL) 代替 malloc(sizeof(size_t)) 来使用 g_direct_hash 作为哈希方法。然后执行以下操作:
要迭代键,请使用 GPOINTER_TO_SIZE 返回到 size_t。
您始终可以对任何整数类型执行此操作,而不是对其进行 malloc。 (适当时使用 GINT_TO_POINTER、GUIINT_TO_POINTER,而不是 GSIZE_TO_POINTER)
The keys in a hash table are a void*, and a void* is always at least as large as a size_t.
All you need to do is, instead of malloc(sizeof(size_t)), use g_hash_table_new(NULL,NULL) to use g_direct_hash as the hash method. Then do this:
To iterate over keys, use GPOINTER_TO_SIZE to get back to the size_t.
You can always do this for any integer type, instead of malloc'ing it. (use GINT_TO_POINTER, GUINT_TO_POINTER, instead of GSIZE_TO_POINTER when appropriate)
通常的方法是将所使用的空间乘以一个常数(2、1.69、1.618、1.5、...)。
我喜欢黄金比例:)
The usual way is to multiply the space used by a constant (2, 1.69, 1.618, 1.5, ...).
I like the golden ratio :)