unordered_set<const char*> much slower than unordered_set<string>

Posted 2024-11-18 05:30:54

I'm loading a very long list from disk into an unordered_set. If I use a set of strings, it is very fast. A test list of about 7 MB loads in about 1 second. However, using a set of char pointers takes about 2.1 minutes!

Here is the code for the string version:

unordered_set<string> Set;
string key;
while (getline(fin, key))
{
    Set.insert(key);
}

Here is the code for the char* version:

struct unordered_eqstr
{
    bool operator()(const char* s1, const char* s2) const
    {
        return strcmp(s1, s2) == 0;
    }
};

struct unordered_deref
{
    template <typename T>
    size_t operator()(const T* p) const
    {
        return hash<T>()(*p);
    }
};

unordered_set<const char*, unordered_deref, unordered_eqstr> Set;
string key;

while (getline(fin, key))
{
    char* str = new(mem) char[key.size()+1];
    strcpy(str, key.c_str());
    Set.insert(str);
}

The "new(mem)" is there because I'm using a custom memory manager, so I can allocate big blocks of memory and hand them out to tiny objects like C strings. However, I've tested this with regular "new" and the results are identical. I've also used my memory manager in other tools with no problems.
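The post does not show the memory manager itself. For readers unfamiliar with the `new(mem)` syntax, here is a minimal sketch of what such a manager might look like, under the assumption that it is a simple bump allocator. `Pool`, `block_count`, and `copy_into` are hypothetical names, not the poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical bump allocator: carves small requests out of large blocks and
// releases everything at once when the Pool is destroyed. It does no alignment
// handling, which is acceptable for char arrays but not for arbitrary types.
class Pool
{
public:
    explicit Pool(std::size_t block_size = 1 << 20) : block_size_(block_size) {}

    void* allocate(std::size_t n)
    {
        assert(n > 0);
        if (blocks_.empty() || n > capacity_ - used_) {
            // Start a new block; oversized requests get a block of their own size.
            capacity_ = n > block_size_ ? n : block_size_;
            blocks_.push_back(std::make_unique<char[]>(capacity_));
            used_ = 0;
        }
        char* p = blocks_.back().get() + used_;
        used_ += n;
        return p;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::size_t capacity_ = 0;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<char[]>> blocks_;
};

// Placement form so that `new(mem) char[n]` draws its storage from the pool.
void* operator new[](std::size_t n, Pool& pool) { return pool.allocate(n); }
// Matching placement delete; the compiler calls it only if construction throws.
void operator delete[](void*, Pool&) noexcept {}

// Copies key into pool-owned storage, mirroring the loop body in the question.
inline const char* copy_into(Pool& mem, const std::string& key)
{
    char* str = new (mem) char[key.size() + 1];
    std::memcpy(str, key.c_str(), key.size() + 1);
    return str;
}
```

Pointers returned by `copy_into` stay valid for as long as the `Pool` lives, so the pool has to outlive any set that stores them.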

The two structs are necessary so that insert and find hash and compare the actual C string rather than its address. I actually found unordered_deref here on Stack Overflow.
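It may help to check what this hasher actually computes for a `const char*`. With `T` deduced as `char`, `hash<T>()(*p)` receives only the first character of the string, so every key that starts with the same letter gets the same hash value. The sketch below reproduces the hasher from the post; `demo_first_char_only` and `distinct_hashes` are illustrative helpers that are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// The hasher from the question, with std:: qualification added.
struct unordered_deref
{
    template <typename T>
    std::size_t operator()(const T* p) const
    {
        return std::hash<T>()(*p);  // T = char here, so only p[0] is hashed
    }
};

// Two different words that share a first letter hash identically.
inline void demo_first_char_only()
{
    unordered_deref h;
    assert(h("apple") == h("avocado"));
}

// Counts how many different hash values a hasher produces for the given keys.
template <typename Hasher>
std::size_t distinct_hashes(const std::vector<std::string>& keys)
{
    std::set<std::size_t> seen;
    Hasher h;
    for (const std::string& k : keys)
        seen.insert(h(k.c_str()));
    return seen.size();
}
```

With at most one hash value per possible first byte, a large key list collapses into a few very long collision chains, and each insert has to `strcmp` its way through one of them. That would be consistent with the slowdown described, although the post itself does not include a profile.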

Eventually I need to load multi-gigabyte files. This is why I'm using a custom memory manager, but it's also why this horrible slowdown is unacceptable. Any ideas?


一笔一画续写前缘 2024-11-25 05:30:54

Here we go.

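// Hash the whole C string instead of only its first character.
struct unordered_deref
{
    size_t operator()(const char* p) const
    {
        // Builds a temporary std::string from p and hashes its full contents.
        return hash<string>()(p);
    }
};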
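The hasher above constructs a temporary `std::string` on every call, which can mean a heap allocation per hash for longer keys. One possible variation, assuming a C++17 compiler is available (the post does not say which standard is in use), is to hash through `std::string_view`, which only measures the string and copies nothing. `cstr_hash`, `cstr_equal`, and `CStrSet` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string_view>
#include <unordered_set>

// Hashes the full contents of a null-terminated string without copying it.
struct cstr_hash
{
    std::size_t operator()(const char* p) const
    {
        assert(p != nullptr);
        return std::hash<std::string_view>()(std::string_view(p));
    }
};

// Same comparison as unordered_eqstr in the question.
struct cstr_equal
{
    bool operator()(const char* a, const char* b) const
    {
        return std::strcmp(a, b) == 0;
    }
};

// A set of C strings keyed by content rather than by address.
using CStrSet = std::unordered_set<const char*, cstr_hash, cstr_equal>;
```

As with the original code, the set stores only pointers, so the character data must stay alive and unmodified for as long as it is in the set.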


Author: 听不够的曲调
