How to use a hash_map with case-insensitive Unicode strings as keys?

Posted 2024-08-15 14:03:13

I'm very new to STL, and pretty new to C++ in general. I'm trying to get the equivalent of a .NET Dictionary<string, value>(StringComparer.OrdinalIgnoreCase) but in C++. This is roughly what I'm trying:

stdext::hash_map<LPCWSTR, SomeStruct> someMap;
someMap.insert(std::pair<LPCWSTR, SomeStruct>(L"a string", someStruct));
someMap.find(L"a string");
someMap.find(L"A STRING");

The trouble is, neither find operation usually works (it returns someMap.end()). It seems to sometimes work, but most of the time it doesn't. I'm guessing that the hash function the hash_map is using is hashing the memory address of the string instead of the content of the string itself, and it's almost certainly not case insensitive.
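The pointer-key half of that guess can be illustrated with the standard std::unordered_map (C++11), which replaced the MSVC-only stdext::hash_map. This is a minimal sketch of the standard-library analogue, not an exact reproduction of stdext::hash_map's behaviour. With a raw pointer key, std::hash and std::equal_to both operate on the address, so a second buffer holding identical text counts as a different key. Switching to std::wstring fixes that, but the lookup is still case-sensitive. The buffer and function names are hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Two separate arrays holding the same text: distinct objects,
// so they have different addresses even though the contents match.
wchar_t first_buffer[]  = L"a string";
wchar_t second_buffer[] = L"a string";

// Pointer key: std::hash<const wchar_t*> hashes the address and
// std::equal_to<const wchar_t*> compares addresses, not characters.
bool found_with_pointer_key()
{
    std::unordered_map<const wchar_t*, int> byPointer;
    byPointer[first_buffer] = 1;
    return byPointer.find(second_buffer) != byPointer.end();
}

// std::wstring key: hashing and equality use the characters, so equal
// text is found wherever it is stored (still case-sensitive, though).
bool found_with_wstring_key(const wchar_t* lookup)
{
    std::unordered_map<std::wstring, int> byValue;
    byValue[first_buffer] = 1;
    return byValue.find(lookup) != byValue.end();
}
```

Here found_with_pointer_key() returns false, found_with_wstring_key(second_buffer) returns true, and found_with_wstring_key(L"A STRING") returns false.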

How can I get a dictionary-like structure that uses case-insensitive keys and can store my custom struct?

Answers (4)

橘味果▽酱 2024-08-22 14:03:13

The hash_map documentation you link to indicates that you can supply your own traits class as a third template parameter. This must satisfy the same interface as hash_compare.

Scanning the docs, I think that what you have to do is this, which basically replaces the use of StringComparer.OrdinalIgnoreCase you had in your Dictionary:

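#include <windows.h>   // LPCWSTR
#include <wchar.h>     // _wcsicmp (Microsoft CRT)
#include <wctype.h>    // towlower
#include <hash_map>    // stdext::hash_map (MSVC only, deprecated in newer versions)

struct my_hash_compare {
    // Same members as stdext::hash_compare; they must be static constants.
    static const size_t bucket_size = 4;
    static const size_t min_buckets = 8;
    // Case-insensitive hash: lower-case each character before mixing it in,
    // so "a string" and "A STRING" hash to the same value.
    size_t operator()(const LPCWSTR &Key) const {
        size_t h = 0;
        for (LPCWSTR p = Key; *p != L'\0'; ++p)
            h = h * 31 + towlower(*p);
        return h;
    }
    // Case-insensitive ordering: true if Key1 sorts before Key2.
    bool operator()(const LPCWSTR &Key1, const LPCWSTR &Key2) const {
        return _wcsicmp(Key1, Key2) < 0;
        // _wcsicmp depends on the current locale; there are warnings
        // about locale plastered all over this function's docs.
    }
};

// Usage: stdext::hash_map<LPCWSTR, SomeStruct, my_hash_compare> someMap;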

I'm worried though that the docs say that the comparison function has to be a total order, not a strict weak order as is usual for sorted containers in the C++ standard libraries. If MS really means a total order, then the hash_map might rely on it being consistent with operator==. That is, they might require that if my_hash_compare()(a,b) is false, and my_hash_compare()(b,a) is false, then a == b. Obviously that's not true for what I've written, in which case you're out of luck.
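For comparison, the standard std::unordered_map (C++11) takes a hash function and an equality predicate as separate template parameters instead of an ordering, so the total-order question does not arise there. The only requirement is that keys which compare equal must also hash equally. Here is a minimal sketch, assuming per-character towlower folding is good enough. That folding is locale-dependent and does not handle multi-character mappings such as German ß. ci_hash, ci_equal and the SomeStruct stand-in are hypothetical names.

```cpp
#include <cstddef>
#include <cwctype>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical value type standing in for the asker's SomeStruct.
struct SomeStruct { int value; };

// Case-insensitive hash: fold each character with towlower, then hash
// the folded copy, so keys differing only in case hash the same.
struct ci_hash {
    std::size_t operator()(const std::wstring& key) const {
        std::wstring folded(key);
        for (wchar_t& c : folded)
            c = static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
        return std::hash<std::wstring>()(folded);
    }
};

// Case-insensitive equality using the same folding as ci_hash,
// which keeps the two functors consistent with each other.
struct ci_equal {
    bool operator()(const std::wstring& a, const std::wstring& b) const {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i)
            if (std::towlower(static_cast<std::wint_t>(a[i])) !=
                std::towlower(static_cast<std::wint_t>(b[i])))
                return false;
        return true;
    }
};

using ci_map = std::unordered_map<std::wstring, SomeStruct, ci_hash, ci_equal>;
```

With this, a ci_map that holds L"a string" returns the same element for find(L"a string") and find(L"A STRING").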

As an alternative, which in any case is probably more efficient, you could push all the keys to a common case before using them in the map. A case-insensitive comparison is more costly than a regular string comparison. There's some Unicode gotcha to do with that which I can never quite remember, though. Maybe you have to convert to lowercase and then to uppercase, instead of just to uppercase, or something like that, in order to avoid some nasty cases in certain languages or with titlecase characters. Anyone?
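Here is a minimal sketch of the normalize-first approach. It assumes a simple one-to-one towupper mapping is acceptable. That is where the Unicode caveat bites: per-character mapping cannot express cases like ß → SS, and the results depend on the current C locale. The map itself stays an ordinary case-sensitive std::unordered_map, and every insert and lookup must go through the same helper. normalize_key, put_ci and get_ci are hypothetical names.

```cpp
#include <cwctype>
#include <string>
#include <unordered_map>

// Hypothetical helper: map every character to upper case with towupper.
// Simple per-character mapping only; not full Unicode case folding.
std::wstring normalize_key(const std::wstring& key)
{
    std::wstring out(key);
    for (wchar_t& c : out)
        c = static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(c)));
    return out;
}

struct SomeStruct { int value; };

// An ordinary case-sensitive map; case-insensitivity comes only from
// normalizing the key before every insert and lookup.
using normalized_map = std::unordered_map<std::wstring, SomeStruct>;

void put_ci(normalized_map& m, const std::wstring& key, SomeStruct v)
{
    m[normalize_key(key)] = v;
}

// Returns nullptr when no key matches, ignoring case.
const SomeStruct* get_ci(const normalized_map& m, const std::wstring& key)
{
    auto it = m.find(normalize_key(key));
    return it == m.end() ? nullptr : &it->second;
}
```

The key stored in the map is the normalized form (L"A STRING"). If you need the original spelling back, store it in the value as well.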

Also as other people said, you might not really want LPCWSTR as your key. This will store pointers in the map, which means that anyone who inserts a string has to ensure that the data it points to remains valid as long as it's in the hash_map. It's often better in the long run for hash_map to keep a copy of the key string passed to insert, in which case you should use wstring as the key.

梦情居士 2024-08-22 14:03:13

There was some great information given here. I gathered bits and pieces from the answers and put this one together:

#include <cstdio>
#include <map>
#include <string>
#include <wchar.h>   // _wcsicmp (Microsoft CRT)

typedef std::pair<std::wstring, int> MyPair;

struct key_comparer
{
    // Take keys by const reference to avoid copying them on every comparison
    bool operator()(const std::wstring& a, const std::wstring& b) const
    {
        return _wcsicmp(a.c_str(), b.c_str()) < 0;
    }
};

int main()
{
    std::map<std::wstring, int, key_comparer> mymap;
    mymap.insert(MyPair(L"GHI",3));
    mymap.insert(MyPair(L"DEF",2));
    mymap.insert(MyPair(L"ABC",1));

    std::map<std::wstring, int, key_comparer>::iterator iter;
    iter = mymap.find(L"def");
    if (iter == mymap.end()) {
        printf("No match.\n");
    } else {
        printf("match: %i\n", iter->second);
    }
    return 0;
}
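_wcsicmp is a Microsoft CRT function, so the comparer above only builds with MSVC. Below is a portable sketch of the same std::map comparer. It assumes per-character towlower folding is acceptable (locale-dependent, not full Unicode case folding). It compares characters lazily with std::lexicographical_compare, so no lower-cased copies are allocated. ci_less and ci_wmap are hypothetical names.

```cpp
#include <algorithm>
#include <cwctype>
#include <map>
#include <string>

// Strict weak ordering on wide strings that ignores case: equivalent to
// comparing the towlower-folded strings, without building them.
struct ci_less
{
    static bool char_less(wchar_t a, wchar_t b)
    {
        return std::towlower(static_cast<std::wint_t>(a)) <
               std::towlower(static_cast<std::wint_t>(b));
    }

    bool operator()(const std::wstring& a, const std::wstring& b) const
    {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(), char_less);
    }
};

using ci_wmap = std::map<std::wstring, int, ci_less>;
```

Because keys that differ only in case are equivalent under ci_less, inserting L"abc" into a map that already holds L"ABC" does not add a new element.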
安静被遗忘 2024-08-22 14:03:13

If you use a std::map instead of the non-standard hash_map, you can set the comparison function to be used when doing the binary search:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>

// Function object for case-insensitive comparison (ASCII letters only)
struct case_insensitive_compare
{
    // Function objects overload operator().
    // When used as a comparer, it should behave like operator<(a, b).
    bool operator()(const std::string& a, const std::string& b) const
    {
        return to_lower(a) < to_lower(b);
    }

    static std::string to_lower(const std::string& a)
    {
        std::string s(a);
        // char_to_lower must be static so it can be passed to for_each
        std::for_each(s.begin(), s.end(), char_to_lower);
        return s;
    }

    static void char_to_lower(char& c)
    {
        if (c >= 'A' && c <= 'Z')
            c += ('a' - 'A');
    }
};

int main()
{
    std::map<std::string, std::string, case_insensitive_compare> someMap;
    someMap["foo"] = "Hello, world!";
    std::cout << someMap["FOO"] << std::endl; // Hello, world!
}
掌心的温暖 2024-08-22 14:03:13

LPCWSTR is a pointer to a null-terminated array of Unicode characters and is probably not what you want in this case. Use std::wstring, the wchar_t specialization of std::basic_string, instead.

For case-insensitivity, you would need to convert the keys to all upper case or all lower case before you insert and search. At least I don't think you can do it any other way.
