How do I use a hash_map with case-insensitive Unicode strings as keys?
I'm very new to STL, and pretty new to C++ in general. I'm trying to get the equivalent of a .NET Dictionary<string, value>(StringComparer.OrdinalIgnoreCase), but in C++. This is roughly what I'm trying:
stdext::hash_map<LPCWSTR, SomeStruct> someMap;
someMap.insert(std::pair<LPCWSTR, SomeStruct>(L"a string", someStruct));
someMap.find(L"a string");
someMap.find(L"A STRING");
The trouble is, neither find operation usually works (it returns someMap.end()). It seems to sometimes work, but most of the time it doesn't. I'm guessing that the hash function the hash_map is using is hashing the memory address of the string instead of the content of the string itself, and it's almost certainly not case insensitive.
How can I get a dictionary-like structure that uses case-insensitive keys and can store my custom struct?
The hash_map documentation you link to indicates that you can supply your own traits class as a third template parameter. This must satisfy the same interface as hash_compare.
Scanning the docs, I think that what you have to do is supply a traits class whose hash and comparison both ignore case, which basically replaces the use of StringComparer.OrdinalIgnoreCase you had in your Dictionary.
I'm worried though that the docs say that the comparison function has to be a total order, not a strict weak order as is usual for sorted containers in the C++ standard libraries. If MS really means a total order, then the hash_map might rely on it being consistent with operator==. That is, they might require that if my_hash_compare()(a,b) is false, and my_hash_compare()(b,a) is false, then a == b. Obviously that's not true for a case-insensitive comparison like the one I've written (L"a" and L"A" are equivalent under it but are not ==), in which case you're out of luck.
As an alternative, which in any case is probably more efficient, you could push all the keys to a common case before using them in the map. A case-insensitive comparison is more costly than a regular string comparison. There's some Unicode gotcha to do with that which I can never quite remember, though. Maybe you have to convert -> lowercase -> uppercase, instead of just -> uppercase, or something like that, in order to avoid some nasty cases in certain languages or with titlecase characters. Anyone?
Also, as other people said, you might not really want LPCWSTR as your key. This will store pointers in the map, which means that anyone who inserts a string has to ensure that the data it points to remains valid as long as it's in the hash_map. It's often better in the long run for hash_map to keep a copy of the key string passed to insert, in which case you should use wstring as the key.
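A minimal sketch of the case-insensitive key idea, written against the standard std::unordered_map rather than stdext::hash_map. That is a swapped-in container: it takes a separate hash functor and equality functor, so the total-order question above does not arise. The names SomeStruct::value, foldChar, CaseInsensitiveHash, CaseInsensitiveEqual and CaseInsensitiveMap are hypothetical, and folding each wchar_t with std::towlower is an assumption that depends on the current C locale and does not implement full Unicode case folding.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <unordered_map>

// Placeholder for the asker's custom struct (its contents are assumed).
struct SomeStruct { int value; };

// Assumption: per-code-unit lower-casing is good enough for the keys in use.
inline wchar_t foldChar(wchar_t c) {
    return static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
}

// Hashes the folded characters, so keys that differ only by case collide on purpose.
struct CaseInsensitiveHash {
    std::size_t operator()(const std::wstring& s) const {
        std::size_t h = 0;
        for (wchar_t c : s) {
            h = h * 31 + static_cast<std::size_t>(foldChar(c));
        }
        return h;
    }
};

// Must agree with the hash: equal keys always produce equal hashes.
struct CaseInsensitiveEqual {
    bool operator()(const std::wstring& a, const std::wstring& b) const {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (foldChar(a[i]) != foldChar(b[i])) return false;
        }
        return true;
    }
};

// Keys are std::wstring copies, so the map owns its key data.
using CaseInsensitiveMap =
    std::unordered_map<std::wstring, SomeStruct,
                       CaseInsensitiveHash, CaseInsensitiveEqual>;
```

With this in place, inserting under L"a string" and calling find(L"A STRING") should reach the same entry, because both the hash and the equality test look only at the folded characters.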
There was some great information given here. I gathered bits and pieces from the answers and put this one together:
If you use a std::map instead of the non-standard hash_map, you can supply the comparison function it uses when searching for a key (the map's third template parameter).
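As a rough illustration of the std::map approach, the sketch below passes a case-insensitive "less than" functor as the map's comparison type. CaseInsensitiveLess and CaseInsensitiveOrderedMap are hypothetical names, and comparing characters through std::towlower is an assumption that is locale dependent and ignores multi-character Unicode case mappings.

```cpp
#include <algorithm>
#include <cwctype>
#include <map>
#include <string>

// Strict weak ordering that compares characters after lower-casing them.
// Assumption: std::towlower on each wchar_t is adequate for the keys in use.
struct CaseInsensitiveLess {
    bool operator()(const std::wstring& a, const std::wstring& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](wchar_t x, wchar_t y) {
                return std::towlower(static_cast<std::wint_t>(x)) <
                       std::towlower(static_cast<std::wint_t>(y));
            });
    }
};

// Value is whatever struct the map should store.
template <typename Value>
using CaseInsensitiveOrderedMap =
    std::map<std::wstring, Value, CaseInsensitiveLess>;
```

std::map treats two keys as the same entry when neither compares less than the other, so L"a string" and L"A STRING" share one slot here. Lookups are logarithmic rather than the average constant time of a hash table.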
LPCWSTR is a pointer to a null-terminated array of Unicode characters and probably not what you want in this case. Use the wstring specialization of basic_string instead.
For case-insensitivity, you would need to convert the keys to all upper case or all lower case before you insert and search. At least I don't think you can do it any other way.
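One way to sketch the normalize-before-use idea is a thin wrapper that lower-cases every key on the way in and on the way out, so callers cannot forget the conversion. toLowerKey and LowercaseKeyMap are hypothetical names, std::unordered_map stands in for hash_map, and the per-character std::towlower conversion is an assumption that is locale dependent and not a full Unicode case fold.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>
#include <unordered_map>

// Returns a lower-cased copy of the key (assumption: per-wchar_t folding suffices).
inline std::wstring toLowerKey(std::wstring key) {
    std::transform(key.begin(), key.end(), key.begin(), [](wchar_t c) {
        return static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
    });
    return key;
}

// Stores values under normalized keys; Value must be default-constructible
// and assignable because insert() goes through operator[].
template <typename Value>
class LowercaseKeyMap {
public:
    void insert(const std::wstring& key, const Value& value) {
        map_[toLowerKey(key)] = value;
    }

    // Returns a pointer to the stored value, or nullptr when the key is absent.
    const Value* find(const std::wstring& key) const {
        auto it = map_.find(toLowerKey(key));
        return it == map_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::wstring, Value> map_;
};
```

The trade-off is that the original spelling of the key is lost; if it matters, it can be kept inside the stored value.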