Simplest and safest way to hold a bunch of const char* in a set?

Posted on 2024-07-08 21:51:21

I want to hold a bunch of const char pointers in a std::set container [1]. The std::set template requires a comparator functor, and the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.

I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them, but is there a cleaner, 'standard' way of doing it?

Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.

1: The pointers are to static strings, so there is no problem with their lifetimes - they won't go away.

Comments (8)

酷到爆炸 2024-07-15 21:51:21

If you don't want to wrap them in std::strings, you can define a functor class:

#include <cstring>  // strcmp
#include <set>

// Orders C strings by their contents, not by their addresses.
struct ConstCharStarComparator
{
  bool operator()(const char *s1, const char *s2) const
  {
    return std::strcmp(s1, s2) < 0;
  }
};

typedef std::set<const char *, ConstCharStarComparator> stringset_t;
stringset_t myStringSet;

忆梦 2024-07-15 21:51:21

Just go ahead and use the default ordering, which is std::less<const char*>. The Standard guarantees that std::less works even for pointers to different objects:

"For templates greater, less, greater_equal, and less_equal, the specializations for any
pointer type yield a total order, even if the built-in operators <, >, <=, >= do not."

The guarantee is there exactly for things like your set<const char*>.
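
A minimal sketch of what that guarantee allows, assuming two hypothetical static arrays (kApple and kBanana) that stand in for the asker's strings. The helper names are illustrative only. The set orders and deduplicates by address, never by contents:

```cpp
#include <cstddef>
#include <set>

// Hypothetical static strings; any objects with static storage would do.
static const char kApple[]  = "apple";
static const char kBanana[] = "banana";

// std::set<const char*> uses std::less<const char*> by default, which
// compares addresses and is guaranteed to be a strict total order.
inline std::size_t count_distinct_pointers()
{
    std::set<const char*> s;
    s.insert(kApple);
    s.insert(kBanana);
    s.insert(kApple);   // same address as the first insert, so it is ignored
    return s.size();
}

// Lookup is also by address: only the exact same pointer value is found.
inline bool contains_pointer(const std::set<const char*>& s, const char* p)
{
    return s.find(p) != s.end();
}
```

No custom comparator or integer cast is needed for this to be well defined.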

孤千羽 2024-07-15 21:51:21

The "optimized way"

If we ignore "premature optimization is the root of all evil", the standard way is to add a comparator, which is easy to write:

struct MyCharComparator
{
   bool operator()(const char * A, const char * B) const
   {
      return (strcmp(A, B) < 0) ;
   }
} ;

To use with a:

std::set<const char *, MyCharComparator>

The standard way

Use a:

std::set<std::string>

It will work even if you put a static const char * inside (because std::string, unlike const char *, is comparable by its contents).

Of course, if you need to extract the data, you'll have to go through std::string::c_str(). On the other hand, as it is a set, I guess you only want to know whether "AAA" is in the set, not extract the value "AAA" of "AAA".

Note: I did read about "Please do not suggest creating std::strings", but then, you asked the "standard" way...

The "never do it" way

I noted the following comment after my answer:

Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.

This smells of C (use of the deprecated "static" keyword, probable premature optimization used for std::string bashing, and string comparison through their addresses).

Anyway, you don't want to compare your strings through their addresses, because I guess the last thing you want is to have a set containing:

{ "AAA", "AAA", "AAA" }

Of course, if you only use the same global variables to contain the string, this is another story.

In this case, I suggest:

std::set<const char *>

Of course, it won't work if you compare strings with the same contents but different variables/addresses.

And, of course, it won't work with static const char * strings if those strings are defined in a header.

But this is another story.
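
The difference this answer warns about can be sketched as follows, assuming two hypothetical arrays (kFirst and kSecond) that hold the same characters at different addresses. ByContents is an illustrative name for the strcmp-based comparator shown above:

```cpp
#include <cstddef>
#include <cstring>
#include <set>

// Orders C strings by contents rather than by address.
struct ByContents
{
    bool operator()(const char* a, const char* b) const
    {
        return std::strcmp(a, b) < 0;
    }
};

// Two distinct objects with identical contents (hypothetical data).
static const char kFirst[]  = "AAA";
static const char kSecond[] = "AAA";

inline std::size_t size_by_address()
{
    std::set<const char*> s;              // default std::less: compares addresses
    s.insert(kFirst);
    s.insert(kSecond);
    return s.size();                      // both are kept: different addresses
}

inline std::size_t size_by_contents()
{
    std::set<const char*, ByContents> s;  // compares characters
    s.insert(kFirst);
    s.insert(kSecond);
    return s.size();                      // one is kept: equal contents
}
```

With the default comparator the set ends up holding two "AAA" entries, which is the { "AAA", "AAA", ... } situation described above.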

雨夜星沙 2024-07-15 21:51:21

Depending on how big a "bunch" is, I would be inclined to store a corresponding bunch of std::strings in the set. That way you won't have to write any extra glue code.
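
A minimal sketch of the std::string approach, with hypothetical helper names (make_names, find_c_str). Each insert copies the characters, which is the cost the question wants to avoid, but comparison by contents works without any glue code:

```cpp
#include <set>
#include <string>

inline std::set<std::string> make_names()
{
    std::set<std::string> names;
    names.insert("banana");   // the const char* is converted to a std::string copy
    names.insert("apple");
    names.insert("apple");    // equal contents, so it is ignored
    return names;
}

// Returns a pointer to the stored copy, or nullptr if the key is absent.
// The pointer stays valid only while that element remains in the set.
inline const char* find_c_str(const std::set<std::string>& names, const char* key)
{
    std::set<std::string>::const_iterator it = names.find(key);
    return it == names.end() ? nullptr : it->c_str();
}
```

Note that the pointer returned by c_str() refers to the set's own copy, not to the original static string.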

花开半夏魅人心 2024-07-15 21:51:21

Must the set contain const char*?

What immediately springs to mind is storing the strings in a std::string instead, and putting those into the std::set. This will allow comparisons without a problem, and you can always get the raw const char* with a simple function call:

const char* data = theString.c_str();

梦初启 2024-07-15 21:51:21

Either use a comparator, or use a wrapper type to be contained in the set. (Note: std::string is a wrapper, too....)

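#include <cstring>  // strcmp
#include <set>

// Wraps a C string so that operator< compares contents.
struct CWrap {
    const char* p;
    CWrap( const char* p ): p(p){}
    bool operator<(const CWrap& other) const{
        return std::strcmp( p, other.p ) < 0;
    }
};

int main() {
    const char* a("a");
    const char* b("b");

    std::set<CWrap> myset;
    myset.insert(a);   // implicit conversion from const char* to CWrap
    myset.insert(b);
}

花开柳相依 2024-07-15 21:51:21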

Others have already posted plenty of solutions showing how to do lexical comparisons with const char*, so I won't bother.

"Please do not suggest creating std::strings - it is a waste of time and space."

If std::string is a waste of time and space, then std::set might be a waste of time and space as well. Each element in a std::set is allocated separately from the free store. Depending on how your program uses sets, this may hurt performance more than std::set's O(log n) lookups help performance. You may get better results using another data structure, such as a sorted std::vector, or a statically allocated array that is sorted at compile time, depending on the intended lifetime of the set.
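
One possible shape of the sorted std::vector alternative, as a sketch only. It assumes the keys are compared by contents, and the names CStrLess, make_sorted_keys and contains are hypothetical. The vector stores the pointers contiguously, so there is no per-element node allocation:

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Orders C strings by contents.
struct CStrLess
{
    bool operator()(const char* a, const char* b) const
    {
        return std::strcmp(a, b) < 0;
    }
};

// Builds the key list once and sorts it; the literals have static storage.
inline std::vector<const char*> make_sorted_keys()
{
    std::vector<const char*> keys;
    keys.push_back("cantaloupe");
    keys.push_back("apple");
    keys.push_back("banana");
    std::sort(keys.begin(), keys.end(), CStrLess());
    return keys;
}

// O(log n) membership test on the sorted vector.
inline bool contains(const std::vector<const char*>& keys, const char* key)
{
    return std::binary_search(keys.begin(), keys.end(), key, CStrLess());
}
```

This suits a set that is filled once and then only queried; frequent insertions into the middle of a vector cost O(n) each.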

"the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers."

"The strings are static, so they can be compared for (in)equality based on their address."

That depends on what the pointers point to. If all of the keys are allocated from the same array, then using operator< to compare pointers is not undefined behavior.

Example of an array containing separate static strings:

static const char keys[] = "apple\0banana\0cantaloupe";

If you create a std::set<const char*> and fill it with pointers that point into that array, their ordering will be well-defined.
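
A sketch of filling a set from that array. The helper name pointers_into_keys is hypothetical; the loop walks the embedded NUL-terminated strings one by one:

```cpp
#include <cstring>
#include <set>

static const char keys[] = "apple\0banana\0cantaloupe";

// Collects a pointer to the start of each NUL-terminated string packed in keys.
inline std::set<const char*> pointers_into_keys()
{
    std::set<const char*> s;
    const char* end = keys + sizeof(keys);   // one past the final '\0'
    for (const char* p = keys; p < end; p += std::strlen(p) + 1)
        s.insert(p);                         // every p points into the same array
    return s;
}
```

Because all pointers refer to elements of one array, iterating the set visits the strings in the order they appear in keys.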

If, however, the strings are all separate string literals, comparing their addresses with operator< gives an unspecified result in C++ (std::less, which std::set uses by default, is still guaranteed to yield a total order). Whether or not it works depends on your compiler/linker implementation, how you use it, and your expectations.

If your compiler/linker supports string pooling and has it enabled, duplicate string literals should have the same address, but are they guaranteed to in all cases? Is it safe to rely on linker optimizations for correct functionality?

If you only use the string literals in one translation unit, the set ordering may be based on the order that the strings are first used, but if you change another translation unit to use one of the same string literals, the set ordering may change.

"I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them"

Casting the pointers to uintptr_t would seem to have no benefit over using pointer comparisons. The result is the same either way: implementation-specific.
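
For comparison, the cast-based functor from the question might look like the sketch below. ByIntegerAddress, kLeft and kRight are hypothetical names, and std::uintptr_t is an optional type that mainstream platforms provide. The pointer-to-integer mapping is implementation-defined, so the resulting order is no more portable than the one std::less already gives:

```cpp
#include <cstdint>
#include <set>

// Compares two pointers by their integer representation.
struct ByIntegerAddress
{
    bool operator()(const char* a, const char* b) const
    {
        return reinterpret_cast<std::uintptr_t>(a)
             < reinterpret_cast<std::uintptr_t>(b);
    }
};

// Hypothetical static strings at two different addresses.
static const char kLeft[]  = "left";
static const char kRight[] = "right";

typedef std::set<const char*, ByIntegerAddress> pointer_set_t;
```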

谁的年少不轻狂 2024-07-15 21:51:21

Presumably you don't want to use std::string because of performance reasons.

I'm running MSVC and gcc, and they both seem to not mind this:

bool foo = "blah" < "grar";

EDIT: However, the behaviour in this case is unspecified. See comments...

They also don't complain about std::set<const char*>.

If you're using a compiler that does complain, I would probably go ahead with your suggested functor that casts the pointers to ints.

Edit:
Hey, I got voted down... despite being one of the few people here who answered his question most directly. I'm new to Stack Overflow; is there any way to defend yourself if this happens? That being said, I'll try to do so right here:

The question is not looking for std::string solutions. Every time you enter a std::string into the set, it will need to copy the entire string (until C++0x is standard, anyway). Also, every time you do a set look-up, it will need to do multiple string compares.

Storing the pointers in the set, however, incurs NO string copy (you're just copying the pointer around) and every comparison is a simple integer comparison on the addresses, not a string compare.

The question stated that storing the pointers to the strings was fine, and I see no reason why we should all immediately assume that this statement was an error. If you know what you're doing, then there are considerable performance gains to using a const char* over either std::string or a custom comparison that calls strcmp. Yes, it's less safe and more prone to error, but these are common trade-offs for performance, and since the question never stated the application, I think we should assume that he's already considered the pros and cons and decided in favor of performance.
