Strange way of initializing std::string with a C string

Posted on 2024-10-19 12:17:45

While I was reading the NVIDIA CUDA source code, I stumbled upon these two lines:

    std::string stdDevString;

    stdDevString = std::string(device_string);

Note that device_string is a char[1024]. The question is: Why construct an empty std::string, then construct it again with a C string as an argument? Why didn't they call std::string stdDevString = std::string(device_string); in just one line?

Is there a hidden string initialization behavior that this code tries to evade or use? Is it to ensure that the C string inside stdDevString remains null terminated no matter what? Because as far as I know, initializing a std::string from a C string that's not null terminated will still cause problems.

Comments (4)

尬尬 2024-10-26 12:17:45

Why didn't they call std::string stdDevString = std::string(device_string); in just one line?

No good reason for what they did. Given the std::string::string(const char*) constructor, you can simply use any of:

std::string stdDevString = device_string;
std::string stdDevString(device_string);
std::string stdDevString{device_string}; // C++11 { } syntax

The two-step default construction followed by assignment is just (bad) programmer style or oversight. Without optimisation it does a little unnecessary construction, but that's still pretty cheap, and it's likely removed by optimisation. Not a biggie: I doubt I'd bother to mention it in a code review unless it was in an extremely performance-sensitive area. Still, it's definitely best to defer declaring variables until a useful initial value is available to construct them with, keeping declaration and initialisation in one place. That is less error-prone and easier to cross-reference, and it minimises the scope of the variable, which simplifies reasoning about its use.

Is it to ensure that the C string inside stdDevString remains null terminated no matter what?

No, it makes no difference to that. Since C++11 the internal buffer in stdDevString is kept NUL-terminated regardless of which constructor is used. In C++03 the buffer isn't necessarily NUL-terminated (see the dedicated heading for C++03 details below), and no particular way of constructing or assigning changes that.

Because as far as I know, initializing an std::string to a C string that's not null terminated will still exhibit problems.

You're right - any of the construction options you've listed will only copy ASCIIZ text into the std::string - considering the first NUL ('\0') the terminator. If the char array isn't NUL-terminated there will be problems.

(That's a separate issue to whether the buffer inside the std::string is kept NUL terminated - discussed above).

Note that there's a separate string(const char*, size_type) constructor that can create strings with embedded NULs, and won't try to read further than told.

C++03 std::strings were not guaranteed NUL-terminated internally

Whichever way the std::string is constructed and initialised, before C++11 the Standard did not require it to be NUL-terminated within the string's buffer. std::string was best imagined as containing a bunch of potentially non-printable (loosely speaking, binary in the ftp/file I/O sense) characters starting at address data() and extending for size() characters. So, if you had:

std::string x("help");
x[4];  // undefined behaviour: only [0]..[3] are safe
x.at(4); // will throw rather than return '\0'
x.data()[4]; // undefined behaviour, equivalent to x[4] above
x.c_str()[4]; // safely returns '\0', (perhaps because a NUL was always
              // at x[4], one was just added, or a new NUL-terminated
              // buffer was just prepared - in which case data() may
              // or may not start returning it too)

Note that the std::string API requires c_str() to return a pointer to a NUL-terminated value. To do so, it can either:

  • proactively keep an extra NUL on the end of the string buffer at all times (in which case x.data()[4] would happen to be safe on that implementation, but the code could break if the implementation changed or the code was ported to another Standard library implementation etc.)
  • reactively wait until c_str() is called, then:

    • if it has enough capacity at the current address (i.e. data()), append a NUL and return the same pointer value that data() would return
    • otherwise, allocate a new, larger buffer, copy the data over, NUL terminate it, and return a pointer to it (typically but optionally this buffer would replace the old buffer which would be deleted, such that calling data() immediately afterwards would return the same pointer returned by c_str())

一杆小烟枪 2024-10-26 12:17:45

I would say that it's equivalent to writing:

std::string stdDevString = std::string(device_string);

Or, even simpler:

std::string stdDevString = device_string;

Once the std::string has been created, it contains a private copy of the data in the C string.

凡尘雨 2024-10-26 12:17:45

I think it is ignorant to dismiss this as poor coding. If we assume that this string was allocated at file scope or as a static variable, it could be good coding.

When programming C++ for embedded systems with non-volatile memory present, there are many reasons why you may wish to avoid static initialization. The main reason is that it adds a lot of overhead code at the beginning of the program, where all such variables must be initialized. If they are instances of classes, constructors will be called.

This will lead to a delay peak at the beginning of the program execution. You don't want this workload peak there, because there are much more important tasks to do when starting up the program, like setting up various hardware.

To avoid this, you typically enable a compiler option that removes such static initialization, and then write your code so that no static/global variables are initialized at their definition, but are instead set at runtime.

On such a system, the code posted by the OP is the correct way to do it.

时光沙漏 2024-10-26 12:17:45

Looks like an artefact to me. Perhaps there was some other code in between, then it got removed, and someone was too lazy to join those two remaining lines into a single one.
