How should character arrays be used as strings?

Posted on 2025-01-30 10:19:12 · 361 words · 2 views · 0 comments

I understand that strings in C are just character arrays. So I tried the following code, but it gives strange results, such as garbage output or program crashes:

#include <stdio.h>

int main (void)
{
  char str [5] = "hello";
  puts(str);
}

Why doesn't this work?

It compiles cleanly with gcc -std=c17 -pedantic-errors -Wall -Wextra.


Note: This post is meant to be used as a canonical FAQ for problems stemming from a failure to allocate room for a NUL terminator when declaring a string.

Comments (4)

淡墨 2025-02-06 10:19:12

A C string is a character array that ends with a null terminator.

All characters have a symbol table value. The null terminator is the symbol value 0 (zero). It is used to mark the end of a string. This is necessary since the size of the string isn't stored anywhere.

Therefore, every time you allocate room for a string, you must include sufficient space for the null terminator character. Your example does not do this; it only allocates room for the 5 characters of "hello". Correct code should be:

char str[6] = "hello";

Or equivalently, you can write self-documenting code for 5 characters plus 1 null terminator:

char str[5+1] = "hello";

But you can also use this and let the compiler do the counting and pick the size:

char str[] = "hello"; // Will allocate 6 bytes automatically

When allocating memory for a string dynamically at run time, you also need to allocate room for the null terminator:

char input[n] = ... ;
...
char* str = malloc(strlen(input) + 1);
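
As a minimal sketch of the allocation above, the snippet can be fleshed out into a small function. The name copy_string is hypothetical and not part of the original answer; it assumes the input is already a properly terminated string and that the caller frees the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of `input`,
   or NULL if the allocation fails. The caller must free() the result. */
char *copy_string(const char *input)
{
    assert(input != NULL);           /* input must be a terminated string  */
    size_t len = strlen(input);      /* characters, terminator not counted */
    char *str = malloc(len + 1);     /* +1 makes room for the terminator   */
    if (str == NULL) {
        return NULL;
    }
    memcpy(str, input, len + 1);     /* copies the characters and the '\0' */
    return str;
}
```

Calling copy_string("hello") requests 6 bytes: 5 for the characters and 1 for the null terminator.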

If you don't append a null terminator at the end of a string, then library functions expecting a string won't work properly and you will get "undefined behavior" bugs such as garbage output or program crashes.

The most common way to write a null terminator character in C is by using a so-called "octal escape sequence", looking like this: '\0'. This is 100% equivalent to writing 0, but the \ serves as self-documenting code to state that the zero is explicitly meant to be a null terminator. Code such as if(str[i] == '\0') will check if the specific character is the null terminator.
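
To illustrate the '\0' comparison described above, here is a small sketch of a length-counting loop. The name my_strlen is hypothetical; it only mimics what the standard strlen does and is not how any particular library implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters that come before the
   null terminator. The terminator itself is not counted. */
size_t my_strlen(const char *str)
{
    assert(str != NULL);
    size_t i = 0;
    while (str[i] != '\0') {   /* same test as str[i] != 0 */
        i++;
    }
    return i;
}
```

The loop only terminates reliably because the array is guaranteed to contain a null terminator; without one, it would read past the end of the array.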

Please note that the term null terminator has nothing to do with null pointers or the NULL macro! This can be confusing - very similar names but very different meanings. This is why the null terminator is sometimes referred to as NUL with one L, not to be confused with NULL or null pointers. See answers to this SO question for further details.

The "hello" in your code is called a string literal. This is to be regarded as a read-only string. The "" syntax means that the compiler will append a null terminator at the end of the string literal automatically. So if you print out sizeof("hello") you will get 6, not 5, because you get the size of the array including a null terminator.
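
The difference between the size and the length of a string literal can be checked with a short sketch like the following. The function names are hypothetical, and static_assert assumes a C11 or later compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Checked at compile time: 5 characters plus the null terminator. */
static_assert(sizeof("hello") == 6, "string literal includes the terminator");

/* sizeof counts every byte of the literal's array, terminator included. */
size_t literal_size(void)
{
    return sizeof("hello");
}

/* strlen stops counting just before the terminator. */
size_t literal_length(void)
{
    return strlen("hello");
}
```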


"It compiles cleanly with gcc"

Indeed, not even a warning. This is because of a subtle detail/flaw in the C language: a character array may be initialized with a string literal that contains exactly as many characters as there is room in the array, in which case the null terminator is silently discarded (C17 6.7.9/15). The language purposely behaves like this for historical reasons; see Inconsistent gcc diagnostic for string initialization for details. Also note that C++ is different here and does not allow this trick/flaw to be used.
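
If you do end up with a character array that has no terminator, such as the str[5] from the question, one way to still output it safely is to give %s an explicit precision, which caps how many bytes are read. The sketch below is only an illustration under that assumption; format_chars is a hypothetical name, not something from the original answer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats the first `n` bytes of `data` into `out`
   as a proper, terminated string. `data` itself need not be terminated. */
int format_chars(char *out, size_t out_size, const char *data, int n)
{
    assert(out != NULL && data != NULL && n >= 0);
    /* The precision (.*) limits how many bytes %s reads from data,
       so no null terminator is required in the source array. */
    return snprintf(out, out_size, "%.*s", n, data);
}
```

The same "%.*s" format works with printf when the goal is simply to print the characters.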

初见终念 2025-02-06 10:19:12

From the C Standard (7.1.1 Definitions of terms):

1 A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.

In this declaration

char str [5] = "hello";

the string literal "hello" has an internal representation like

{ 'h', 'e', 'l', 'l', 'o', '\0' }

so it has 6 characters, including the terminating zero. Its elements are used to initialize the character array str, which reserves space for only 5 characters.

The C Standard (unlike the C++ Standard) allows such an initialization of a character array, in which the terminating zero of the string literal is not used as an initializer.

However, as a result, the character array str does not contain a string.

If you want the array to contain a string, you could write

char str [6] = "hello";

or just

char str [] = "hello";

In the last case, the size of the character array is determined by the number of initializers supplied by the string literal, which is 6.

情绪操控生活 2025-02-06 10:19:12

Can all strings be considered arrays of characters? Yes. Can all character arrays be considered strings? No.

Why not? And why does it matter?

In addition to the other answers explaining that the length of a string is not stored anywhere as part of the string and the references to the standard where a string is defined, the flip-side is "How do the C library functions handle strings?"

While a character array can hold the same characters, it is simply an array of characters unless the last character is followed by the nul-terminating character. That nul-terminating character is what allows the array of characters to be considered (handled as) a string.

All functions in C that expect a string as an argument expect the sequence of characters to be nul-terminated. Why?

It has to do with the way all string functions work. Since the length isn't stored as part of an array, string functions scan forward in the array until the nul-character (e.g. '\0', equivalent to decimal 0) is found. See ASCII Table and Description. Regardless of whether you are using strcpy, strchr, strcspn, etc., all string functions rely on the nul-terminating character being present to define where the end of that string is.

A comparison of two similar functions from string.h will emphasize the importance of the nul-terminating character. Take for example:

    char *strcpy(char *dest, const char *src);

The strcpy function simply copies bytes from src to dest until the nul-terminating character is found, which tells strcpy where to stop copying characters. Now take the similar function memcpy:

    void *memcpy(void *dest, const void *src, size_t n);

The function performs a similar operation, but does not consider or require the src parameter to be a string. Since memcpy cannot simply scan forward in src copying bytes to dest until a nul-terminating character is reached, it requires an explicit number of bytes to copy as a third parameter. This third parameter provides memcpy with the same size information strcpy is able to derive simply by scanning forward until a nul-terminating character is found.
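
To make the contrast concrete, here is a minimal sketch that uses memcpy with an explicit byte count to copy from an array that has no terminator, then appends the terminator by hand so the result can be passed to string functions. The name bytes_to_string is hypothetical and only serves this illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies `n` raw bytes from `src` (which need not be
   nul-terminated) into `dest` and appends the terminator, so that `dest`
   holds a string afterwards. `dest` must have room for n + 1 bytes. */
void bytes_to_string(char *dest, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL);
    memcpy(dest, src, n);   /* length is given explicitly; no scan for '\0' */
    dest[n] = '\0';         /* from here on, strcpy, strlen etc. can use dest */
}
```

Passing src directly to strcpy would not be safe here, because strcpy has no way of knowing where the array ends.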

(This also shows what goes wrong in strcpy, or any function expecting a string, if you fail to provide it with a nul-terminated string: it has no idea where to stop and will happily race off across the rest of your memory segment, invoking Undefined Behavior, until a nul-character just happens to be found somewhere in memory or a Segmentation Fault occurs.)

That is why functions expecting a nul-terminated string must be passed a nul-terminated string and why it matters.

生生漫 2025-02-06 10:19:12

Intuitively...

Think of an array as a variable (holds things) and a string as a value (can be placed in a variable).

They are certainly not the same thing. In your case the variable is too small to hold the string, so the string gets cut off. ("quoted strings" in C have an implicit null character at the end.)

However, it's possible to store a string in an array that is much larger than the string.

Note that the usual assignment and comparison operators (= == < etc.) don't work as you might expect. But the strxyz family of functions comes pretty close, once you know what you're doing. See the C FAQ on strings and arrays.
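
As a small sketch of that point: comparing two strings with == compares addresses, not contents, so strcmp is the usual replacement, just as strcpy stands in for assignment. The name same_text is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when both strings hold the same characters.
   Writing a == b would compare the two addresses instead. */
bool same_text(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;   /* strcmp returns 0 for equal contents */
}
```

Two separate arrays that both hold "hello" compare equal with same_text even though they live at different addresses.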
