Why do I get a segmentation fault when writing to a "char *s" initialized with a string literal, but not "char s[]"?
The following code gets a segmentation fault on line 2:
char *str = "string";
str[0] = 'z'; // could be also written as *str = 'z'
printf("%s\n", str);
While this works perfectly well:
char str[] = "string";
str[0] = 'z';
printf("%s\n", str);
Tested with MSVC and GCC.
See the C FAQ, Question 1.32
Normally, string literals are stored in read-only memory when the program is run. This is to prevent you from accidentally changing a string constant. In your first example, "string" is stored in read-only memory and str points to its first character. The segfault happens when you try to change the first character to 'z'.
In the second example, the string "string" is copied by the compiler from its read-only home to the str[] array. Then changing the first character is permitted. You can check this by printing the address of each. Also, printing the size of str in the second example will show you that the compiler has allocated 7 bytes for it.
Most of these answers are correct, but just to add a little more clarity...
The "read only memory" that people are referring to is the text segment in ASM terms. It's the same place in memory where the instructions are loaded. This is read-only for obvious reasons like security. When you create a char* initialized to a string, the string data is compiled into the text segment and the program initializes the pointer to point into the text segment. So if you try to change it, kaboom. Segfault.
When written as an array, the compiler instead places the initialized string data in writable memory: the data segment, which is the same place that your global variables and such live, for an array with static storage, or the stack for a local array like the one in the question. This memory is mutable. This time the character array (which is a real array, not a char*, although it converts to a pointer in most expressions) holds its own copy of the characters rather than pointing into the text segment, so you can safely alter it at run-time.
C99 N1256 draft
There are two different uses of character string literals:
Initialize a char[], say an array named c. This is "more magic", and is described at 6.7.8/14 "Initialization":
"An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array."
So this is just a shortcut for a brace-enclosed list of the individual characters followed by '\0'. Like any other regular array, c can be modified.
Everywhere else: it generates an unnamed array of char with static storage duration, which gives UB if modified.
So when you initialize a char *c from a string literal, this is similar to defining a hidden static array, call it __unnamed, initialized from that literal, and then setting c to point to it.
Note the implicit conversion from char[] to char *, which is always legal.
Then if you modify c[0], you also modify __unnamed, which is UB.
This is documented at 6.4.5 "String literals":
"5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence [...]"
"6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined."
6.7.8/32 "Initialization" gives a direct example of both cases.
GCC 4.8 x86-64 ELF implementation
Take a program that initializes a char * from a string literal, compile it, and disassemble the object file.
Conclusion: GCC stores the string that the char * points to in the .rodata section, not in .text.
If we do the same for a char[], the disassembly shows the characters being written to an address relative to %rbp, so the array gets stored on the stack.
Note however that the default linker script puts .rodata and .text in the same segment, which has execute but no write permission. This can be observed in the section-to-segment mapping of the linked executable.
In the first code, "string" is a string constant, and string constants should never be modified because they are often placed into read-only memory. "str" is a pointer being used to modify the constant.
In the second code, "string" is an array initializer, sort of shorthand for a brace-enclosed list of the characters 's', 't', 'r', 'i', 'n', 'g' and a terminating '\0'. "str" is an array allocated on the stack and can be modified freely.
Because the "whatever" in the first example is effectively a const char * (even if you assign it to a non-const char*), you shouldn't try to write to it. Strictly speaking, a C string literal has type char[N] rather than a const-qualified type, but modifying it is still undefined behavior.
The compiler has enforced this by putting the string in a read-only part of memory, hence writing to it generates a segfault.
The first snippet sets str to point to the literal value "string", which is hard-coded in the program's binary image and is probably flagged as read-only in memory.
So str[0] = 'z' is attempting to write to a read-only part of the application's image. I would guess this is probably compiler dependent though.
To understand this error, you should first know the difference between a pointer and an array, so let me explain that first.
String array
In memory, an array is stored in contiguous memory cells, as [h][e][l][l][o][\0], where each [] is a memory cell one char in size. These contiguous cells are accessed through the array's name, here strarray. So the array strarray itself contains all the characters of the string it was initialized with, in this case "hello". We can therefore easily change its contents by accessing each character by its index: assigning 'm' to the character at index 0 changes the value of strarray to "mello".
One point to note here is that we can change the contents of a string array character by character, but we cannot assign another string to it directly: strarray="new string" is invalid.
Pointer
As we all know, a pointer holds the address of a memory location. An uninitialized pointer holds an indeterminate address, and after initialization it points to a particular memory location.
Here the pointer ptr is initialized to the string "hello", which is a constant string stored in read-only memory, so "hello" cannot be changed. ptr itself is stored on the stack and points to the constant string "hello".
So ptr[0]='m' is invalid, since you cannot write to read-only memory.
But ptr can be assigned another string value directly, since it is just a pointer and can point to any memory address holding an object of its data type.
char *str = "string"; allocates a pointer to a string literal, which the compiler is putting in a non-modifiable part of your executable;
char str[] = "string"; allocates and initializes a local array which is modifiable.
The C FAQ that @matli linked to mentions it, but no one else here has yet, so for clarification: if a string literal (a double-quoted string in your source) is used anywhere other than to initialize a character array (i.e., anywhere other than in @Mark's second example, which works correctly), that string is stored by the compiler in a special static string table, which is akin to creating a global static variable (read-only, of course) that is essentially anonymous (has no variable "name"). The read-only part is the important part, and is why @Mark's first code example segfaults.
The line that declares char *str defines a pointer and points it to a literal string. The literal string is not writable, so when you assign to str[0] you get a seg fault. On some platforms, the literal might be in writable memory so you won't see a segfault, but it's invalid code (resulting in undefined behavior) regardless.
The line that declares char str[] allocates an array of characters and copies the literal string into that array, which is fully writable, so the subsequent update is no problem.
String literals like "string" are probably allocated in your executable's address space as read-only data (give or take your compiler). When you go to touch it, it freaks out that you're in its bathing suit area and lets you know with a seg fault.
In your first example, you're getting a pointer to that const data. In your second example, you're initializing an array of 7 characters with a copy of the const data.
Constant memory
Since string literals are read-only by design, they are stored in the constant part of memory. Data stored there is immutable, i.e., cannot be changed. Thus, all string literals defined in C code get a read-only memory address here.
Stack memory
The stack part of memory is where local variables live, e.g., variables defined in functions.
As @matli's answer suggests, there are two ways of working with these constant strings.
1. Pointer to string literal
When we define a pointer s to a string literal, we are creating a pointer variable living in stack memory. It points to the read-only address where the underlying string literal resides.
If we try to modify s by writing to one of its characters, we get a Segmentation fault (core dumped). We are trying to access memory that we shouldn't access. We are attempting to modify the value at a read-only address, 0x7ffc8e224620.
2. Array of chars
For the sake of the example, suppose the string literal "Hello" stored in constant memory has a read-only memory address identical to the one above, 0x7ffc8e224620. Initializing an array of chars from it copies the characters into stack memory, and that copy can be modified.
In the first place, str is a pointer that points at "string". The compiler is allowed to put string literals in places in memory that you cannot write to, but can only read. (A C++ compiler, or a C compiler with a warning such as GCC's -Wwrite-strings enabled, would warn here, since you'd be assigning a const char * to a char *.)
In the second place, you're creating an array, which is memory that you've got full access to, and initializing it with "string". You're creating a char[7] (six for the letters, one for the terminating '\0'), and you can do whatever you like with it.
Assume the strings are an array a and a pointer p, each initialized from a string literal.
In the first case, the literal is copied when 'a' comes into scope. Here 'a' is an array defined on the stack. This means the string is created on the stack and its data is copied from code (text) memory, which is typically read-only (this is implementation specific; a compiler can also place this read-only program data in read-writable memory).
In the second case, p is a pointer defined on the stack (local scope) that refers to a string literal (program data or text) stored elsewhere. Modifying such memory is undefined behavior and should not be done.
Section 5.5, "Character Pointers and Functions", of K&R also discusses this topic.
The first is a constant string, which can't be modified. The second is an array with an initial value, so it can be modified.
A segmentation fault is caused when you try to access memory that is inaccessible.
char *str is a pointer to a string that is not modifiable (the reason for the segfault), whereas char str[] is an array and can be modified.
The first one means something different than you think it does. It means: store the address of the string literal "string" in the variable str.
Next, you try to change the contents at the address where your string "string" is stored.
Try A Simple Program
You can prove that this is true by altering your program a bit. Try compiling and running a version that prints the value of str as a pointer.
On my machine the resulting output is:
0x55e789c1d004
When you attempt to write to str[0] with str[0] = 'z', you're attempting to store the character 'z' at the address 0x55e789c1d004.