Some questions about character sets and mapping (translation phase 1)
The questions below are about Character sets (C11, 5.2.1 Character sets) and mapping (C11, 5.1.1.2 Translation phases, 1).
The list:
Can the source character set, as an extension, include control characters other than those representing horizontal tab, vertical tab, and form feed? If so, must a diagnostic be produced when such control characters appear in, e.g., a string literal?
Example: GCC, LLVM, and MSVC accept many control characters in a string literal without issuing a diagnostic, and they keep those characters in the string literal after the mapping in translation phase 1 is done (meaning that GCC, LLVM, and MSVC support these control characters in the source character set). Is it OK that no diagnostic is produced?
Demo:
# GCC
# test \x00
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x00' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
gcc t999.c -c -std=c11 -pedantic -Wall -Wextra -S ;\
grep -A1 's:' t999.s
t999.c:1:12: warning: null character(s) preserved in literal
1 | char x[] = "x x"; int s = sizeof x;
| ^
s:
.long 4
# here we see that a diagnostic is produced, sizeof x is 4
# test \x01
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x01' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
gcc t999.c -c -std=c11 -pedantic -Wall -Wextra -S ;\
grep -A1 's:' t999.s
s:
.long 4
# here we see that no diagnostic is produced, sizeof x is 4
# MSVC
# test \x00
# see below
# test \x01
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x01' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
cl t999.c /c /std:c11 /FA /nologo ;\
grep -P '^s' t999.asm
s DD 04H
# here we see that no diagnostic is produced, sizeof x is 4
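To make the first question concrete, here is a minimal sketch in C of the kind of check such a diagnostic would rest on: it scans raw source bytes for control characters other than the ones C11 5.2.1 lists. The function names are hypothetical, the code assumes an ASCII-compatible encoding, and treating '\r' as part of an end-of-line indicator is an assumption. It does not reflect how GCC, LLVM, or MSVC are actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: ASCII-compatible encoding (horizontal tab has value 9). */
static_assert('\t' == 9, "sketch assumes an ASCII-compatible character set");

/* Returns 1 if c is a control character that C11 5.2.1 does not list
   (horizontal tab, vertical tab, form feed), 0 otherwise.
   Assumption: '\n' and '\r' only occur as end-of-line indicators. */
int is_unlisted_control(unsigned char c)
{
    if (c == '\t' || c == '\v' || c == '\f' || c == '\n' || c == '\r')
        return 0;
    return c < 0x20 || c == 0x7F;
}

/* Returns the offset of the first unlisted control byte in buf[0..len),
   or len if there is none. */
size_t find_unlisted_control(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (is_unlisted_control(buf[i]))
            return i;
    return len;
}
```

For the test file built in the demo, `find_unlisted_control` would report offset 13, which is the byte that `dd` overwrote with `seek=13`.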
- C11, 5.1.1.2 Translation phases, 1:
Physical source file multibyte characters are mapped, in an implementation-defined
manner, to the source character set (introducing new-line characters for
end-of-line indicators) if necessary.
A simple question: is "mapping to nothing" still a mapping? E.g. X => <nothing>. Or perhaps it is not a "mapping" but "skipping" (or "removal")? Example: in "x<null>y" (in binary: 22 78 00 79 22) MSVC skips/removes the null character without producing a diagnostic (making sizeof produce 3 instead of 4). Is it OK?
Demo:
# MSVC
# test \x00
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x00' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
cl t999.c /c /std:c11 /FA /nologo ;\
grep -P '^s' t999.asm
s DD 03H
# here we see that no diagnostic is produced, sizeof x is 3
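The arithmetic behind the 4 versus 3 results can be sketched with a toy model in C. The `nul_policy` enum and `map_phase1` function are hypothetical names made up for illustration; they only model "keep the null character" versus "map it to nothing" and are not taken from any compiler. The `static_assert` shows that writing the null character as an escape sequence keeps the size at 4, because no raw NUL byte reaches phase 1.

```c
#include <assert.h>
#include <stddef.h>

/* With an escape sequence the source file contains no raw NUL byte:
   'x', '\0', 'y', plus the terminating null character. */
static_assert(sizeof "x\0y" == 4, "three characters plus the terminator");

/* Hypothetical policies for a raw NUL byte in the physical source file. */
enum nul_policy { NUL_KEEP, NUL_DROP };

/* Toy model of a phase-1 mapping: copies in[0..n) to out, applying the
   chosen policy to NUL bytes. out must have room for n bytes.
   Returns the number of bytes written. */
size_t map_phase1(const unsigned char *in, size_t n,
                  unsigned char *out, enum nul_policy policy)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0 && policy == NUL_DROP)
            continue;               /* "mapping to nothing" */
        out[written++] = in[i];
    }
    return written;
}
```

Applied to the three bytes 78 00 79 between the quotes, `NUL_KEEP` yields 3 characters (so `sizeof` would be 3 + 1 = 4, the GCC result), while `NUL_DROP` yields 2 characters (so `sizeof` would be 2 + 1 = 3, the MSVC result).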