Some questions about character sets and mapping (translation phase 1)

Posted on 2025-01-20 12:51:06

The questions below are about Character sets (C11, 5.2.1 Character sets) and mapping (C11, 5.1.1.2 Translation phases, 1).

The list:

  1. Can the source character set, as an extension, include control characters other than those representing horizontal tab, vertical tab, and form feed? If so, must a diagnostic be produced when such a control character is used in, e.g., a string literal?

    Example: GCC/LLVM/MSVC accept many control characters in a string
    literal without issuing a diagnostic, AND they keep such control
    characters in the string literal after the mapping in
    translation phase 1 is done. (Meaning that GCC/LLVM/MSVC support
    these control characters in the source character set.) Is it OK that no diagnostic is produced?

Demo:

# GCC
# test \x00
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x00' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
gcc t999.c -c -std=c11 -pedantic -Wall -Wextra -S ;\
grep 's:' t999.S -A1
t999.c:1:12: warning: null character(s) preserved in literal
    1 | char x[] = "x x"; int s = sizeof x;
      |            ^
s:
        .long   4
# here we see that a diagnostic is produced, sizeof x is 4

# test \x01
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x01' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
gcc t999.c -c -std=c11 -pedantic -Wall -Wextra -S ;\
grep 's:' t999.S -A1
s:
        .long   4
# here we see that no diagnostic is produced, sizeof x is 4

# MSVC
# test \x00
# see below

# test \x01
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x01' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
cl t999.c /c /std:c11 /FA /nologo ;\
grep -P '^s' t999.asm
s       DD      04H
# here we see that no diagnostic is produced, sizeof x is 4
  2. C11, 5.1.1.2 Translation phases, 1:

Physical source file multibyte characters are mapped, in an implementation-defined
manner, to the source character set (introducing new-line characters for
end-of-line indicators) if necessary.

A simple question: is "mapping to nothing" still a mapping? E.g. X => <nothing>. Or is it not a "mapping" at all, but rather "skipping" (or "removal")? Example: in "x<null>y" (bytes 22 78 00 79 22) MSVC skips/removes the null character without producing a diagnostic (making sizeof produce 3 instead of 4). Is that OK?

Demo:

# MSVC
# test \x00
$ echo "char x[] = \"xxx\"; int s = sizeof x;" > t999.c ;\
printf '\x00' | dd of=t999.c bs=1 seek=13 count=1 conv=notrunc ;\
cl t999.c /c /std:c11 /FA /nologo ;\
grep -P '^s' t999.asm
s       DD      03H
# here we see that no diagnostic is produced, sizeof x is 3
