What is the default encoding of C strings?

Posted on 2024-09-28 11:18:31 · 256 words · 4 views · 0 comments

I know that C strings are char arrays terminated by a '\0'. But how are the chars encoded?

Update: I found this cool link which talks about many other programming languages and their encoding conventions: Link

Comments (5)

冷情 2024-10-05 11:18:31

All the standard says on the matter is that you get at least the 52 upper- and lower-case Latin alphabet characters, the digits 0 to 9, the symbols ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~, and the space character, as well as control characters representing horizontal tab, vertical tab, form feed, alert, backspace, carriage return, and new line.

The only thing it says about numeric encoding is that all of the above fits in one byte, and that the value of each digit after zero is one greater than the value of the previous one.
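The practical consequence of that digit rule is that converting between a digit character and its value with `'0'` is portable, while doing the same arithmetic on letters is not. A minimal sketch follows; the function names `digit_value`, `digit_char`, and `letter_index` are hypothetical helpers chosen for illustration.

```c
#include <assert.h>

// '0'..'9' are guaranteed contiguous and increasing, so this
// subtraction works in ASCII, EBCDIC, UTF-8, or any conforming charset.
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}

// The inverse conversion relies on the same guarantee.
char digit_char(int d)
{
    assert(d >= 0 && d <= 9);
    return (char)('0' + d);
}

// Letters have no such guarantee (in EBCDIC, 'i' + 1 != 'j'), so look the
// character up in a string literal instead. Returns 0..25 for a lowercase
// Latin letter and -1 for anything else.
int letter_index(char c)
{
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 0; alphabet[i] != '\0'; ++i)
        if (alphabet[i] == c)
            return i;
    return -1;
}
```

Writing `c - 'a'` would happen to work on ASCII systems, but the lookup keeps the code independent of the execution character set.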

The actual encoding is probably inherited from your locale settings, and it is most likely something ASCII-compatible.
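On POSIX systems you can ask which codeset the current locale advertises. This sketch assumes a POSIX platform, since `nl_langinfo` and `<langinfo.h>` are POSIX, not ISO C (MSVC on Windows, for example, does not provide them). Note that this reports what the runtime locale expects for multibyte text; it does not change how the compiler encoded your string literals. The name `locale_codeset` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L  // assumption: a POSIX system
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

// Adopt the user's character-classification locale from the environment
// (LANG / LC_ALL / LC_CTYPE), then ask which codeset it uses.
const char *locale_codeset(void)
{
    // If this fails, the previous locale (initially "C") stays in effect.
    setlocale(LC_CTYPE, "");
    const char *cs = nl_langinfo(CODESET);
    assert(cs != NULL);
    return cs;
}
```

Printing the result on a typical Linux desktop shows something like `UTF-8`, while the plain "C" locale reports an ASCII codeset (glibc names it `ANSI_X3.4-1968`).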

冧九 2024-10-05 11:18:31

A C string is pretty much just a sequence of bytes. That means it does not have a well-defined encoding; it could be ASCII, UTF-8, or anything else, for that matter. Because most operating systems understand ASCII by default and source code is mostly written in ASCII, the data you find in a plain char * will very often be ASCII as well.
Nonetheless, there is no guarantee that what you get out of a char * won't be UTF-8 or even KOI8.
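Because a `char *` is only bytes, functions like `strlen` count bytes, not characters, and any notion of "character" comes from an encoding you assume. The sketch below spells "café" with explicit UTF-8 bytes (so it does not depend on the source file's encoding) and counts code points under the assumption that the bytes are UTF-8; `cafe_utf8` and `utf8_codepoints` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// "café" in UTF-8: é is U+00E9, encoded as the two bytes 0xC3 0xA9.
static const char cafe_utf8[] = "caf\xC3\xA9";

// Counts code points ASSUMING the bytes are UTF-8: every byte except a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
// Nothing in the char * itself says it is UTF-8; that is the caller's promise.
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}
```

Here `strlen(cafe_utf8)` is 5 while `utf8_codepoints(cafe_utf8)` is 4. The same bytes read as Latin-1 would instead display as five characters, which is exactly why the encoding has to be agreed on out of band.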

岁月染过的梦 2024-10-05 11:18:31

The standard does not specify this. Typically it is ASCII.
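If a program really needs to know whether it is running with an ASCII-compatible execution character set, it can compare a few character constants against their ASCII codes. This is a heuristic sketch that checks a sample rather than every character; the name `execution_charset_looks_ascii` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Character constants take whatever values the execution character set
// assigns, so on an EBCDIC system 'A' would be 0xC1 rather than 0x41.
bool execution_charset_looks_ascii(void)
{
    static const struct { char c; int ascii; } sample[] = {
        { ' ', 0x20 }, { '0', 0x30 }, { 'A', 0x41 }, { 'Z', 0x5A },
        { 'a', 0x61 }, { 'z', 0x7A }, { '{', 0x7B }, { '\n', 0x0A },
    };
    for (size_t i = 0; i < sizeof sample / sizeof sample[0]; ++i)
        if ((unsigned char)sample[i].c != sample[i].ascii)
            return false;
    return true;
}
```

The check is done at run time rather than with `#if 'A' == 0x41`, because the standard allows character constants in preprocessor conditions to have different values from those in ordinary code.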

傲性难收 2024-10-05 11:18:31

As others have already indicated, C places some restrictions on what is permitted for the source and execution character encodings, but it is relatively permissive. In particular, the encoding is not necessarily ASCII, although in most cases nowadays it is at least an extension of it.

Your implementation is meant to perform any necessary translation between the source and execution character sets.
So generally you should not worry about the encoding; on the contrary, try to write code that is independent of it. This is why there are escape sequences for special characters such as '\n' and '\t', and universal character names such as \u0386. So usually you shouldn't have to look up the encoding of the execution character set yourself.
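A short sketch of that idea: the escape sequence and the universal character name below describe *which* character is meant, and the compiler picks the bytes. The assumption here is GCC or Clang with their default UTF-8 execution character set, where `\u0386` (GREEK CAPITAL LETTER ALPHA WITH TONOS) becomes the two bytes 0xCE 0x86; another compiler or a different `-fexec-charset` could produce other bytes. The names `line`, `alpha_tonos`, and `print_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// '\t' and '\n' name the characters "horizontal tab" and "new line";
// their numeric values are whatever the execution character set uses.
static const char line[] = "tab:\there\n";

// A universal character name (C99) for U+0386. The compiler converts it
// to the execution character set; with a UTF-8 execution charset it
// occupies two bytes.
static const char alpha_tonos[] = "\u0386";

// Prints each byte of s in hex, to show how the compiler encoded it.
void print_bytes(const char *s)
{
    for (; *s != '\0'; ++s)
        printf("%02X ", (unsigned char)*s);
    putchar('\n');
}
```

Calling `print_bytes(alpha_tonos)` is a quick way to find out what your particular compiler did; the source code itself never has to mention 0xCE or 0x86.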

沦落红尘 2024-10-05 11:18:31

They are not really "encoded" as such; they are simply stored as-is. The string "hello" represents an array with the char values 'h', 'e', 'l', 'l', 'o' and '\0', in that order. The C standard has a basic character set that includes these characters, but doesn't specify an encoding into bytes. It could be EBCDIC, for all you know.
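The equivalence described above can be checked directly: the literal and the hand-spelled array contain the same six char values, whatever numeric codes the execution character set assigns to those letters. The names `literal`, `spelled`, and `same_string` are hypothetical.

```c
#include <assert.h>
#include <string.h>

// The literal "hello" is an array of six chars: five letters plus '\0'.
static const char literal[] = "hello";

// The same array spelled out character by character.
static const char spelled[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

// Returns nonzero if both arrays hold identical bytes. This holds on
// ASCII and EBCDIC systems alike, because both sides use the same
// execution character set.
int same_string(void)
{
    return sizeof literal == sizeof spelled
        && memcmp(literal, spelled, sizeof literal) == 0;
}
```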
