What parts of C are most portable?
I recently read an interview with Lua co-creators Luiz H. de Figueiredo and Roberto Ierusalimschy, in which they discussed the design and implementation of Lua. It was very intriguing, to say the least. However, one part of the discussion raised a question for me. Roberto spoke of Lua as a "freestanding application" (that is, it's pure ANSI C that uses nothing from the OS). He said that the core of Lua is completely portable and that, because of its purity, it has been ported much more easily, and to platforms never even considered (such as robots and embedded devices).
Now this makes me wonder. C in general is a very portable language. So, which parts of C (namely those in the standard library) are the least portable, and which can be expected to work on most platforms? Should only a limited set of data types be used (e.g. avoiding `short` and maybe `float`)? What about `FILE` and the `stdio` system? `malloc` and `free`? It seems that Lua avoids all of these. Is that taking things to the extreme, or are they the root of portability issues? Beyond this, what other things can be done to make code extremely portable?
The reason I'm asking all of this is that I'm currently writing an application in pure C89, and ideally it should be as portable as possible. I'm willing to take a middle road in implementing it (portable enough, but not so much that I have to write everything from scratch). Anyway, I just wanted to see what, in general, is key to writing the best C code.
As a final note, all of this discussion is related to C89 only.
Comments (6)
In the case of Lua, we don't have much to complain about in the C language itself, but we have found that the C standard library contains many functions that seem harmless and straightforward to use, until you consider that they do not check their input for validity (which is fine, if inconvenient). The C standard says that handling bad input is undefined behavior, allowing those functions to do whatever they want, even crash the host program. Consider, for instance, `strftime`. Some libcs simply ignore invalid format specifiers, but others (e.g., on Windows) crash! Now, `strftime` is not a crucial function. Why crash instead of doing something sensible? So Lua has to do its own validation of input before calling `strftime`, and exporting `strftime` to Lua programs becomes a chore. Hence, we have tried to steer clear of these problems in the Lua core by aiming for a freestanding core. But the Lua standard libraries cannot do that, because their goal is to export facilities to Lua programs, including what is available in the C standard library.
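To illustrate the kind of validation described above, here is a minimal sketch in C89. It is not Lua's actual code: the names `strftime_format_is_c89` and `checked_strftime` are hypothetical, and the sketch assumes that only the conversion characters defined by C89 should be accepted.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Conversion characters that C89 defines for strftime. */
static const char C89_STRFTIME_CONVERSIONS[] = "aAbBcdHIjmMpSUwWxXyYZ%";

/* Returns 1 if every '%' in fmt is followed by a C89 conversion
   character, 0 otherwise (including a lone '%' at the end). */
int strftime_format_is_c89(const char *fmt)
{
    while (*fmt != '\0') {
        if (*fmt == '%') {
            fmt++;
            if (*fmt == '\0' ||
                strchr(C89_STRFTIME_CONVERSIONS, *fmt) == NULL)
                return 0;
        }
        fmt++;
    }
    return 1;
}

/* Hypothetical wrapper: refuses formats it cannot vouch for and
   returns 0, the value strftime also returns when the result does
   not fit in the buffer. */
size_t checked_strftime(char *buf, size_t size, const char *fmt,
                        const struct tm *tm)
{
    if (!strftime_format_is_c89(fmt))
        return 0;
    return strftime(buf, size, fmt, tm);
}
```

With this wrapper a format such as "%Y-%m-%d" reaches the library, while "%Q" or a trailing "%" is rejected before `strftime` ever sees it. A real binding would probably report an error to the caller instead of silently returning 0.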
"Freestanding" has a particular meaning in the context of C. Roughly, freestanding hosts are not required to provide any of the standard libraries, including the library functions `malloc`/`free`, `printf`, etc. Certain standard headers are still required, but they only define types and macros (for example `stddef.h`).
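As a rough illustration of getting by without `malloc` and `free`, the sketch below hands out memory from a buffer that the caller supplies, using nothing beyond `<stddef.h>`. The `Arena` type and its functions are hypothetical names invented for this example. It assumes the buffer is suitably aligned (for instance, declared as an array of `long`) and that rounding sizes up to `sizeof(long)` is alignment enough for the objects stored in it.

```c
#include <stddef.h>

/* Hypothetical fixed-size arena: the caller supplies the memory,
   so no malloc or free is needed. */
typedef struct {
    unsigned char *base;  /* start of the caller-supplied buffer */
    size_t size;          /* total bytes available */
    size_t used;          /* bytes handed out so far */
} Arena;

void arena_init(Arena *a, void *buffer, size_t size)
{
    a->base = (unsigned char *)buffer;
    a->size = size;
    a->used = 0;
}

/* Returns a block of at least n bytes, or NULL when the arena is
   exhausted. Sizes are rounded up to a multiple of sizeof(long),
   which is an assumed (not guaranteed) alignment requirement. */
void *arena_alloc(Arena *a, size_t n)
{
    size_t align = sizeof(long);
    size_t rounded;

    if (n > (size_t)-1 - (align - 1))
        return NULL;                      /* rounding would overflow */
    rounded = (n + align - 1) / align * align;
    if (rounded > a->size - a->used)
        return NULL;                      /* not enough room left */
    a->used += rounded;
    return a->base + (a->used - rounded);
}
```

Individual blocks are never freed; the whole arena is reused by calling `arena_init` again. That trade-off keeps the code independent of any hosted library facility.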
C89 allows two types of compilers: hosted and freestanding. The basic difference is that a hosted compiler provides all of the C89 library, while a freestanding compiler need only provide `<float.h>`, `<limits.h>`, `<stdarg.h>`, and `<stddef.h>`. If you limit yourself to these headers, your code will be portable to any C89 compiler.
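One thing those four headers do allow is adapting to the platform at compile time. The following sketch uses only `<limits.h>` to choose an unsigned type with at least 32 bits; the names `u32_least` and `add_wrap32` are made up for this example.

```c
#include <limits.h>

/* Pick the narrowest of unsigned int and unsigned long that holds
   at least 32 bits. C89 guarantees this for unsigned long. */
#if UINT_MAX >= 0xFFFFFFFFUL
typedef unsigned int u32_least;
#else
typedef unsigned long u32_least;
#endif

/* Adds two values and wraps the result to 32 bits, even when the
   chosen type is wider than 32 bits. */
u32_least add_wrap32(u32_least a, u32_least b)
{
    return (a + b) & 0xFFFFFFFFUL;
}
```

The explicit mask matters because the selected type is only guaranteed to be at least 32 bits wide, so natural wraparound cannot be relied on.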
This is a very broad question. I'm not going to give a definitive answer; instead I'll raise some issues.
Note that the C standard specifies certain things as "implementation-defined"; a conforming program will always compile and run on any conforming platform, but it may behave differently depending on the platform. Specifically:
- `sizeof(long)` may be four bytes on one platform, eight on another. The sizes of `short`, `int`, `long` etc. each have some minimum (often relative to each other), but otherwise there are no guarantees.
- `int a = 0xff00; int b = ((char *)&a)[0];` may assign `0` to `b` on one platform, `-1` on another.
- `\0` is always the null byte, but how the other characters show up depends on the OS and other factors.
- `putchar('\n')` may produce a line-feed character on one platform, a carriage return on the next, and a combination of each on yet another.
- `char` may or may not be able to take on negative values.
Various word sizes and endiannesses are common. Character encoding issues are likely to come up in any text-processing application. Machines with 9-bit bytes are most likely to be found in museums. This is by no means an exhaustive list.
(And please don't write C89; that's an outdated standard. C99 added some pretty useful stuff for portability, such as the fixed-width integers `int32_t` etc.)
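The byte-order issue above is usually sidestepped by never reinterpreting an integer's memory and building the bytes with shifts instead. A minimal C89 sketch follows; the function names `store_be32` and `load_be32` are hypothetical.

```c
/* Store the low 32 bits of v into out[0..3], most significant byte
   first, regardless of the host's byte order or the width of long. */
void store_be32(unsigned char *out, unsigned long v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFUL);
    out[1] = (unsigned char)((v >> 16) & 0xFFUL);
    out[2] = (unsigned char)((v >> 8) & 0xFFUL);
    out[3] = (unsigned char)(v & 0xFFUL);
}

/* Rebuild the value from four bytes written by store_be32. Each
   byte is masked so that only its low 8 bits are used, even on a
   platform where bytes are wider than 8 bits. */
unsigned long load_be32(const unsigned char *in)
{
    return ((in[0] & 0xFFUL) << 24) |
           ((in[1] & 0xFFUL) << 16) |
           ((in[2] & 0xFFUL) << 8) |
            (in[3] & 0xFFUL);
}
```

Because the code works on values rather than on memory layout, it behaves the same on little-endian and big-endian machines. Using `unsigned char` also avoids the signedness question for plain `char`.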
C was designed so that a compiler may be written to generate code for any platform and call the language it compiles "C". Such freedom acts in opposition to C being a language for writing code that can be used on any platform.
Anyone writing code for C must decide (either deliberately or by default) what sizes of `int` they will support; while it is possible to write C code which will work with any legal size of `int`, it requires considerable effort and the resulting code will often be far less readable than code which is designed for a particular integer size. For example, if one has a variable `x` of type `uint32_t`, and one wishes to multiply it by another `y`, computing the result mod 4294967296, the statement `x*=y;` will work on platforms where `int` is 32 bits or smaller, or where `int` is 65 bits or larger, but will invoke Undefined Behavior in cases where `int` is 33 to 64 bits, and the product, if the operands were regarded as whole numbers rather than members of an algebraic ring that wraps mod 4294967296, would exceed `INT_MAX`. One could make the statement work independent of the size of `int` by rewriting it as `x*=1u*y;`, but doing so makes the code less clear, and accidentally omitting the `1u*` from one of the multiplications could be disastrous.
Under the present rules, C is reasonably portable if code is only used on machines whose integer size matches expectations. On machines where the size of `int` does not match expectations, code is not likely to be portable unless it includes enough type coercions to render most of the language's typing rules irrelevant.
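The `1u*` idiom from this answer can be wrapped in a small helper so that it is written only once. The sketch below is in C99, since `uint32_t` comes from `<stdint.h>`, and the name `mul_wrap32` is hypothetical.

```c
#include <stdint.h>

/* Multiplies x by y modulo 4294967296 for any legal size of int.
   The leading 1u forces the arithmetic to be done in unsigned int
   (or a wider unsigned type), so the operands are never promoted
   to signed int and the product cannot overflow a signed type. */
uint32_t mul_wrap32(uint32_t x, uint32_t y)
{
    return (uint32_t)(1u * x * y);
}
```

Confining the idiom to one function removes the risk, mentioned above, of forgetting the `1u*` at one of several call sites.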
Anything that is a part of the C89 standard should be portable to any compiler that conforms to that standard. If you stick to pure C89, you should be able to port it fairly easily. Any portability problems would then be due to compiler bugs or places where the code invokes implementation-specific behavior.
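Where code does rely on implementation-specific behavior, one option is to state the assumption so that compilation fails on a platform that violates it. C89 has no static assertion, so this sketch uses the common negative-array-size trick. The macro name `PORTABILITY_ASSERT` and the two assumptions checked are purely illustrative.

```c
#include <limits.h>

/* Declares an array type whose size is 1 when cond is true and -1
   (a compile-time error) when cond is false. */
#define PORTABILITY_ASSERT(name, cond) \
    typedef char portability_assert_##name[(cond) ? 1 : -1]

/* Assumptions this hypothetical program makes beyond what C89
   guarantees. */
PORTABILITY_ASSERT(char_is_8_bits, CHAR_BIT == 8);
PORTABILITY_ASSERT(int_has_32_bits, INT_MAX >= 2147483647);
```

On a platform with 16-bit `int`, the second line stops the build with an error about a negative array size, which is easier to diagnose than a silent overflow at run time.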