Do I have to understand the machine architecture to write code?
Let's say I'm programming in Java or Python or C++ to solve a simple problem, say building a TCP/UDP echo server or computing a factorial. Do I have to bother about the architecture details, i.e., whether it is 32- or 64-bit?
IMHO, unless I'm programming something to do with fairly low-level stuff, I don't have to bother whether it's 32- or 64-bit. Where am I going wrong? Or am I correct?
Correct for most circumstances.
The runtime/language/compiler will abstract those details unless you are dealing directly with word sizes or binary formats at a low level.
Even byte order is abstracted by the NIC/network stack in the kernel; it is translated for you. When programming sockets in C, you do sometimes have to deal with byte ordering for the network when sending data... but that doesn't concern 32- or 64-bit differences.
When dealing with blobs of binary data, mapping them from one architecture to another (as an overlay onto a C struct, for example) can cause the problems others have mentioned, but this is why we develop architecture-independent protocols based on characters and so on.
In fact, things like Java run in a virtual machine that abstracts the machine another step!
Knowing a bit about the architecture's instruction set, and how the language's syntax is compiled down to it, can help you understand the platform and write cleaner, tighter code. I know I grimace at some old C code after studying compilers!
Knowing how things work, whether that is how the virtual machine behaves on your platform or how certain C++ constructs are transformed into assembly, will always make you a better programmer, because you will understand why things should be done the way they are.
You need to understand things like memory to know what cache misses are and why they might affect your program. You should know how certain things are implemented: even though you might only reach them through an interface or some high-level mechanism, knowing how they work will make sure you're using them in the best way.
For packet work, you need to understand how data is stored on each platform and how sending it across the network to a different platform might change how it is read (endianness).
Your compiler will make the best use of the platform you're compiling on, so as long as you stick to the standards and code well, you can ignore most things and assume the compiler will work out what's best.
So in short, no. You don't need to know the low-level stuff, but it never hurts to know.
The last time I looked at the Java language spec, it contained a ridiculous gotcha in the section on integer boxing.
Boxing two equal int values in the range -128 to 127 and comparing the results with == is guaranteed to print true; doing the same with values outside that range is not guaranteed to print true. It depends on the runtime; the spec leaves it completely open. It's because boxing an int between -128 and 127 returns "interned" objects (analogous to the way string literals are interned), but the implementer of the language runtime is encouraged to raise that limit if they wish. I personally regard that as an insane decision, and I hope they've fixed it since (write once, run anywhere?).
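The code that originally accompanied this answer is missing; a minimal sketch of the gotcha (the values 100 and 200 are illustrative choices, not from the original) might look like:

```java
public class BoxingGotcha {
    public static void main(String[] args) {
        Integer a = 100, b = 100;   // within -128..127: boxed to interned objects
        System.out.println(a == b); // guaranteed to print true

        Integer c = 200, d = 200;   // outside the cache: usually distinct objects
        System.out.println(c == d); // NOT guaranteed: typically false, but a
                                    // runtime may legally widen the cache

        System.out.println(c.equals(d)); // always true: compares values, not identity
    }
}
```

The practical takeaway is the last line: compare boxed integers with equals(), never with ==.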
You sometimes must bother.
You can be surprised when these low-level details suddenly jump out and bite you. For example, Java standardized double to be 64-bit. However, the Linux JVM has used the x87 "extended precision" mode, in which a double is 80 bits wide as long as it stays in a CPU register. This means that code comparing a computed value against a recomputation of the same expression may fail, simply because the value is forced out of the register into memory and truncated from 80 to 64 bits.
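The snippet this answer referred to is lost; a hypothetical reconstruction of the failure pattern, exploiting the fact that a product can overflow a 64-bit double while still being finite at 80 bits, could be:

```java
public class PrecisionDemo {
    static double square(double x) {
        return x * x; // returning forces the result through a 64-bit double
    }

    public static void main(String[] args) {
        double x = 1e308;
        double y = square(x);   // overflows to Infinity once truncated to 64 bits
        // On an old x87-based JVM without strictfp, the inline x * x below could
        // be held at 80-bit precision, where 1e616 is still finite, making this
        // comparison false. Modern JVMs evaluate strictly (strictfp semantics
        // are mandatory since Java 17) and print true.
        System.out.println(y == x * x);
    }
}
```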
In Java and Python, architecture details are abstracted away, so that it is in fact more or less impossible to write architecture-dependent code.
With C++, this is an entirely different matter: you can certainly write code that does not depend on architecture details, but you have to be careful to avoid pitfalls, specifically concerning basic data types that are architecture-dependent, such as int.
As long as you do things correctly, you almost never need to know, for most languages. In many, you never need to know, as the language behavior doesn't vary (Java, for example, specifies the runtime behavior precisely).
In C++ and C, doing things correctly includes not making assumptions about int. Don't put pointers into an int, and when you're doing anything with memory sizes or addresses, use size_t and ptrdiff_t. Don't count on the sizes of data types: int must be at least 16 bits, is almost always 32, and may be 64 on some architectures. Don't assume that floating-point arithmetic will be done in exactly the same way on different machines (the IEEE standards leave some leeway).
Pretty much all OSes that support networking will give you some way to deal with possible endianness problems. Use them. Use language facilities like isalpha() to classify characters, rather than arithmetic on character codes (the character set might be something weird like EBCDIC). (Of course, it's now more usual to use wchar_t as the character type and use Unicode internally.)
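The classify-don't-calculate advice carries over to Java as well; a small illustration (the character is an arbitrary example) of why range arithmetic on character codes is fragile:

```java
public class ClassifyDemo {
    public static void main(String[] args) {
        char c = '\u00E9'; // 'é': a letter, but outside the ASCII range 'a'..'z'
        System.out.println(Character.isLetter(c)); // true: the library knows the character set
        System.out.println(c >= 'a' && c <= 'z');  // false: ad hoc arithmetic misses it
    }
}
```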
If you are programming in Python or in Java, the interpreter and the virtual machine, respectively, abstract away this layer of the architecture. You then need not worry whether it's running on a 32- or 64-bit architecture.
The same cannot be said for C++, in which you'll sometimes have to ask yourself whether you are running on a 32- or 64-bit machine.
You will need to care about endianness only if you send and receive raw C structs over the wire. However, this is not a recommended practice. It's recommended that you define a protocol between the parties such that the parties' machine architectures don't matter.
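One way to follow that advice, sketched here in Java for a single integer field (the message layout is a made-up example), is to fix the wire byte order explicitly instead of dumping native memory:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WireOrder {
    public static void main(String[] args) {
        // Encode an int in network (big-endian) order, whatever the host is.
        ByteBuffer out = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
        out.putInt(0x01020304);
        byte[] wire = out.array();
        System.out.println(wire[0]); // most significant byte (0x01) travels first

        // The receiver decodes with the same declared order, so both sides
        // agree regardless of their machine architectures.
        int decoded = ByteBuffer.wrap(wire).order(ByteOrder.BIG_ENDIAN).getInt();
        System.out.println(decoded == 0x01020304);
    }
}
```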
In C++, you have to be very careful if you want to write code that works equally well on 32-bit and 64-bit systems.
Many people wrongly assume that int can store a pointer, for example.
With Java and .NET you don't really have to bother with it unless you are doing very low-level stuff like twiddling bits. If you are using C, C++, or Fortran you might get by, but I would actually recommend using something like stdint.h, with explicit-width declarations such as uint64_t and uint32_t, so as to be explicit. Also, you will need to build with particular libraries depending on how you are linking; for example, a 64-bit system might use gcc in a default 64-bit compile mode.
A 32-bit machine will allow you a maximum of 4 GB of addressable virtual memory. (In practice it's even less than that, usually 2 GB or 3 GB depending on the OS and various linker options.) On a 64-bit machine you can have a HUGE virtual address space (in any practical sense, limited only by disk) and pretty damn big RAM.
So if you are expecting 6 GB data sets for some computation (say something that needs non-sequential access and can't just be streamed a bit at a time), on a 64-bit architecture you could just read the data into RAM and do your stuff, whereas on a 32-bit architecture you would need a fundamentally different approach, since you simply do not have the option of keeping the entire data set resident.