Why are interpreted languages mostly duck typed while compiled languages have strong typing?
I just don't know that, is there any technical reason for that? Is it more difficult to implement a compiler for a language with weak typing? What is it?
Answers (9)
The premises behind the question are a bit dodgy. It is not true that interpreted languages are mostly duck-typed. It is not true that compiled languages mostly have strong typing. The type system is a property of a language. Compiled versus interpreted is a property of an implementation.
Examples:
The programming language Scheme is dynamically typed (aka duck-typed), and it has many dozens of interpreted implementations, but also some fine native-code compilers including Larceny, Gambit, and PLT Scheme (which includes both an interpreter and a JIT compiler making seamless transitions).
The programming language Haskell is statically typed; the two most famous implementations are the interpreter HUGS and the compiler GHC. There are several other honorable implementations split about evenly between compiling to native code (yhc) and interpretation (Helium).
The programming language Standard ML is statically typed, and it has had many native-code compilers, of which one of the best and most actively maintained is MLton, but one of the most useful implementations is the interpreter Moscow ML.
The programming language Objective Caml is statically typed. It comes with only one implementation (from INRIA in France) but this implementation includes both an interpreter and a native-code compiler.
The programming language Pascal is statically typed, but it became popular in the 1970s because of the excellent implementation built at UCSD, which was based on a P-code interpreter. In later years fine native-code compilers became available, such as the IBM Pascal/VS compiler for the 370 series of computers.
The programming language C is statically typed, and today almost all implementations are compiled, but in the 1980s those of us lucky enough to be using Saber C were using an interpreter.
Nevertheless there is some truth behind your question, so you deserve a more thoughtful answer. The truth is that dynamically typed languages do seem to be correlated with interpreted implementations. Why might that be?
Many new languages are defined by an implementation. It is easier to build an interpreter than to build a compiler. It is easier to check types dynamically than to check them statically. And if you are writing an interpreter, there is little performance benefit to static type-checking.
Unless you are creating or adapting a very flexible polymorphic type system, a static type system is likely to get in the programmer's way. But if you are writing an interpreter, one reason may be to create a small, lightweight implementation that stays out of the programmer's way.
In some interpreted languages, many fundamental operations are so expensive that the additional overhead of checking types at run time doesn't matter. A good example is PostScript: if you're going to run off and rasterize Bezier curves at the drop of a hat, you won't balk at checking a type tag here or there.
Incidentally, please be wary of the terms "strong" and "weak" typing, because they don't have a universally agreed technical meaning. By contrast, static typing means that programs are checked before being executed, and a program might be rejected before it starts. Dynamic typing means that the types of values are checked during execution, and a poorly typed operation might cause the program to halt or otherwise signal an error at run time. A primary reason for static typing is to rule out programs that might have such "dynamic type errors". (This is another reason people who write interpreters are often less interested in static typing; execution happens immediately after type checking, so the distinction and the nature of the guarantee aren't as obvious.)
Strong typing generally means that there are no loopholes in the type system, whereas weak typing means the type system can be subverted (invalidating any guarantees). The terms are often used incorrectly to mean static and dynamic typing.
To see the difference, think of C: the language is type-checked at compile time (static typing), but there are plenty of loopholes; you can pretty much cast a value of any type to another type of the same size; in particular, you can cast pointer types freely. Pascal was a language that was intended to be strongly typed but famously had an unforeseen loophole: a variant record with no tag.
Implementations of strongly typed languages often acquire loopholes over time, usually so that part of the run-time system can be implemented in the high-level language. For example, Objective Caml has a function called Obj.magic which has the run-time effect of simply returning its argument, but at compile time it converts a value of any type to one of any other type. My favorite example is Modula-3, whose designers called their type-casting construct LOOPHOLE.

In summary:
Static vs dynamic is the language.
Compiled vs interpreted is the implementation.
In principle the two choices can be and are made orthogonally, but for sound technical reasons dynamic typing frequently correlates with interpretation.
The reason that you do early binding (strong typing) is performance. With early binding, you find the location of the method at compile time, so that at run time the call already knows where the method lives.
However, with late binding, you have to go searching for a method that seems like the method that the client code called. And of course, with many, many method calls in a program, that's what makes dynamic languages 'slow'.
But sure, you could create a statically compiled language that does late binding, which would negate many of the advantages of static compilation.
It's pretty much because people who write and use interpreted languages tend to prefer duck typing, and people who develop and use compiled languages prefer strong explicit typing. (I think the consensus reason for this would be somewhere in the area of 90% for error prevention, and 10% for performance.) For most programs written today, the speed difference would be insignificant. Microsoft Word has run on p-code (uncompiled) for - what - 15 years now?
The best case in point I can think of is classic Visual Basic (VB6/VBA/etc.). The same program could be written in VB and run with identical results and comparable speed, either compiled or interpreted. Furthermore, you have the option of type declarations (in fact, variable declarations) or not. Most people preferred type declarations, usually for error prevention. I've never heard or read anywhere of using type declarations for speed. And this goes back at least two orders of magnitude in hardware speed and capacity.
Google is getting a lot of attention lately because of its work on a JIT compiler for JavaScript, which requires no changes to the language and no extra consideration on the part of the programmer. In this case, the only benefit will be speed.
Because compiled languages need to take the amount of memory used into account when they are compiled.
When you see something like:
in C++, the compiler spits out code that reserves four bytes of memory and then assigns the local symbol "a" to point to that memory. If you had a typeless scripting language like JavaScript, the interpreter, behind the scenes, allocates the memory required. You can do:
There is a lot that happens between those two lines. The interpreter deletes the memory at a, allocates the new buffer for the chars, then assigns the a var to point to that new memory. In a strongly typed language, there is no interpreter that manages that for you and thus the compiler must write instructions that take into account type.
So the compiler stops that code from compiling, so the CPU doesn't blindly write 12 bytes into a four-byte buffer and cause misery.
The added overhead for a compiler writing extra instructions to take care of type would slow down the language significantly and remove the benefit of languages like C++.
:)
-nelson
EDIT in response to comment
I don't know much about Python, so I can't say much about that. But loose typing slows down runtime considerably. Each instruction that the interpreter (VM) calls has to be evaluated and, if necessary, the var coerced into the expected type. If you have:
Then the interpreter has to make sure that a is a variable and a number, then it would have to coerce b into a number before processing the instruction. Add that overhead for every instruction that the VM executes and you have a mess on your hands :)
I'm guessing that languages with dynamic (duck) typing employ lazy evaluation, which is favored by lazy programmers, and lazy programmers don't like to write compilers ;-)
There are basically two reasons to use static typing over duck typing: catching errors before the program runs, and performance.
If you have an interpreted language, then there's no compile time for static error checking to take place. There goes one advantage. Furthermore, if you already have the overhead of the interpreter, then the language is already not going to be used for anything performance critical, so the performance argument becomes irrelevant. This explains why statically typed interpreted languages are rare.
Going the other way, duck typing can be emulated to a large degree in statically typed languages, without totally giving up the benefits of static typing; this can be done via several mechanisms.
This explains why there are few dynamically typed, compiled languages.
Languages with weak typing can be compiled; for example, Perl 5 and most versions of Lisp are compiled languages. However, the performance benefits of compiling are often lost, because much of the work that the language runtime has to perform has to do with determining what type a dynamic variable really has at a particular time.
Take for example the following code in Perl:
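The Perl snippet is missing here; it was presumably something that rebinds $x to values of different types before printing, along these lines (my reconstruction):

```perl
my $x = 42;        # $x holds a number
$x = "hello";      # now it holds a string
$x = [1, 2, 3];    # now an array reference
print $x, "\n";    # the runtime must discover the type here
```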
It is obviously pretty difficult for the compiler to determine what type $x really has at a given point in time. At the time of the print statement, work needs to be done to figure that out. In a statically typed language, the type is fully known so performance at runtime can be increased.
A guess:
In a compiled language, one system (the compiler) gets to see all the code required to do strong typing. Interpreters generally only see a tiny bit of the program at a time, and so can't do that sort of cross checking.
But this isn't a hard and fast rule - it would be quite possible to make a strongly typed interpreted language, but that would go against the sort of "loose" general feel of interpreted languages.
Some languages are meant to run perfectly in non-exceptional conditions at the cost of horrible performance when exceptional conditions arise, hence very strong typing. Others were just meant to balance it with additional processing.
At times, there's way more in play than just typing. Take ActionScript, for instance: 3.0 introduced stronger typing, but then again ECMAScript lets you modify classes as you see fit at runtime, and ActionScript has support for dynamic classes. Very neat, but the fact that they state dynamic classes should not be used in "standard" builds means it's a no-no when you need to play it safe.