Lisp and Erlang Atoms, Ruby and Scheme Symbols. How useful are they?
How useful is the feature of having an atom data type in a programming language?
A few programming languages have the concept of an atom or symbol to represent a constant of sorts. There are a few differences among the languages I have come across (Lisp, Ruby and Erlang), but it seems to me that the general concept is the same. I am interested in programming language design, and I was wondering what value an atom type provides in real life. Other languages such as Python, Java and C# seem to be doing quite well without it.
I have no real experience with Lisp or Ruby (I know the syntaxes, but haven't used either in a real project). I have used Erlang enough to be comfortable with the concept there.
Taken from http://learnyousomeerlang.com/starting-out-for-real#atoms
With this being said, atoms end up being a better semantic fit to describing data in your code in places other languages would be forced to use either strings, enums or defines. They're safer and friendlier to use for similar intended results.
A short example that shows how the ability to manipulate symbols leads to cleaner code: (Code is in Scheme, a dialect of Lisp).
You can write this program using character strings or integer constants. But the symbolic version has certain advantages. A symbol is guaranteed to be unique in the system. This makes comparing two symbols as fast as comparing two pointers. This is obviously faster than comparing two strings. Using integer constants allows people to write meaningless code like:
A more detailed answer to this question can probably be found in the book Common Lisp: A Gentle Introduction to Symbolic Computation.
Atoms (in Erlang or Prolog, etc.) or symbols (in Lisp or Ruby, etc.), hereafter simply called atoms, are very useful when you have a semantic value that has no natural underlying "native" representation. They fill the niche of C-style enums like this:
The difference is that atoms don't typically have to be declared and they have NO underlying representation to worry about. The atom monday in Erlang or Prolog has the value of "the atom monday" and nothing more or less.
While it is true that you could get much of the same use out of string types as you would out of atoms, there are some advantages to the latter. First, because atoms are guaranteed to be unique (behind the scenes their string representations are converted into some form of easily tested ID), it is far quicker to compare them than it is to compare equivalent strings. Second, they are indivisible. The atom monday cannot be tested to see if it ends in day, for example. It is a pure, indivisible semantic unit. In other words, you have less conceptual overloading than you would with a string representation.
You could also get much of the same benefit with C-style enumerations. The comparison speed in particular is, if anything, faster. But... it's an integer. And you can do weird things like have SATURDAY and SUNDAY translate to the same value:
This means you can't trust different "symbols" (enumerations) to be different things, which makes reasoning about code a lot more difficult. Also, sending enumerated types through a wire protocol is problematic because there's no way to distinguish between them and regular integers. Atoms do not have this problem. An atom is not an integer and will never look like one behind the scenes.
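Python's enum module makes the SATURDAY/SUNDAY problem easy to reproduce. A minimal sketch (the Day class is illustrative, not taken from the answer): giving two integer-backed members the same value silently turns the second into an alias of the first, and once a member is sent as a plain int, nothing marks it as a day rather than an ordinary number.

```python
from enum import IntEnum

# A C-style enumeration: each member is an integer underneath.
class Day(IntEnum):
    MONDAY = 1
    SATURDAY = 6
    SUNDAY = 6  # same value as SATURDAY, so Python makes SUNDAY an alias

# The two "different" names now refer to one and the same member.
print(Day.SUNDAY is Day.SATURDAY)  # True
print(Day.SUNDAY.name)             # SATURDAY

# On the wire the member is just the number 6, indistinguishable from any other 6.
wire_value = int(Day.SATURDAY)
print(wire_value == 6)             # True
print(Day.SATURDAY == 6)           # True
```

Iterating over Day yields only MONDAY and SATURDAY, because aliases are skipped.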
As a C programmer, I had trouble understanding what Ruby symbols really are. I was enlightened once I saw how symbols are implemented in the source code.
Inside the Ruby interpreter there is a global hash table that maps strings to integers. All Ruby symbols are kept there. During the source-code parse stage, the interpreter uses that hash table to convert all symbols to integers; internally, all symbols are then treated as integers. This means that a symbol occupies only one machine word (4 bytes on a 32-bit build) and all comparisons are very fast.
So basically you can treat Ruby symbols as strings which are implemented in a very clever way. They look like strings but perform almost like integers.
When a new string is created, Ruby allocates a new C structure to hold that object. For two Ruby strings, there are two pointers to two different memory locations (which may contain the same characters). A symbol, however, is immediately converted to a C integer. Therefore two occurrences of the same symbol can never be two different Ruby objects. This is a side effect of the implementation; just keep it in mind when coding.
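The global-table mechanism described above can be modelled in a few lines. This is a toy Python sketch, not Ruby's actual C implementation; the SymbolTable class and its intern and name_of methods are hypothetical names. Interning the same name twice yields the same integer, so comparing two symbols is just an integer comparison.

```python
class SymbolTable:
    """Toy model of an interpreter's global symbol table (hypothetical, not Ruby's code)."""

    def __init__(self):
        self._ids = {}    # name -> integer id
        self._names = []  # integer id -> name

    def intern(self, name: str) -> int:
        # Return the existing id for a known name; otherwise allocate the next id.
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name_of(self, sym_id: int) -> str:
        # Map an id back to its printable name.
        return self._names[sym_id]


table = SymbolTable()
a = table.intern("status")
b = table.intern("status")
c = table.intern("color")
print(a == b, a == c)    # True False
print(table.name_of(c))  # color
```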
In Lisp, symbol and atom are two different and unrelated concepts.
Usually in Lisp an ATOM is not a specific data type. It is shorthand for NOT CONS.
Also the type ATOM is the same as the type (NOT CONS).
Anything that is not a cons cell is an atom in Common Lisp.
A SYMBOL is a specific datatype.
A symbol is an object with a name and identity. A symbol can be interned in a package. A symbol can have a value, a function and a property list.
In Lisp source code, the identifiers for variables, functions, classes and so on are written as symbols. When the reader reads a Lisp s-expression, it creates new symbols if they are not known (available in the current package) or reuses an existing symbol (if it is available in the current package). If the Lisp reader reads a list like (snow snow), then it creates a list of two cons cells. The CAR of each cons cell points to the same symbol snow. There is only one symbol for it in the Lisp memory.
Also note that the plist (the property list) of a symbol can store additional meta information for a symbol. This could be the author, a source location, etc. The user can also use this feature in his/her programs.
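To make the reader behaviour concrete, here is a rough Python model of symbols interned in a package. Everything in it (Symbol, Package, read_flat_list, the plist entry) is a hypothetical illustration rather than how any Common Lisp implementation is written; the toy reader handles only a flat list of names and upcases them, as the standard Common Lisp reader does by default.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass(eq=False)  # eq=False keeps identity comparison, like EQ on Lisp symbols
class Symbol:
    name: str
    value: Any = None                    # the symbol's value cell
    function: Optional[Callable] = None  # the symbol's function cell
    plist: Dict[str, Any] = field(default_factory=dict)  # property list


class Package:
    def __init__(self, name: str):
        self.name = name
        self._symbols: Dict[str, Symbol] = {}

    def intern(self, name: str) -> Symbol:
        # Reuse the existing symbol if the name is present, otherwise create it.
        sym = self._symbols.get(name)
        if sym is None:
            sym = Symbol(name)
            self._symbols[name] = sym
        return sym


def read_flat_list(text: str, package: Package) -> List[Symbol]:
    # Toy reader: only understands a flat list of symbol names such as "(snow snow)".
    names = text.strip().strip("()").split()
    return [package.intern(n.upper()) for n in names]


cl_user = Package("CL-USER")
items = read_flat_list("(snow snow)", cl_user)
print(items[0] is items[1])  # True: both list elements refer to the same symbol
print(items[0].name)         # SNOW

# Attach metadata through the property list (the key and value are made up).
items[0].plist["author"] = "someone"
```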
In Scheme (and other members of the Lisp family), symbols are not just useful, they are essential.
An interesting property of these languages is that they are homoiconic. A Scheme program or expression can itself be represented as a valid Scheme data structure.
An example might make this clearer (using Gauche Scheme):
Here, expr is just a list, consisting of the symbol +, the symbol x, and the number 1. We can manipulate this list like any other, pass it around, etc. But we can also evaluate it, in which case it will be interpreted as code.
In order for this to work, Scheme needs to be able to distinguish between symbols and string literals. In the example above, x is a symbol. It cannot be replaced with a string literal without changing the meaning. If we take a list '(print x), where x is a symbol, and evaluate it, that means something different from '(print "x"), where "x" is a string.
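The idea of code as data can be sketched in Python under clear assumptions: a hypothetical Sym class stands in for interned Scheme symbols, and a tiny evaluate function treats a list as a function call, a symbol as a variable lookup, and anything else (numbers, strings) as a self-evaluating value. The list (+ x 1) is first plain data and is then evaluated, and swapping the symbol x for the string "x" changes the result. A hypothetical show function is used instead of print so that the result can be returned.

```python
import operator


class Sym:
    """Interned symbol: Sym('x') always returns the same object (hypothetical helper)."""
    _table = {}

    def __new__(cls, name):
        sym = cls._table.get(name)
        if sym is None:
            sym = super().__new__(cls)
            sym.name = name
            cls._table[name] = sym
        return sym

    def __repr__(self):
        return self.name


def evaluate(expr, env):
    if isinstance(expr, Sym):   # a symbol is looked up as a variable
        return env[expr]
    if isinstance(expr, list):  # a list is a call: (f arg ...)
        f = evaluate(expr[0], env)
        args = [evaluate(e, env) for e in expr[1:]]
        return f(*args)
    return expr                 # numbers and strings evaluate to themselves


env = {Sym('+'): operator.add, Sym('x'): 41, Sym('show'): lambda v: f"shown: {v}"}
expr = [Sym('+'), Sym('x'), 1]                 # plain data: the list (+ x 1)
print(expr)                                    # [+, x, 1]
print(evaluate(expr, env))                     # 42
print(evaluate([Sym('show'), Sym('x')], env))  # shown: 41
print(evaluate([Sym('show'), "x"], env))       # shown: x
```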
The ability to represent Scheme expressions using Scheme data structures is not just a gimmick, by the way; reading expressions as data structures and transforming them in some way is the basis of macros.
You're actually not right in saying Python has no analogue to atoms or symbols. It's not difficult to make objects that behave like atoms in Python. Just make, well, objects. Plain empty objects. Example:
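A minimal sketch of the trick (the names RED, GREEN, BLUE and describe are illustrative): every object() call creates a fresh, unique object, and comparing with is is an identity check, much like comparing atoms.

```python
# Each call to object() creates a new, unique object; its identity is all it has.
RED = object()
GREEN = object()
BLUE = object()


def describe(colour):
    # `is` compares identity (effectively a pointer comparison), like comparing atoms.
    if colour is RED:
        return "stop"
    if colour is GREEN:
        return "go"
    return "unknown"


print(describe(RED))   # stop
print(describe(BLUE))  # unknown
print(RED is GREEN)    # False
```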
TADA! Atoms in Python! I use this trick all the time. Actually, you can go further than that. You can give these objects a type:
Now, your colours have a type, so you can do stuff like this:
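A possible version with a type, assuming a hypothetical Colour class: isinstance can then reject values that are not colours, and because ordinary instances accept new attributes, each "atom" can carry extra data, loosely like a Lisp property list.

```python
class Colour:
    """A type for atom-like colour values (hypothetical helper)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


RED = Colour("RED")
BLUE = Colour("BLUE")

# Ordinary instances accept new attributes, loosely like entries on a Lisp plist.
RED.hex = "#ff0000"


def paint(thing):
    # The type lets us reject values that are not colours at all.
    if not isinstance(thing, Colour):
        raise TypeError(f"expected a Colour, got {thing!r}")
    return f"painting in {thing!r}"


print(paint(RED))                  # painting in RED
print(getattr(BLUE, "hex", None))  # None
```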
So, that's more or less equivalent in features to lispy symbols, what with their property lists.
In some languages, associative array literals have keys that behave like symbols.
In Python[1], a dictionary.
In Perl[2], a hash.
In JavaScript[3], an object.
In these cases, foo and bar are like symbols, i.e., unquoted immutable strings.
[1] Proof:
[2] This is not quite true:
[3] In JSON, this syntax is not allowed because keys must be quoted. I don't know how to prove they are symbols because I don't know how to read the memory of a variable.
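For the Python case, a quick check (a sketch, not the author's proof) of what the unquoted form produces, assuming the literal in question is the keyword-argument form dict(foo=1, bar=2), since a plain {...} literal needs quoted keys: the unquoted names become ordinary, immutable str keys. Whether a key is also the same object as the interned string is a CPython implementation detail, so the last line is printed without a claimed result.

```python
import sys

# Keyword-argument syntax lets the keys be written without quotes...
d = dict(foo=1, bar=2)

# ...but at runtime they are ordinary, immutable str objects.
print(d == {"foo": 1, "bar": 2})       # True
print(all(type(k) is str for k in d))  # True

# CPython interns identifier-like strings, so this is typically the same object,
# but that is an implementation detail rather than a language guarantee.
key = next(iter(d))
print(key is sys.intern("foo"))
```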
Atoms are guaranteed to be unique and integral, in contrast to, e.g., floating-point constant values, which can differ because of inaccuracy while you're encoding them, sending them over the wire, decoding them on the other side and converting them back to floating point. No matter what version of the interpreter you're using, it ensures that an atom always has the same "value" and is unique.
The Erlang VM stores all the atoms defined in all the modules in a global atom table.
There's no Boolean data type in Erlang. Instead, the atoms true and false are used to denote Boolean values. This prevents one from doing this kind of nasty thing:
In Erlang, you can save atoms to files, read them back, pass them over the wire between remote Erlang VMs, etc.
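For contrast with the Boolean point above: in Python, bool is a subclass of int, so truth values take part in arithmetic silently. This is exactly the kind of thing Erlang's atom-based true and false rule out; in Erlang an expression such as true + 1 raises a badarith error instead. A short demonstration:

```python
# bool is a subclass of int in Python, so truth values join in arithmetic silently.
print(issubclass(bool, int))     # True
print(True + True)               # 2
print(sum([True, False, True]))  # 2

# True and 1 are even the same dictionary key.
print({1: "one", True: "yes"})   # {1: 'yes'}
```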
As an example, I'll save a couple of terms into a file and then read them back. This is the Erlang source file lib_misc.erl (or the part of it that is most interesting for us now):
Now I'll compile this module and save some terms to a file:
In the file erlang.terms we get the following contents:
Now let's read it back:
You can see that the data is successfully read from the file and that the variable SomeAtom really holds the atom erlang_atom.
The lib_misc.erl contents are excerpted from "Programming Erlang: Software for a Concurrent World" by Joe Armstrong, published by The Pragmatic Bookshelf. The rest of the source code is here.
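A rough Python analogue of that round trip, under stated assumptions: members of a hypothetical Atom enum stand in for atoms, an io.StringIO buffer stands in for the erlang.terms file, and the {"atom": name} JSON encoding is invented for this sketch. Because atoms are written by name and looked up again on reading, the value read back is the identical member, not merely an equal copy.

```python
import enum
import io
import json


class Atom(enum.Enum):
    # Stand-ins for atoms; only the member name is written out.
    ERLANG_ATOM = enum.auto()
    OTHER = enum.auto()


def save_terms(terms, fh):
    # Invented encoding: atoms become {"atom": name}; other values are written as plain JSON.
    encoded = [{"atom": t.name} if isinstance(t, Atom) else t for t in terms]
    json.dump(encoded, fh)


def load_terms(fh):
    # Atoms are looked up by name again, so reading yields the very same member object.
    decoded = []
    for t in json.load(fh):
        if isinstance(t, dict) and "atom" in t:
            decoded.append(Atom[t["atom"]])
        else:
            decoded.append(t)
    return decoded


buf = io.StringIO()  # stands in for the erlang.terms file
save_terms([Atom.ERLANG_ATOM, 42, "hello"], buf)
buf.seek(0)
some_atom, number, text = load_terms(buf)
print(some_atom is Atom.ERLANG_ATOM)  # True
```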
In Ruby, symbols are often used as keys in hashes, so often that Ruby 1.9 even introduced a shorthand for constructing a hash. What you previously wrote as:
can now be written as:
Essentially, they are something between strings and integers: in source code they resemble strings, but with considerable differences. Two equal strings are in fact different instances, while the same symbol is always the same instance:
This has consequences for both performance and memory consumption. Symbols are also immutable: they are not meant to be altered once created.
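Python has a close analogue of this Ruby distinction. In this sketch, two equal strings built at runtime are separate objects, while sys.intern, which keeps a table of unique strings, returns one shared object for equal contents, so the results can be compared by identity the way symbols are. The a is b result reflects CPython behaviour rather than a language guarantee.

```python
import sys

# Build two equal strings at runtime so the compiler cannot merge them into one constant.
a = "".join(["sta", "tus"])
b = "".join(["sta", "tus"])
print(a == b)  # True: same characters
print(a is b)  # False in CPython: two separate string objects

# Interning maps equal strings to a single shared object, much like a symbol.
sa = sys.intern(a)
sb = sys.intern(b)
print(sa is sb)  # True
```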
An arguable rule of thumb would be to use symbols instead of strings for every string not meant for output.
Although it may seem irrelevant, most code-highlighting editors colour symbols differently from the rest of the code, which makes the distinction visible.
The problem I have with similar concepts in other languages (e.g., C) can be easily expressed as:
or
Which causes problems such as:
Neither of which really makes sense. Atoms solve a similar problem without the drawbacks noted above.
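A Python sketch of the same trap, assuming integer-backed enumerations (IntEnum) as the C-like case; Colour, Fruit and Shape are illustrative names. Constants from unrelated types compare equal and even mix in arithmetic, whereas members of a plain Enum, which are not integers, behave more like atoms and are equal only to themselves.

```python
from enum import Enum, IntEnum


class Colour(IntEnum):
    RED = 1
    GREEN = 2


class Fruit(IntEnum):
    APPLE = 1
    PEAR = 2


# Integer-backed constants from unrelated types compare equal and mix in arithmetic.
print(Colour.RED == Fruit.APPLE)   # True
print(Colour.GREEN - Fruit.APPLE)  # 1


# A plain Enum is closer to an atom: it never equals an integer or another type's member.
class Shape(Enum):
    CIRCLE = 1


print(Shape.CIRCLE == Colour.RED)  # False
print(Shape.CIRCLE == 1)           # False
```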
Atoms are like an open enum, with infinite possible values, and no need to declare anything up front. That is how they're typically used in practice.
For example, in Erlang, a process expects to receive one of a handful of message types, and it's most convenient to label each message with an atom. Most other languages would use an enum for the message type, meaning that whenever I want to send a new type of message, I have to go and add it to the declaration.
Also, unlike enums, sets of atom values can be combined. Suppose I want to monitor my Erlang process's status, and I have some standard status monitoring tool. I can extend my process to respond to the status message protocol as well as my other message types. With enums, how would I solve this problem?
The problem is that MSG_1 is 0, and STATUS_HEARTBEAT is also 0. When I get a message of type 0, what is it? With atoms, I don't have this problem.
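The collision can be sketched in Python, assuming IntEnum types for the two protocols (AppMsg and StatusMsg are hypothetical; MSG_1 and STATUS_HEARTBEAT come from the answer). Putting both into one dispatch table merges the two zero-valued keys, whereas distinct name-like tags, used here as a stand-in for atoms, stay distinct.

```python
from enum import IntEnum


class AppMsg(IntEnum):     # my own message types
    MSG_1 = 0
    MSG_2 = 1


class StatusMsg(IntEnum):  # types from a hypothetical standard status-monitoring tool
    STATUS_HEARTBEAT = 0


# Combining both protocols in one dispatch table: the two 0-valued keys collide.
handlers = {
    AppMsg.MSG_1: lambda: "handle MSG_1",
    StatusMsg.STATUS_HEARTBEAT: lambda: "answer heartbeat",
}
print(len(handlers))             # 1
print(handlers[AppMsg.MSG_1]())  # answer heartbeat (MSG_1's handler was overwritten)

# Atom-like tags (plain names here) come from an open set, so the protocols combine cleanly.
atom_handlers = {
    "msg_1": lambda: "handle MSG_1",
    "status_heartbeat": lambda: "answer heartbeat",
}
print(len(atom_handlers))        # 2
```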
Atoms/symbols are not just strings with constant-time comparison :).
Atoms provide fast equality testing, since they use identity. Compared to enumerated types or integers, they have better semantics (why would you represent an abstract symbolic value by a number anyway?) and they are not restricted to a fixed set of values like enums.
The compromise is that they are more expensive to create than literal strings, since the system needs to know all existing instances to maintain uniqueness; this costs time mostly for the compiler, and it costs memory in O(number of unique atoms).