No symbol table in Go?

Posted on 2024-08-11 12:23:43

Google's new language "Go" says on its website:

the language has been designed to be easy to analyze and can be parsed without a symbol table

I'm certainly no expert on these matters, but I thought a symbol table was a basic construct common to all compilers for languages that use variables, and Go clearly uses variables. What am I not understanding?

Comments (5)

夜声 2024-08-18 12:23:43

Parsing means just figuring out the program structure: separating the module into statements/declarations, breaking expressions down to sub-expressions, etc. You end up with a tree structure, known as a "parse tree", or "abstract syntax tree" (AST).

Apparently, C++ requires a symbol table to do parsing.

This page discusses some reasons why C++ requires a symbol table for parsing.

Of course, parsing is only a part of compilation, and you will need a symbol table to do a full compilation.
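
A sketch of that split using Go's standard library, assuming a made-up one-line file that uses an undeclared type `T`: `go/parser` accepts it, and only the type checker in `go/types`, which does build scopes and look names up, rejects it. `parseAndCheck` is an illustrative helper, not a library function.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src refers to a type T that is never declared: fine for the parser,
// an error for the type checker.
const src = `package p

var t T
`

// parseAndCheck parses source, then type-checks it. It returns the parse
// error (if any) and the first type-checking error (if any).
func parseAndCheck(source string) (parseErr, checkErr error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "p.go", source, 0)
	if err != nil {
		return err, nil
	}
	// Name resolution and scopes live here, not in the parser.
	var conf types.Config
	_, checkErr = conf.Check("p", fset, []*ast.File{f}, nil)
	return nil, checkErr
}

func main() {
	parseErr, checkErr := parseAndCheck(src)
	fmt.Println("parse error:", parseErr)
	fmt.Println("type-check error:", checkErr)
}
```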

However, parsing itself can be useful in writing analysis tools (e.g. which module imports which modules). So, simplifying the parsing process means it's easier to write code analysis tools.

刘备忘录 2024-08-18 12:23:43

Interpretation and compilation absolutely require symbol tables or similar. This is true for nearly all languages.

In C and C++, even parsing the language requires a symbol table.

美人骨 2024-08-18 12:23:43

@Justice is right. To expand on that a little, in C the only actual tricky part is telling types apart from variables. Specifically when you see this:

T t;

You need to know that T is a type for that to be a legal parse. That's something you have to look up in a symbol table. This is relatively simple to figure out as long as types are added to the symbol table as the parse continues. You don't need to do much extra work in the compiler: either T is present in the table or it isn't.
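
A toy sketch of this C situation (often called the "lexer hack"), written in Go to match the other examples. It uses `T * t;` instead of `T t;` because there both readings are syntactically valid: a pointer declaration if `T` is a typedef name, a multiplication otherwise. `symbolTable` and `classify` are hypothetical names, the "tokenizer" just splits on whitespace, and scoping is ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// symbolTable records which identifiers have been declared as typedef names.
// A real C front end also tracks scopes; this sketch uses one flat set.
type symbolTable map[string]bool

// classify decides how to parse a C statement of the form "A * B;".
// The tokens alone are ambiguous; only the symbol table can settle it.
func classify(stmt string, typedefs symbolTable) string {
	fields := strings.Fields(strings.TrimSuffix(stmt, ";"))
	if len(fields) != 3 || fields[1] != "*" {
		return "unsupported"
	}
	if typedefs[fields[0]] {
		return "declaration: " + fields[2] + " is a pointer to " + fields[0]
	}
	return "expression: " + fields[0] + " multiplied by " + fields[2]
}

func main() {
	typedefs := symbolTable{}
	fmt.Println(classify("T * t;", typedefs))
	// After seeing "typedef int T;" the parser must record T...
	typedefs["T"] = true
	// ...because the very same tokens now parse differently.
	fmt.Println(classify("T * t;", typedefs))
}
```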

In C++ things are much, much more complicated. There are enormous numbers of ambiguous or potentially ambiguous constructs. The most obvious is this one:

B::C (c);

Aside from the fact that it's not clear if B is a class, a typedef, or a namespace, it's also not clear if C is a type and c an object of that type, or if C is a function (or constructor) taking c as an argument (or even if C is an object with operator() overloaded). You need the symbol table to carry on parsing, although it is still possible to continue quickly enough, as the type of the symbol is in the symbol table.

Things get much, much, much worse than that when templates come into the mix. If C (c) appears inside a template, you might not know, when reading the template's definition, whether C is a type or a function/object. That's because C can depend on the template parameters, and the template arguments can make it either a type or a variable. What this means is that you need the symbol table, but you don't have one, and you can't have one until the template is actually instantiated. Even worse, it's not necessarily sufficient to have just the type of the symbol: you can come up with situations which require the full information of the type the symbol represents, including size, alignment, and other machine-specific information.

All this has several practical effects. The two most significant I would say are:

  • Compilation is much faster. I assume Go is faster to compile than C, and C++ has famously slow compilation times for situations involving a lot of templates.
  • You can write parsers that don't depend on having a full compiler. This is very useful for doing code analysis and for refactoring.

岁吢 2024-08-18 12:23:43

To parse most languages you need to know when names are variables, types or functions to disambiguate certain constructs. Go has no such ambiguous constructs.

For instance:

int x = Foo(bar);

Foo could be a type or a function, and the two cases are represented by different AST node types. Basically, the Go parser never has to look up symbols to know how to construct the AST. The grammar and the AST are just simpler than those of most languages. Pretty cool really.
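
In Go's `go/ast`, a conversion such as `Foo(bar)` and a function call such as `foo(bar)` are both represented as `*ast.CallExpr`; telling them apart is left to the type checker. A small sketch to check this, assuming a made-up file in which `Foo` is a type and `foo` a function (`initKinds` is a hypothetical helper):

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// In this file Foo is a type and foo is a function, yet the two
// initializers have the same syntactic shape.
const src = `package p

type Foo int

func foo(v int) int { return v }

var bar = 1
var x = Foo(bar) // conversion
var y = foo(bar) // function call
`

// initKinds maps each package-level variable name to the AST node type of
// its initializer expression.
func initKinds(source string) (map[string]string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), "p.go", source, 0)
	if err != nil {
		return nil, err
	}
	kinds := map[string]string{}
	for _, d := range f.Decls {
		gd, ok := d.(*ast.GenDecl)
		if !ok || gd.Tok != token.VAR {
			continue
		}
		for _, spec := range gd.Specs {
			vs := spec.(*ast.ValueSpec)
			for i, name := range vs.Names {
				if i < len(vs.Values) { // skip names without an initializer
					kinds[name.Name] = fmt.Sprintf("%T", vs.Values[i])
				}
			}
		}
	}
	return kinds, nil
}

func main() {
	kinds, err := initKinds(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds["x"], kinds["y"])
}
```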

滿滿的愛 2024-08-18 12:23:43

Symbol tables are slow and generally not needed, so Go chose to do away with them. Other functional languages also need none.
Fast lookup requires a hash, but to support nested scopes you need to push/pop names onto a stack. Simple symtabs are implemented as a linearly searched stack, better ones as a hash table with a stack per symbol. But the search still has to be done at run time.
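
A minimal sketch of the "hash with a stack per symbol" design described above, in Go. `ScopedTable` and its methods are hypothetical names, and values are plain strings standing in for whatever a real implementation would attach to a name (type, storage location, and so on).

```go
package main

import "fmt"

// ScopedTable maps each name to a stack of bindings (innermost last) and
// keeps a stack of scopes recording which names each scope declared, so
// they can be popped when the scope closes.
type ScopedTable struct {
	bindings map[string][]string // name -> stack of values
	scopes   [][]string          // names declared in each open scope
}

func NewScopedTable() *ScopedTable {
	return &ScopedTable{bindings: map[string][]string{}}
}

// EnterScope opens a new, empty nested scope.
func (t *ScopedTable) EnterScope() { t.scopes = append(t.scopes, nil) }

// Declare binds name in the innermost scope, shadowing outer bindings.
// It assumes at least one scope is open.
func (t *ScopedTable) Declare(name, value string) {
	t.bindings[name] = append(t.bindings[name], value)
	top := len(t.scopes) - 1
	t.scopes[top] = append(t.scopes[top], name)
}

// Lookup is one hash lookup: the innermost binding is on top of the stack.
func (t *ScopedTable) Lookup(name string) (string, bool) {
	s := t.bindings[name]
	if len(s) == 0 {
		return "", false
	}
	return s[len(s)-1], true
}

// ExitScope pops every binding the innermost scope declared.
func (t *ScopedTable) ExitScope() {
	top := len(t.scopes) - 1
	for _, name := range t.scopes[top] {
		s := t.bindings[name]
		if len(s) == 1 {
			delete(t.bindings, name)
		} else {
			t.bindings[name] = s[:len(s)-1]
		}
	}
	t.scopes = t.scopes[:top]
}

func main() {
	t := NewScopedTable()
	t.EnterScope()
	t.Declare("x", "outer")
	t.EnterScope()
	t.Declare("x", "inner")
	v, _ := t.Lookup("x")
	fmt.Println(v) // inner
	t.ExitScope()
	v, _ = t.Lookup("x")
	fmt.Println(v) // outer
}
```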

Interpretation and compilation of lexically scoped languages require absolutely no symbol tables or similar. Only dynamically scoped symbols need symbol tables, and some compilers for strictly typed languages need some kind of internal symbol table to hold the type annotations.

In C and C++, even parsing the language requires a symbol table, because you need to store the types and declarations of globals and functions.

Lexically scoped symbols are not stored in symtabs but as an indexed list of names in block frames, as in functional languages. Those indices are computed at compile time, so run-time access is immediate. Leaving the scope makes those variables inaccessible automatically, so you don't need to push/pop names from namespaces/symtabs.
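
A sketch of that idea under simplifying assumptions: at compile time each variable reference is resolved to a (depth, index) pair, and at run time frames hold only values, reached by following parent links and indexing. `Addr`, `resolve`, `Frame` and `load` are hypothetical names, and values are `int` for brevity. The resolver still consults the per-block list of names, but only during compilation.

```go
package main

import "fmt"

// Addr is a compile-time resolved location: how many frames to walk
// outward (Depth) and the slot within that frame (Index).
type Addr struct{ Depth, Index int }

// resolve runs at compile time. scopes lists the names declared by each
// enclosing block, outermost first; the innermost match wins.
func resolve(scopes [][]string, name string) (Addr, bool) {
	for d := len(scopes) - 1; d >= 0; d-- {
		for i, n := range scopes[d] {
			if n == name {
				return Addr{Depth: len(scopes) - 1 - d, Index: i}, true
			}
		}
	}
	return Addr{}, false
}

// Frame is a run-time block frame: slot values plus a link to the
// enclosing frame. No names are kept at run time.
type Frame struct {
	Slots  []int
	Parent *Frame
}

// load performs the run-time access: follow Depth parent links, then index.
func (f *Frame) load(a Addr) int {
	for i := 0; i < a.Depth; i++ {
		f = f.Parent
	}
	return f.Slots[a.Index]
}

func main() {
	// Compile time, for a program shaped like: { var a, b; { var c; ... b ... } }
	scopes := [][]string{{"a", "b"}, {"c"}}
	addrB, _ := resolve(scopes, "b") // Addr{Depth: 1, Index: 1}

	// Run time: frames hold only values.
	outer := &Frame{Slots: []int{10, 20}}
	inner := &Frame{Slots: []int{30}, Parent: outer}
	fmt.Println(inner.load(addrB)) // 20
}
```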

Less functional languages, without first-class functions, often need to store their function names in symbol tables. As a language designer, you try to bind functions to lexicals instead, so as to get rid of dynamic name lookup in symtabs.
