Walter Bright's use of the word "redundancy"... or "What the heck does that mean?"

Posted on 2024-09-15 07:24:52

So I'm reading this interview with Walter Bright about the D language in Bitwise (http://www.bitwisemag.com/copy/programming/d/interview/d_programming_language.html), and I came across this really interesting quote about language parsing:

From a theoretical perspective, however, being able to generate a good diagnostic requires that there be redundancy in the syntax. The redundancy is used to make a guess at what was intended, and the more redundancy, the more likely that guess will be correct. It's like the English language - if we misspell a wrod now and then, or if a word missing, the redundancy enables us to correctly guess the meaning. If there is no redundancy in a language, then any random sequence of characters is a valid program.

And now I'm trying to figure out what the heck he means when he says "redundancy".

I can barely wrap my head around the last part, where he mentions that it is possible to have a language in which "any random sequence of characters is a valid program." I was taught that there are three kinds of errors: syntactic, run-time, and semantic. Are there languages in which the only possible errors are semantic? Is assembly like that? What about machine code?

Comments (8)

苏别ゝ 2024-09-22 07:24:52

I'll focus on why (I think) Walter Bright thinks redundancy is good. Let's take XML as an example. This snippet:

<foo>...</foo>

has redundancy: the closing tag is redundant if we use S-expressions instead:

(foo ...)

It's shorter, and the programmer doesn't have to type foo more often than necessary to make sense of that snippet. Less redundancy. But it has downsides, as an example from http://www.prescod.net/xml/sexprs.html shows:

(document author: "[email protected]"
    (para "This is a paragraph " (footnote "(better than the one under there)" ".")
    (para "Ha! I made you say \"underwear\"."))


<document author="[email protected]">
<para>This is a paragraph <footnote>(just a little one).</para>
<para>Ha! I made you say "underwear".</para>
</document>

In both, the end tag or closing paren for footnote is missing. The XML version is plainly invalid as soon as the parser sees </para>. The S-expression version is only detected as invalid at the end of the document, and only if you don't have an unneeded closing paren somewhere else. So redundancy does help, in some cases, to understand what the writer meant (and to point out errors in the way it was expressed).
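
The difference in where the error surfaces can be sketched in Python. This is only an illustration, not a real XML or S-expression parser: `check_tags` and `check_parens` are hypothetical helpers, and the token lists are simplified stand-ins for the two documents above.

```python
def check_tags(tokens):
    """Check named open/close tags; return (position, message) for the first error, or None."""
    stack = []
    for pos, tok in enumerate(tokens):
        if tok.startswith("</"):
            name = tok[2:-1]
            if not stack:
                return pos, f"unexpected </{name}>"
            if stack[-1] != name:
                # The closing tag repeats the name, so the mismatch is visible right here.
                return pos, f"expected </{stack[-1]}> but found </{name}>"
            stack.pop()
        elif tok.startswith("<"):
            stack.append(tok[1:-1])
    if stack:
        return len(tokens), f"missing </{stack[-1]}>"
    return None


def check_parens(tokens):
    """Check anonymous parens; all a ')' can say is 'close something'."""
    depth = 0
    for pos, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            if depth == 0:
                return pos, "unexpected )"
            depth -= 1
    if depth:
        return len(tokens), f"{depth} unclosed ("
    return None


# Simplified token streams; both omit the close of the footnote.
xml_tokens = ["<document>", "<para>", "text", "<footnote>", "text", "</para>",
              "<para>", "text", "</para>", "</document>"]
sexpr_tokens = ["(", "document", "(", "para", "text", "(", "footnote", "text", "text", ")",
                "(", "para", "text", ")", ")"]

print(check_tags(xml_tokens))            # error reported at token 5, the first </para>
print(check_parens(sexpr_tokens))        # error reported only at the end of input
print(check_parens(sexpr_tokens + [")"]))  # a stray ')' elsewhere hides the error entirely
```

The tag checker can name both what it expected and what it found because the closing tag carries the element name a second time. The paren checker has nothing to compare against, so it can only count.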

明媚如初 2024-09-22 07:24:52

Assembly language (most assembly languages, anyway) is not like that at all -- they have quite a rigid syntax, and most random strings would be diagnosed as errors.

Machine code is a lot closer. Since there's no translation from "source" to "object" code involved, all errors are semantic, not syntactic. Most processors do have various inputs they'd reject (e.g., execute a "bad opcode" trap/interrupt). You could argue that in some cases this would be syntactic (e.g., an opcode that wasn't recognized at all) where others are semantic (e.g., a set of operands that weren't allowed for that instruction).

For those who remember it, TECO was famous (notorious?) for assigning some meaning to almost any possible input, so it was pretty much the same way. An interesting challenge was to figure out what would happen if you typed in (for one example) your name.
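
What "any random sequence is a valid program" looks like can be sketched with a toy byte-coded machine in Python. The instruction set below is invented for illustration (it is not TECO or any real processor): every one of the 256 byte values decodes to some instruction, so there is no input the interpreter can reject.

```python
def run(program: bytes) -> int:
    """Interpret every byte as an instruction on a single 8-bit accumulator."""
    acc = 0
    for byte in program:
        op, arg = byte >> 6, byte & 0x3F  # top 2 bits: opcode, low 6 bits: operand
        if op == 0:
            acc = arg                  # load constant
        elif op == 1:
            acc = (acc + arg) & 0xFF   # add
        elif op == 2:
            acc = (acc - arg) & 0xFF   # subtract
        else:
            acc ^= arg                 # xor
    return acc


intended = bytes([0x05, 0x43])  # load 5, add 3
typo = bytes([0x05, 0x83])      # one byte differs: load 5, subtract 3

print(run(intended))
print(run(typo))
```

Because the encoding has no unused bit patterns, the mistyped program is just a different valid program. The only way to notice the mistake is that the result is not what was meant, which is a semantic error.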

淡淡的优雅 2024-09-22 07:24:52

nglsh nclds ll srts of xtr ltrs t mk it ezr t read

深海蓝天 2024-09-22 07:24:52

Well, to use an example from C# (since I don't know D): if you have a class with an abstract method, the class itself must be marked abstract:

public abstract class MyClass
{
    public abstract void MyFunc();
}

Now, it would be trivial for the compiler to automatically mark MyClass as abstract (that is the way C++ handles it), but in C#, you must do it explicitly, so that your intentions are clear.

Similarly with virtual methods. In C++, if a method is declared virtual in a base class, it is automatically virtual in all derived classes. In C#, the overriding method must nevertheless be explicitly marked override, so there is no confusion about what you wanted.

彩虹直至黑白 2024-09-22 07:24:52

I think he was talking about syntactical structures in the language and how they can be interpreted. As an example, consider the humble "if" statement, rendered in several languages.

In bash (shell script), it looks like this:

if [ cond ]; then
  stmts;
elif [ other_cond ]; then
  other_stmts;
else
  other_other_stmts;
fi

In C (with single statements, no curly braces):

if (cond)
  stmt;
else if (other_cond)
  other_stmt;
else
  other_other_stmt;

You can see that in bash, there is a lot more syntactical structure to the if statement than there is in C. In fact, all control structures in bash have their own closing delimiters (e.g. if/then/fi, for/do/done, case/in/esac,...), whereas in C the curly brace is used everywhere. These unique delimiters disambiguate the meaning of the code, and thereby provide context from which the interpreter/compiler can diagnose error conditions and report them to the user.

There is, however, a tradeoff. Programmers generally prefer terse syntax (a la C, Lisp, etc.) to verbose syntax (a la Pascal, Ada, etc.). However, they also prefer descriptive error messages containing line/column numbers and suggested resolutions. These goals are of course at odds with each other--you can't have your cake and eat it too (at least, while keeping the internal implementation of the compiler/interpreter simple).

千と千尋 2024-09-22 07:24:52

It means that the syntax contains more information than necessary to encode a working program. An example is function prototypes. As K&R C shows us, they're redundant because the compiler can just let the caller push whatever arguments you want on, then let the function pop the correct arguments off. But C++ and other languages mandate them, because they help the compiler check that you're calling the function the right way.

Another example is the requirement to declare variables before using them. Some languages have this, while others don't. It is clearly redundant, but it often helps prevent errors (e.g., misspelling a name, or using a variable that has been removed).
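
The declaration case can be sketched in Python for a made-up two-statement mini-language (`var NAME` and `NAME = VALUE`). Both the mini-language and the `check_declarations` helper are assumptions for illustration only.

```python
def check_declarations(lines):
    """Return error messages for names assigned without a prior 'var' declaration."""
    declared = set()
    errors = []
    for lineno, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue
        if words[0] == "var" and len(words) == 2:
            declared.add(words[1])
        elif len(words) >= 3 and words[1] == "=":
            if words[0] not in declared:
                errors.append(f"line {lineno}: '{words[0]}' assigned but never declared")
    return errors


program = [
    "var total",
    "total = 0",
    "totl = 5",  # misspelling of 'total'
]

print(check_declarations(program))
```

The `var total` line adds no information a working program strictly needs, but it is exactly what lets the checker tell a misspelling from a new variable. In a language without declarations, `totl = 5` would silently create a second variable.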

不甘平庸 2024-09-22 07:24:52

I think a better example of redundancy is something like int a[10] =. At this point, the compiler knows what should come next, an int array initializer, and can come up with an appropriate error message if what follows isn't an int array initializer. If the language syntax said that anything could follow int a[10], it would be a lot harder for the compiler to figure out problems with one.
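
A minimal Python sketch of that idea follows: a hand-written parser for one C-like declaration form that always knows what it expects next. The `parse_array_decl` function and its tiny grammar are assumptions for illustration, not how any real C compiler is written.

```python
import re


def parse_array_decl(source):
    """Parse 'int NAME[N] = { v, ... };' and return (name, size, values), or raise SyntaxError."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    pos = 0

    def expect(description, predicate):
        # Consume one token if it matches; otherwise report what was expected here.
        nonlocal pos
        found = tokens[pos] if pos < len(tokens) else "end of input"
        if pos >= len(tokens) or not predicate(found):
            raise SyntaxError(f"expected {description} but found {found!r}")
        pos += 1
        return found

    expect("'int'", lambda t: t == "int")
    name = expect("an identifier", str.isidentifier)
    expect("'['", lambda t: t == "[")
    size = int(expect("an array size", str.isdigit))
    expect("']'", lambda t: t == "]")
    expect("'='", lambda t: t == "=")
    expect("'{' to start an int array initializer", lambda t: t == "{")
    values = [int(expect("an integer", str.isdigit))]
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        values.append(int(expect("an integer", str.isdigit)))
    expect("'}'", lambda t: t == "}")
    expect("';'", lambda t: t == ";")
    return name, size, values


print(parse_array_decl("int a[10] = {1, 2, 3};"))

try:
    parse_array_decl("int a[10] = 5;")
except SyntaxError as err:
    print(err)
```

Every `expect` call is a place where the grammar allows only one kind of token, and that restriction is what makes a specific message possible. If anything could follow `int a[10] =`, there would be no expectation to report against.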

吐个泡泡 2024-09-22 07:24:52

then any random sequence of characters is a valid program.

Although not quite "any random sequence is valid", consider Perl and Regular Expressions. Their very short syntax makes it easier for invalid characters to still pass syntactic and semantic analysis.
