In-language semantic variation

Posted 2024-09-16 04:58:51


I've been thinking about designing my own language (practicality: it's a thought experiment). One of the ideas I came up with is in-language semantic variation. You'd write what are essentially semantic regular expressions, which get replaced with equivalent code. You can see this in a somewhat less direct form in D: it has string mixins that are converted into D code. The difference is that I would apply these rules implicitly, and in a more circular fashion.

I come from a C++ background, so consider:

```cpp
string a, b, c, d;
// do stuff
a = b + c + d;
```

This code creates several temporaries. Even with rvalue references you still create temporaries; they are simply reused more efficiently. But they still exist and still cost performance. I was thinking about how, in the simplest case, they could be eliminated: you could write a semantic regular expression that rewrites the statement into its most optimized form.

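```cpp
string a, b, c, d;
// do stuff
a.clear();
a.reserve(b.size() + c.size() + d.size()); // allocate once for the final size
a += b; a += c; a += d;                     // each append fits in the reserved capacity
```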
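For concreteness, here is the hand-optimized form above wrapped in a compilable function. `concat3` is a hypothetical name used only for illustration, and the sketch assumes the common behavior that appends which fit within the reserved capacity do not reallocate.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper showing the hand-optimized form of a = b + c + d:
// the result is sized once, then each operand is appended in place, so no
// intermediate std::string temporaries are built.
std::string concat3(const std::string& b, const std::string& c, const std::string& d) {
    std::string a;
    a.reserve(b.size() + c.size() + d.size()); // one allocation for the final size
    a += b;
    a += c;
    a += d;
    return a; // elided (NRVO) or moved, never deep-copied
}
```

The point of the question is that a semantic rewrite rule would produce this shape automatically from the plain `a = b + c + d;` spelling.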

If I implemented std::string, I might be able to write something even faster. The key is that these rules are implicit: whenever you use the std::string class, the axioms written by the std::string implementer can affect any std::string code. You could just drop them into an existing C++ codebase, recompile, and get, for free, the fastest string concatenation your std::string implementer can conceive of.

At the moment, the optimizations you can make are limited, because you only have as much context as the language gives you; in this case, a binary operator overload in C++ only sees two operands, this and arg. But a semantic regex could take virtually all the context you could ever need, since you dictate what it matches, and could even match language features that don't exist in the host language. For example, it would be trivial to exchange

```cpp
string a;
a.size;
```

for

```cpp
string a;
a.size();
```

if you wanted to steal C# properties. You could also match class definitions and implement compile-time or run-time reflection, etc.

But, I mean, it could get confusing. If there were a bug, or if what was really done behind the scenes didn't reflect the code that was written, it could be a total bitch to track down, and I haven't considered in depth how it would be implemented. What do you guys think of my proposed language feature?

Oh man, choosing the right tags. Ummm....

Edit: In response to one of the answers, I also wanted to address the question of limits. The simple fact is that semantic regexes have no inherent limits (apart from implementation details that may have to be added). For example, you could turn the code

```cpp
int i;
cin >> i;
int lols[i]; // a variable-length array: not valid standard C++
```

into

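```cpp
int i;
cin >> i;
// std::variable_array is hypothetical, not a real standard type;
// alloca is non-standard (e.g. <alloca.h> on POSIX systems)
std::variable_array<int> lols(static_cast<int*>(alloca(sizeof(int) * i)), i);
```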

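The semantics of alloca make it impossible to wrap with templates: the memory lives in the stack frame of the function that calls alloca and is released when that function returns, so a helper function or constructor cannot hand it back to its caller. You have to write a macro if you want the above. In C++03 or C++0x, you cannot encapsulate your own VLAs.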
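A minimal sketch of the macro approach described above, assuming a POSIX-style `<alloca.h>` (alloca is not part of standard C or C++). `variable_array` and `VARIABLE_ARRAY` are hypothetical names, and the wrapper neither owns nor constructs its elements, so it is only meant for trivial types like `int`.

```cpp
#include <alloca.h>   // non-standard header; present on glibc and most POSIX systems
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the std::variable_array from the question.
// It does not own its storage and does not construct or destroy elements.
template <typename T>
class variable_array {
public:
    variable_array(T* data, std::size_t size) : data_(data), size_(size) {}
    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    T* begin() { return data_; }
    T* end() { return data_ + size_; }
private:
    T* data_;
    std::size_t size_;
};

// alloca must run in the frame that uses the memory, so it is expanded
// in place by a macro rather than called from a constructor or helper.
#define VARIABLE_ARRAY(T, name, n) \
    variable_array<T> name(static_cast<T*>(alloca(sizeof(T) * (n))), (n))

// Example use: the stack storage lives until sum_first_n returns.
int sum_first_n(int n) {
    VARIABLE_ARRAY(int, values, n);
    for (int k = 0; k < n; ++k) values[k] = k + 1;
    int total = 0;
    for (int v : values) total += v;
    return total;
}
```

If the `alloca` call were moved into the `variable_array` constructor, the memory would be released as soon as the constructor returned, which is why the macro is unavoidable here.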

In addition, semantic regexes can match code that doesn't itself trigger any compile-time work. For example, you could match every member of a class definition and use that to generate a reflection system, which is also impossible in C++ to date.
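To make the reflection idea concrete, here is roughly the code such a member-matching rule might generate for a simple struct, written by hand because standard C++ through C++23 cannot enumerate a class's members on its own. `Point`, `for_each_member` and `member_names` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Point {
    int x;
    int y;
};

// What a rewrite rule matching Point's definition could emit: a visitor
// that calls f(name, member) for every data member, in declaration order.
template <typename F>
void for_each_member(Point& p, F&& f) {
    f("x", p.x);
    f("y", p.y);
}

// A small reflection query built on top of the generated visitor.
std::vector<std::string> member_names(Point& p) {
    std::vector<std::string> names;
    for_each_member(p, [&](const char* name, int&) { names.push_back(name); });
    return names;
}
```

Everything above the visitor is ordinary user code; the only part the proposed feature would automate is the body of `for_each_member`.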


太阳公公是暖光 2024-09-23 04:58:51

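If you search Google for something like "C++ expression templates", you'll find that C++ actually already has a very similar feature. Depending on the syntax you come up with, your idea might make this kind of code easier to understand (expression templates are certainly not trivial), but at least to me it isn't entirely clear that you'd be adding much, if anything, in the way of genuinely new capability.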
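As a rough illustration of what this answer refers to, here is a minimal expression-template sketch for string concatenation. `Leaf`, `Concat` and `cat` are hypothetical names, not a library API: `cat(b) + c + d` builds a small tree of references and only produces a `std::string` when converted, reserving the full size once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Leaf node: refers to an existing string without copying it.
struct Leaf {
    const std::string& s;
    std::size_t size() const { return s.size(); }
    void append_to(std::string& out) const { out += s; }
};

// Inner node: "lhs followed by rhs". Nodes hold only references to the
// original strings, so copying a node by value is cheap.
template <typename L, typename R>
struct Concat {
    L lhs;
    R rhs;

    std::size_t size() const { return lhs.size() + rhs.size(); }
    void append_to(std::string& out) const {
        lhs.append_to(out);
        rhs.append_to(out);
    }

    // Evaluation happens only here: one reserve, then a single pass of appends.
    operator std::string() const {
        std::string out;
        out.reserve(size());
        append_to(out);
        return out;
    }
};

// Hypothetical entry point; starting from cat(...) avoids overloading
// operator+ for two plain std::strings, which the standard library owns.
inline Leaf cat(const std::string& s) { return Leaf{s}; }

inline Concat<Leaf, Leaf> operator+(const Leaf& lhs, const std::string& rhs) {
    return {lhs, Leaf{rhs}};
}

template <typename L, typename R>
Concat<Concat<L, R>, Leaf> operator+(const Concat<L, R>& lhs, const std::string& rhs) {
    return {lhs, Leaf{rhs}};
}
```

Usage would be `std::string a = cat(b) + c + d;`. The explicit `cat(...)` opt-in is the trade-off: it keeps the sketch from hijacking the standard `operator+`, but it is exactly the kind of non-implicit spelling the question wants to avoid. The referenced strings must also outlive the expression object.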


夏末的微笑 2024-09-23 04:58:51


(Warning: Mammoth answer ahead!)

I think it's called a macro ;) Well, at least outside the C/C++ world (where "macro" refers to the severely limited textual substitution that the preprocessor provides). And it's not very novel. Still, I think a proper, powerful macro system can add more power to a language than any other feature (provided we keep enough primitives that it's not merely Turing-complete but useful for real programming), in that a sufficiently smart programmer can add nearly any feature that might prove useful in the future, or for a specific domain, without adding (further) rules to the language.
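The difference between preprocessor substitution and a structure-aware macro system shows up even in a two-line example. `SQUARE` and `square` are illustrative names.

```cpp
#include <cassert>

// Token substitution with no notion of expression structure:
// SQUARE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5, not 9.
#define SQUARE(x) x * x

// A transformation that works on the parsed expression (or simply a
// function) keeps the argument as one unit.
constexpr int square(int x) { return x * x; }
```

A tree-based macro would receive `1 + 2` as a single subtree, so the surprise above cannot happen.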

The basic idea is to parse a program into a representation richer than a string of source code, say, an AST or a parse tree. Such trees provide more information about the program, and another program can walk the tree and modify it. For example, it would be possible to look for a VariableDeclaration node, check whether it declares a plain old array of T, and replace it with a new VariableDeclaration node that instead declares a std::variable_array of T. This can be refined further, for example by providing pattern matching over the tree, which makes metaprogramming easier. It is a powerful technique, if and only if the programmer can cope with this level of abstraction and knows how to put it to good use.
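A toy version of the VariableDeclaration rewrite described above might look like this. The node layout, `Block`, `rewrite` and `rewrite_tree` are all assumptions made for the sketch, and `std::variable_array` is the hypothetical type from the question, emitted as text only; a real pass would also need to insert the `alloca` call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Made-up node layout, just enough to show a matching rewrite.
struct VariableDeclaration {
    std::string type;        // e.g. "int"
    std::string name;        // e.g. "lols"
    std::string array_size;  // non-empty for `T name[size];`
    std::string ctor_arg;    // non-empty for `T name(arg);`
};

// A scope containing declarations and nested scopes
// (std::vector of an incomplete type is allowed since C++17).
struct Block {
    std::vector<VariableDeclaration> decls;
    std::vector<Block> children;
};

// Match "array of T" and replace the node; anything else is kept as-is.
VariableDeclaration rewrite(const VariableDeclaration& d) {
    if (d.array_size.empty()) return d;
    return {"std::variable_array<" + d.type + ">", d.name, "", d.array_size};
}

// Walk the whole tree, applying the rewrite to every declaration.
void rewrite_tree(Block& b) {
    for (auto& d : b.decls) d = rewrite(d);
    for (auto& child : b.children) rewrite_tree(child);
}

// Turn a node back into source text.
std::string to_source(const VariableDeclaration& d) {
    std::string s = d.type + " " + d.name;
    if (!d.array_size.empty()) s += "[" + d.array_size + "]";
    if (!d.ctor_arg.empty()) s += "(" + d.ctor_arg + ")";
    return s + ";";
}
```

The pass never looks at characters: it sees a declaration node with an array size and replaces that node, which is what distinguishes it from textual substitution.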

Note that when I speak of "pattern matching", I mean the pattern matching of functional programming, not regular expressions. Regular expressions are insufficient to make sense of irregular languages, and that includes nearly every useful language: merely allowing expressions of arbitrary size, including balanced parentheses, rules regular expressions out. See the accepted answer on "What is 'Pattern Matching' in functional languages?" for an excellent introduction to pattern matching, and maybe learn a functional language like Haskell or O'Caml, if only to learn how to use it and how to process trees (and there's a ton of other cool features!).
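To see why a regular expression in the formal sense cannot handle nesting, compare it with a checker that needs an unbounded counter. This is a sketch; note that some regex engines (for example PCRE with recursive patterns) add features that go beyond regular languages.

```cpp
#include <cassert>
#include <string>

// Checks parentheses at any nesting depth. The depth counter is unbounded,
// whereas a regular expression (a finite automaton) has finitely many states
// and can therefore only track nesting up to some fixed depth.
bool balanced(const std::string& s) {
    long depth = 0;
    for (char ch : s) {
        if (ch == '(') ++depth;
        else if (ch == ')' && --depth < 0) return false; // closer without an opener
    }
    return depth == 0;
}
```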

Now on the language you propose: honestly, I doubt it would be useful. C++ itself is a perfect example of how not to design a language (unless you want to be successful): take an existing one, stay backward-compatible (that is, keep all of it, including the bad stuff), add a bunch of new features that are complex enough by themselves, then tweak them a thousand times and add a hundred special cases so they work more or less with the syntax and semantics of the existing language. It makes success more likely (if the language you started with is already popular), but you end up with an arcane and inelegant beast. That being said, I'd really love to see a non-Lisp language that allows macros of such power.

The right (or at least a better) way would be to rethink every single bit, from the most basic semantics to the exact syntax, integrate it with what you want to add, and tweak all parts of the newly formed language until the whole picture looks right. In your case, this would have an extremely convenient side effect: ease of parsing. Of course, the source must be parsed before macros can be applied, as they concern themselves with a tree, not with string fragments. But C++ is very hard to parse. Like, literally the hardest-to-parse language in common use.
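One well-known illustration of how context-dependent C++ parsing is, the so-called most vexing parse: the same tokens can be read as a declaration or as an object definition, and the language resolves the ambiguity in favor of the declaration.

```cpp
#include <string>
#include <type_traits>

// Looks like a string initialized from an empty string, but is parsed as a
// declaration of a function `s` returning std::string and taking a
// (pointer to a) function that returns std::string.
std::string s(std::string());

// Brace initialization (C++11) removes the ambiguity: this is an object.
std::string t{std::string()};

static_assert(std::is_function<decltype(s)>::value, "s is a function declaration");
static_assert(std::is_same<decltype(t), std::string>::value, "t is a string object");
```

Any tool that wants to apply tree-based macros to C++ has to get this kind of decision right before it can even build the tree.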

Oh, while we're at it: macros themselves can make the life of our beloved tools (IDEs with autocomplete and call tips, static code analysis, etc.) miserable. Making sense of a piece of code is hard enough, but it gets even worse if the code will be transformed arbitrarily, and possibly very heavily, before it reaches the form that is executed. In general, code analysis tools can't cope with macros. The whole area is so hard that clever people make up new languages to research it and write papers on it that neither of us can comprehend. So be aware that macros do have downsides.

