What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

Posted on 2024-09-13 19:11:49

得不到的就毁灭 2024-09-20 19:11:49

I've been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).

There's nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you'll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar, that does a part of the job. (Python ASTs are a great example.) The good news is that part of the job is done. The bad news is that the machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun.)

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years has been trying to prevent such assumptions from creeping in.)

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman's compiler book doesn't stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.
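To make the "ASTs are necessary but not sufficient" point concrete, here is a small sketch using Python's standard `ast` and `symtable` modules (the example program is made up for illustration, and it only covers the symbol-table half of the argument, not flow analysis). Both `return x` statements produce identical AST nodes; only the symbol table reveals that one `x` is the module-level global and the other is a function local, which is exactly what a translator must know before it can emit correct target code.

```python
import ast
import symtable

# A made-up example: two functions that both `return x`.
SOURCE = """\
x = 1

def f():
    return x

def g():
    x = 2
    return x
"""


def return_value_dumps(source: str) -> list[str]:
    """The AST view: dump the expression of every `return` statement."""
    tree = ast.parse(source)
    return [ast.dump(node.value) for node in ast.walk(tree) if isinstance(node, ast.Return)]


def binding_of_x(source: str) -> dict[str, str]:
    """The symbol-table view: how is `x` bound inside each top-level function?"""
    module = symtable.symtable(source, "<example>", "exec")
    kinds = {}
    for func in module.get_children():
        sym = func.lookup("x")
        kinds[func.get_name()] = "local" if sym.is_local() else "global"
    return kinds


if __name__ == "__main__":
    print(return_value_dumps(SOURCE))  # two identical Name(id='x', ...) dumps
    print(binding_of_x(SOURCE))        # {'f': 'global', 'g': 'local'}
```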

The remark about "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert is pretty small, and you only intend to convert it once, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC, then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don't understand. That takes a huge amount of effort. At the million-line level, this is simply impossible in practice. (Amazingly, there are people who distrust automated tools and insist on translating million-line systems by hand; that's even harder, and they normally find out painfully, with long delays, high costs and often outright failure.)

What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can't complete the manual part of the translation activity.
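The arithmetic behind the "80% versus high nineties" argument is easy to check. The sketch below (plain Python; the helper name is hypothetical) computes how many lines are left for manual work at a given automatic conversion rate, using integer arithmetic so there are no rounding surprises.

```python
def lines_left_for_manual_work(sloc: int, automated_percent: int) -> int:
    """Lines a translator leaves for hand conversion at the given automation rate."""
    if not 0 <= automated_percent <= 100:
        raise ValueError("automated_percent must be between 0 and 100")
    return sloc * (100 - automated_percent) // 100


for sloc in (100_000, 1_000_000):
    for pct in (80, 98):
        print(f"{sloc:>9,} SLOC at {pct}% automated -> "
              f"{lines_left_for_manual_work(sloc, pct):,} lines by hand")
```

Note that at 98% a million-line system still leaves 20,000 lines for hand work, the same load as the 80% case for 100K SLOC.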

Another key consideration is size of code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don't justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that's painful.

I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.

淑女气质 2024-09-20 19:11:49

My answer will address the specific task of parsing Python in order to translate it to another language, and not the higher-level aspects which Ira addressed well in his answer.

In short: do not use the parser module, there's an easier way.

The ast module, available since Python 2.6, is much more suitable for your needs, since it gives you a ready-made AST to work with. I wrote an article on this last year, but in short, use the parse method of ast to parse Python source code into an AST. The parser module will give you a parse tree, not an AST. Be wary of the difference.
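A minimal illustration of the `ast` approach, using only the standard library (the source string is made up). `ast.parse` returns typed nodes such as `Assign` and `BinOp` with the grouping already resolved; the parentheses themselves leave no node behind, which is the parse-tree versus AST difference mentioned above. Note that the old `parser` module was deprecated in Python 3.9 and removed in 3.10, so `ast` is the only option on current versions.

```python
import ast

source = "total = price * (1 + rate)"
tree = ast.parse(source)          # mode="exec" by default, so the result is an ast.Module
stmt = tree.body[0]

print(type(stmt).__name__)               # Assign
print(type(stmt.value.op).__name__)      # Mult: the outer operation
print(type(stmt.value.right).__name__)   # BinOp: the parenthesised sum, no "paren" node
print(ast.dump(stmt, indent=2))          # the full tree (indent= needs Python 3.9+)
```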

Now, since Python's ASTs are quite detailed, given an AST the front-end job isn't terribly hard. I suppose you can have a simple prototype for some parts of the functionality ready quite quickly. However, getting to a complete solution will take more time, mainly because the semantics of the languages are different. A simple subset of the language (functions, basic types and so on) can be readily translated, but once you get into the more complex layers, you'll need heavy machinery to emulate one language's core in another. For example consider Python's generators and list comprehensions which don't exist in PHP (to my best knowledge, which is admittedly poor when PHP is involved).
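To show how approachable the "simple subset" is once you have an AST, here is a sketch of a Python-to-PHP emitter built on `ast.NodeVisitor`. The class name and the supported subset (single-name assignments, `print(expr)`, `+ - *`, names, numbers and strings) are assumptions for illustration; anything else is rejected rather than mistranslated. It deliberately ignores the semantic gaps discussed above: for example, Python's `+` on strings would need PHP's `.` operator, which this sketch cannot detect without type information.

```python
import ast


class PhpEmitter(ast.NodeVisitor):
    """Hypothetical sketch: translate a tiny Python subset into PHP source text."""

    BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

    def translate(self, source: str) -> str:
        tree = ast.parse(source)
        return "\n".join(self.visit(stmt) for stmt in tree.body)

    def generic_visit(self, node):
        # Refuse anything outside the subset instead of emitting wrong PHP.
        raise NotImplementedError(f"unsupported construct: {type(node).__name__}")

    def visit_Assign(self, node):
        if len(node.targets) != 1 or not isinstance(node.targets[0], ast.Name):
            raise NotImplementedError("only `name = expr` assignments")
        return f"{self.visit(node.targets[0])} = {self.visit(node.value)};"

    def visit_Expr(self, node):
        call = node.value
        if (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
                and call.func.id == "print" and len(call.args) == 1 and not call.keywords):
            return f"echo {self.visit(call.args[0])}, PHP_EOL;"
        raise NotImplementedError("only print(expr) expression statements")

    def visit_BinOp(self, node):
        op = self.BINOPS.get(type(node.op))
        if op is None:
            raise NotImplementedError(f"unsupported operator: {type(node.op).__name__}")
        # Assumes numeric operands; string `+` would need PHP's `.` instead.
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Name(self, node):
        return "$" + node.id  # PHP variables carry a `$` sigil

    def visit_Constant(self, node):
        value = node.value
        if isinstance(value, str):
            # PHP single-quoted strings only interpret \\ and \'.
            return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return repr(value)
        raise NotImplementedError(f"unsupported constant: {value!r}")


print(PhpEmitter().translate("x = 2 * (y + 1)\nprint(x)"))
# $x = (2 * ($y + 1));
# echo $x, PHP_EOL;
```

Growing the subset one `visit_*` method at a time is the incremental approach other answers here recommend.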

To give you one final tip, consider the 2to3 tool created by the Python devs to translate Python 2 code to Python 3 code. Front-end-wise, it has most of the elements you need to translate Python to something. However, since the cores of Python 2 and 3 are similar, no emulation machinery is required there.

記憶穿過時間隧道 2024-09-20 19:11:49

Writing a translator isn't impossible, especially considering that Joel's Intern did it over a summer.

If you want to do one language, it's easy. If you want to do more, it's a little more difficult, but not too much. The hardest part is that, while any Turing-complete language can do what another Turing-complete language does, built-in data types can make a huge difference in what a language can do concisely.

For instance:

word = 'This is not a word'
print(word[::-2])

takes a lot of C++ code to duplicate (OK, you can do it fairly concisely with a loop, but still).
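To see what that means for a translator, here is the explicit loop it would have to emit for `word[::-2]` in a language without slice syntax. It is written in Python for illustration, using only indexing and a `while` loop so it maps directly onto C++; the function name is hypothetical, and it only covers the negative-step, default-bounds case used in the example.

```python
def reverse_every_nth(s: str, step: int) -> str:
    """Equivalent of s[::-step] for step > 0, using only indexing and a while loop."""
    if step <= 0:
        raise ValueError("step must be positive")
    chars = []
    i = len(s) - 1          # a negative step starts from the last element
    while i >= 0:
        chars.append(s[i])
        i -= step
    return "".join(chars)


word = "This is not a word"
print(reverse_every_nth(word, 2) == word[::-2])  # True
```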

That's a bit of an aside, I guess.

Have you ever written a tokenizer/parser based on a language grammar? If you haven't, you'll probably want to learn how, because that's the main part of this project. What I would do is come up with a basic Turing-complete syntax, something fairly similar to Python bytecode. Then you create a lexer/parser that takes a language grammar (perhaps written in BNF) and, based on that grammar, compiles the language into your intermediate language. Then you do the reverse: create a generator that translates your intermediate language into the target languages, again based on their grammars.

The most obvious problem I see is that at first you'll probably create horribly inefficient code, especially in more powerful* languages like Python.

But if you do it this way then you'll probably be able to figure out ways to optimize the output as you go along. To summarize:

  • read provided grammar
  • compile program into intermediate (but also Turing complete) syntax
  • compile intermediate program into final language (based on provided grammar)
  • ...?
  • Profit!(?)
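The three steps above can be sketched at toy scale. Instead of a grammar-driven parser, this sketch reuses Python's `ast` module as the front end (an assumption made to keep it short), compiles arithmetic expressions into a hypothetical stack-based intermediate form loosely in the spirit of bytecode, and then emits JavaScript from that intermediate form. Only names, integer literals and `+ - *` are handled.

```python
import ast

# Hypothetical stack IR: ("PUSH", int), ("LOAD", name), ("ADD",), ("SUB",), ("MUL",)
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
JS_OPERATORS = {"ADD": "+", "SUB": "-", "MUL": "*"}


def compile_to_ir(expr: str) -> list[tuple]:
    """Front end + step 2: Python expression -> flat list of stack instructions."""
    code: list[tuple] = []

    def emit(node: ast.AST) -> None:
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            emit(node.left)            # operands first (postfix order)
            emit(node.right)
            code.append((OPCODES[type(node.op)],))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        elif (isinstance(node, ast.Constant) and isinstance(node.value, int)
              and not isinstance(node.value, bool)):
            code.append(("PUSH", node.value))
        else:
            raise NotImplementedError(ast.dump(node))

    emit(ast.parse(expr, mode="eval").body)
    return code


def ir_to_javascript(code: list[tuple]) -> str:
    """Step 3: rebuild an infix JavaScript expression by simulating the stack."""
    stack: list[str] = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(str(instr[1]))
        elif instr[0] == "LOAD":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {JS_OPERATORS[instr[0]]} {right})")
    if len(stack) != 1:
        raise ValueError("malformed IR")
    return stack[0]


ir = compile_to_ir("a * (b + 2)")
print(ir)                    # [('LOAD', 'a'), ('LOAD', 'b'), ('PUSH', 2), ('ADD',), ('MUL',)]
print(ir_to_javascript(ir))  # (a * (b + 2))
```

A second back end (say, PHP) would only need its own `ir_to_...` function; the front end stays unchanged, which is the appeal of the intermediate step.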

*by powerful I mean that this takes 4 lines:

myinput = input("Enter something: ")
print(myinput.replace('a', 'A'))
print(sum(ord(c) for c in myinput))
print(myinput[::-1])

Show me another language that can do something like that in 4 lines, and I'll show you a language that's as powerful as Python.

不即不离 2024-09-20 19:11:49

There are a couple of answers telling you not to bother. Well, how helpful is that? You want to learn? You can learn. This is compilation. It just so happens that your target language isn't machine code, but another high-level language. This is done all the time.

There's a relatively easy way to get started. First, go get http://sourceforge.net/projects/lime-php/ (if you want to work in PHP) or some such and go through the example code. Next, you can write a lexical analyzer using a sequence of regular expressions and feed tokens to the parser you generate. Your semantic actions can either output code directly in another language or build up some data structure (think objects, man) that you can massage and traverse to generate output code.
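A regex-driven lexer of the kind described can be very short. This sketch is written in Python rather than PHP, and the token names and patterns are assumptions; it joins the individual regular expressions into one alternation of named groups, which are tried in the listed order, so more specific patterns must come first.

```python
import re
from typing import Iterator

# Hypothetical token set, tried in this order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),          # whitespace, including newlines
    ("MISMATCH", r"."),        # anything else is an error
]
MASTER_PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[tuple[str, str]]:
    """Yield (kind, text) pairs for a parser to consume."""
    for match in MASTER_PATTERN.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {value!r} at offset {match.start()}")
        yield kind, value


print(list(tokenize("total = price * (1 + rate)")))
```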

You're lucky with PHP and Python because in many respects they are the same language with different syntax. The hard part is getting over the semantic differences between the grammar forms and data structures. For example, Python has lists and dictionaries, while PHP only has associative arrays.

The "learner" approach is to build something that works OK for a restricted subset of the language (such as only print statements, simple math, and variable assignment), and then progressively remove limitations. That's basically what the "big" guys in the field all did.

Oh, and since you don't have static types in Python, it might be best to write and rely on PHP functions like "python_add" which adds numbers, strings, or objects according to the way Python does it.
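Here is a sketch of the dispatch such a helper has to perform, written as a Python reference model rather than PHP so it can be checked against Python's own `+`. The name `python_add` comes from the answer; the set of supported types and the PHP equivalents in the comments are assumptions, and user-defined `__add__`/`__radd__` methods are deliberately out of scope. The point is that the right PHP operation depends on runtime types, so the translator cannot pick it statically and has to emit a call to the helper instead.

```python
from numbers import Number


def python_add(a, b):
    """Reference model of Python's `+` for numbers, strings and lists (hypothetical helper)."""
    if isinstance(a, Number) and isinstance(b, Number):
        return a + b      # PHP: $a + $b
    if isinstance(a, str) and isinstance(b, str):
        return a + b      # PHP: $a . $b   (PHP's `+` treats strings as numbers)
    if isinstance(a, list) and isinstance(b, list):
        return a + b      # PHP: array_merge($a, $b)   (PHP's array `+` is a key union)
    raise TypeError(
        f"unsupported operand type(s) for +: {type(a).__name__!r} and {type(b).__name__!r}"
    )
```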

Obviously, this can get much bigger if you let it.

苍景流年 2024-09-20 19:11:49

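I will second @EliBendersky's point of view regarding using ast.parse instead of parser (which I did not know about before). I also warmly recommend that you review his blog. I used ast.parse to write a Python->JavaScript translator (https://bitbucket.org/amirouche/pythonium). I came up with the Pythonium design by reviewing other implementations and trying them on my own. I forked Pythonium from https://github.com/PythonJS/PythonJS, which I also started; it's actually a complete rewrite. The overall design is inspired by PyPy and the http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf paper.

Here is everything I tried, from the beginning to the best solution. Even if it looks like Pythonium marketing, it really isn't (don't hesitate to tell me if something doesn't seem correct with respect to netiquette):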

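  • Implement Python semantics in Plain Old JavaScript using prototype inheritance: AFAIK it's impossible to implement Python multiple inheritance using the JS prototype object system. I did try to do it using other tricks later (cf. getattribute). As far as I know there is no implementation of Python multiple inheritance in JavaScript; the best that exists is single inheritance + mixins, and I'm not sure they handle diamond inheritance. Kind of similar to Skulpt, but without Google Closure.

  • I tried with Google Closure, just like Skulpt (compiler), instead of actually reading the Skulpt code #fail. Anyway, because of the JS prototype-based object system, it was still impossible. Creating bindings was very, very difficult: you need to write JavaScript and a lot of boilerplate code (cf. https://github.com/skulpt/skulpt/issues/50 where I am the ghost). At that time there was no clear way to integrate the bindings into the build system. I think that Skulpt is a library and you just have to include your .py files in the HTML to be executed; no compilation phase is required from the developer.

  • Tried pyjaco (compiler), but creating bindings (calling JavaScript code from Python code) was very difficult; there was too much boilerplate code to write every time. Now I think pyjaco is the one closest to Pythonium. pyjaco is written in Python (ast.parse too), but a lot is written in JavaScript, and it uses prototype inheritance.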

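I never actually succeeded at running Pyjamas #fail and never tried to read the code again #fail. But in my mind Pyjamas was doing API->API translation (or framework to framework) and not Python to JavaScript translation. The JavaScript framework consumes data that is already in the page or data from the server. Python code is only "plumbing". After that I discovered that Pyjamas was actually a real Python->JS translator.

Still, I think it's possible to do API->API (or framework->framework) translation, and that's basically what I do in Pythonium, but at a lower level. Probably Pyjamas uses the same algorithm as Pythonium...

Then I discovered Brython, fully written in JavaScript like Skulpt: no need for compilation and a lot of fluff... but written in JavaScript.

Since the first line written in the course of this project, I knew about PyPy, even the JavaScript backend for PyPy. Yep, you can, if you find it, directly generate a Python interpreter in JavaScript from PyPy. People say it was a disaster. I never read why. But I think the reason is that the intermediate language they use to implement the interpreter, RPython, is a subset of Python tailored to be translated to C (and maybe asm). Ira Baxter says you always make assumptions when you build something, and you probably fine-tune it to be the best at what it's meant to do, in the case of PyPy: Python->C translation. Those assumptions might not be relevant in another context; worse, they can introduce overhead. In other words, direct translation will most likely always be better.

Having the interpreter written in Python sounded like a (very) good idea. But I was more interested in a compiler for performance reasons; also, it's actually easier to compile Python to JavaScript than to interpret it.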

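I started PythonJS with the idea of putting together a subset of Python that I could easily translate to JavaScript. At first I didn't even bother to implement an OO system because of past experience. The subset of Python that I managed to translate to JavaScript is:

  • functions with full parameter semantics, both in definition and calling. This is the part I am most proud of.
  • while/if/elif/else
  • Python types were converted to JavaScript types (there are no Python types of any kind)
  • for could iterate over JavaScript arrays only (for a in array)
  • Transparent access to JavaScript: if you write Array in the Python code, it will be translated to Array in JavaScript. This is the biggest achievement in terms of usability over its competitors.
  • You can pass functions defined in Python source to JavaScript functions. Default arguments will be taken into account.
  • It adds a special function called new, which is translated to JavaScript new, e.g. new(Python)(1, 2, spam, "egg") is translated to new Python(1, 2, spam, "egg").
  • "var" declarations are automatically handled by the translator (a very nice finding from Brett, a PythonJS contributor).
  • global keyword
  • closures
  • lambdas
  • list comprehensions
  • imports are supported via requirejs
  • single class inheritance + mixins via classyjs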
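The `new(Python)(1, 2, spam, "egg")` convention fits naturally as a special case in an expression emitter. The sketch below illustrates the idea under stated assumptions and is not PythonJS's actual code: it recognises a call whose callee is itself a call to a bare `new` and emits JavaScript's `new` operator; every other call becomes an ordinary call. Only names, calls, numbers and strings are handled.

```python
import ast
import json


def emit_js(node: ast.expr) -> str:
    """Hypothetical sketch: emit JavaScript for a small subset of Python expressions."""
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str))
            and not isinstance(node.value, bool)):
        return json.dumps(node.value)  # a valid JS literal for numbers and strings
    if isinstance(node, ast.Call):
        if node.keywords:
            raise NotImplementedError("keyword arguments")
        args = ", ".join(emit_js(arg) for arg in node.args)
        callee = node.func
        # new(Cls)(args...)  ->  new Cls(args...)
        if (isinstance(callee, ast.Call) and isinstance(callee.func, ast.Name)
                and callee.func.id == "new" and len(callee.args) == 1):
            return f"new {emit_js(callee.args[0])}({args})"
        return f"{emit_js(callee)}({args})"
    raise NotImplementedError(type(node).__name__)


expr = ast.parse('new(Python)(1, 2, spam, "egg")', mode="eval").body
print(emit_js(expr))  # new Python(1, 2, spam, "egg")
```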

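This seems like a lot, but it's actually very narrow compared to the full-blown semantics of Python. It's really JavaScript with a Python syntax.

The generated JS is perfect, i.e. there is no overhead; it cannot be improved in terms of performance by further editing it. If you can improve the generated code, you can do it from the Python source file too. Also, the compiler did not rely on any of the JS tricks you can find in .js code written by http://superherojs.com/, so it's very readable.

The direct descendant of this part of PythonJS is the Pythonium Veloce mode. The full implementation can be found at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master: 793 SLOC + around 100 SLOC of code shared with the other translator.

An adapted version of pystones.py can be translated in Veloce mode; cf. https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pystone/?at=master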

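After setting up basic Python->JavaScript translation, I chose another path to translate full Python to JavaScript: the way GLib does object-oriented, class-based code, except that the target language is JS, so you have access to arrays, map-like objects and many other tricks, and all of that part was written in Python. IIRC there is no JavaScript code written in the Pythonium translator. Getting single inheritance is not difficult; here are the difficult parts of making Pythonium fully compliant with Python:

  • spam.egg in Python is always translated to getattribute(spam, "egg"). I did not profile this in particular, but I think that's where it loses a lot of time, and I'm not sure I can improve upon it with asm.js or anything else.
  • method resolution order: even with the algorithm written in Python, translating it to Python Veloce-compatible code was a big endeavour.
  • getattribute: the actual getattribute resolution algorithm is kind of tricky, and it still doesn't support data descriptors.
  • metaclass-based classes: I know where to plug the code, but still...
  • last but not least: some_callable(...) is always translated to "call(some_callable)". AFAIK the translator doesn't use inference at all, so every time you make a call you need to check which kind of object it is, in order to call it the way it's meant to be called.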
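Since the method resolution order is one of the hard parts, here is a compact sketch of the C3 linearization that Python uses for its MRO (the standard algorithm, not Pythonium's runtime code). A runtime targeting JavaScript has to compute this itself, because the prototype chain only models single inheritance. Classes are modelled here as a hypothetical dict from class name to its list of direct base names.

```python
def c3_mro(cls: str, bases: dict[str, list[str]]) -> list[str]:
    """C3 linearization of `cls`, given a mapping from class name to its direct bases."""

    def merge(sequences: list[list[str]]) -> list[str]:
        result = []
        sequences = [list(seq) for seq in sequences if seq]  # copy; never mutate inputs
        while sequences:
            for seq in sequences:
                head = seq[0]
                # A good head does not appear in the tail of any other sequence.
                if not any(head in other[1:] for other in sequences):
                    break
            else:
                raise TypeError("inconsistent hierarchy: no C3 linearization exists")
            result.append(head)
            for seq in sequences:
                if seq[0] == head:
                    del seq[0]
            sequences = [seq for seq in sequences if seq]
        return result

    parents = bases.get(cls, [])
    return [cls] + merge([c3_mro(p, bases) for p in parents] + [list(parents)])


# The classic diamond: C(A, B), where A and B both derive from O.
diamond = {"O": [], "A": ["O"], "B": ["O"], "C": ["A", "B"]}
print(c3_mro("C", diamond))  # ['C', 'A', 'B', 'O']
```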

This part is factored into https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/runtime.py?at=master. It's written in Python compatible with Python Veloce.
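For a feel of what a runtime `getattribute(spam, "egg")` has to do on every attribute access, here is a deliberately simplified model in Python: look in the instance's own dictionary first, then walk the class MRO, and bind plain functions found on a class into methods. It ignores data descriptors, `__getattr__` and metaclasses, which are the parts described above as tricky. The object model (`RuntimeClass`, `RuntimeObject`) is hypothetical and is not Pythonium's.

```python
from types import FunctionType, MethodType


class RuntimeClass:
    """Hypothetical runtime class: a name, a namespace and a precomputed MRO."""

    def __init__(self, name, namespace, bases_mro=()):
        self.name = name
        self.namespace = namespace
        self.mro = [self, *bases_mro]   # e.g. the output of a C3 linearization


class RuntimeObject:
    def __init__(self, cls):
        self.cls = cls
        self.attrs = {}                 # the instance's own __dict__


def getattribute(obj, name):
    """Simplified lookup: no data descriptors, __getattr__ or metaclasses."""
    if name in obj.attrs:
        return obj.attrs[name]
    for klass in obj.cls.mro:
        if name in klass.namespace:
            value = klass.namespace[name]
            if isinstance(value, FunctionType):
                return MethodType(value, obj)   # bind `self`
            return value
    raise AttributeError(f"{obj.cls.name!r} object has no attribute {name!r}")


base = RuntimeClass("Base", {"greet": lambda self: "hi from " + self.attrs["name"]})
derived = RuntimeClass("Derived", {"kind": "derived"}, bases_mro=[base])
spam = RuntimeObject(derived)
spam.attrs["name"] = "spam"
print(getattribute(spam, "kind"))        # derived
print(getattribute(spam, "greet")())     # hi from spam
```

Every `spam.egg` in the translated program pays for this loop, which is consistent with the suspicion above that this is where much of the time goes.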

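The actual compliant translator, https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/compliant.py?at=master, doesn't generate JavaScript code directly and, most importantly, doesn't do ast->ast transformation. I tried the ast->ast approach, and the AST, even if nicer than a CST, is not nice to work with, even with ast.NodeTransformer; more importantly, I don't need to do ast->ast.

Doing Python AST to Python AST would, in my case at least, maybe be a performance improvement, since I sometimes inspect the content of a block before generating the code associated with it, for instance:

  • var/global: to be able to var something, I must know what I need to var and what not. Instead of generating a block that tracks which variables are created in a given block and inserting it at the top of the generated function block, I just look for the relevant variable assignments when I enter the block, before actually visiting the child nodes to generate the associated code.
  • yield: generators have, as of yet, a special syntax in JS, so I need to know which Python function is a generator when I want to write "var my_generator = function"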

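So I don't strictly visit each node only once per translation phase.

The overall process can be described as:

Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code

Python builtins are written in Python code (!). IIRC there are a few restrictions related to bootstrapping types, but you have access to everything that Pythonium can translate in compliant mode. Have a look at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/builtins/?at=master

JS code generated by Pythonium's compliant mode can be read and understood, but source maps would greatly help.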

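The valuable advice I can give you in light of this experience is the usual old-timer advice:

  • extensively review the subject, both in the literature and in existing projects, closed source or free. When I reviewed the different existing projects, I should have given it way more time and motivation.
  • ask questions! If I had known beforehand that the PyPy backend was useless because of the overhead due to the C/JavaScript semantic mismatch, I might have had the Pythonium idea way before six months ago, maybe three years ago.
  • know what you want to do; have a target. For this project I had different objectives: practice a bit of JavaScript, learn more about Python, and be able to write Python code that would run in the browser (more on that below).
  • failure is experience
  • a small step is a step
  • start small
  • dream big
  • do demos
  • iterate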

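With Python Veloce mode alone, I'm very happy! But along the way I discovered that what I was really looking for was liberating myself and others from JavaScript, and more importantly being able to create in a comfortable way. This led me to Scheme, DSLs, models and eventually domain-specific models (cf. http://dsmforum.org/).

About Ira Baxter's response:

The estimates are not helpful at all. It took me more or less 6 months of free time for both PythonJS and Pythonium, so I can expect more from 6 months of full-time work. I think we all know what 100 man-years in an enterprise context can mean, and not mean at all...

When someone says something is hard, or more often impossible, I answer that "it only takes time to find a solution for a problem that is impossible". In other words, nothing is impossible unless it's proven impossible, in which case there is a math proof...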

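If it's not proven impossible, then it leaves room for imagination:

  • finding a proof that it's impossible

and

  • if it is impossible, there may be an "inferior" problem that can have a solution.

or

  • if it's not impossible, finding a solution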

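It's not just optimistic thinking. When I started Python->JavaScript, everybody was saying it was impossible. PyPy: impossible. Metaclasses: too hard. Etc. I think the only revolution PyPy brings over the Scheme->C paper (which is 25 years old) is some automatic JIT generation (based on hints written in the RPython interpreter, I think).

Most people who say that a thing is "hard" or "impossible" don't provide the reasons. C++ is hard to parse? I know that; still, there are (free) C++ parsers. The devil is in the details? I know that. Saying it's impossible, on its own, is not helpful. It's even worse than "not helpful": it's discouraging, and some people mean to discourage others. I heard about this question via https://stackoverflow.com/questions/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus.

What would perfection be for you? That's how you define the next goal and maybe reach the overall goal.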

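I am more interested in knowing what kinds of patterns I could enforce
on the code to make it easier to translate (ie: IoC, SOA ?) the code
than how to do the translation.

I see no patterns that cannot be translated from one language to another, at least in a less-than-perfect way. Since language-to-language translation is possible, you'd better aim for that first. I think, according to http://en.wikipedia.org/wiki/Graph_isomorphism_problem, translation between two computer languages is a tree or DAG isomorphism. Even if we already know that they are both Turing-complete, so...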

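Framework->framework translation, which I picture better as API->API translation, might still be something to keep in mind as a way to improve the generated code. For example, Prolog has a very specific syntax, but you can still do Prolog-like computation by describing the same graph in Python... If I were to implement a Prolog-to-Python translator, I wouldn't implement unification in Python but in a C library, and I would come up with a "Python syntax" that is very readable for a Pythonista. In the end, syntax is only "paint" to which we give a meaning (that's why I started Scheme). The devil is in the details of the language, and I'm not talking about the syntax. Concepts used in the language, such as the getattribute hook, you can live without; but required VM features such as tail-recursion optimization can be difficult to deal with. You don't care if the initial program doesn't use tail recursion, and even if there is no tail recursion in the target language, you can emulate it using greenlets or an event loop.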
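The answer suggests greenlets or an event loop to emulate tail calls. A simpler, well-known alternative, swapped in here and not the author's method, is a trampoline: a tail call returns a zero-argument thunk instead of calling directly, and a driver loop keeps invoking thunks, so the call stack never grows. A minimal Python sketch with hypothetical names; it assumes the final result is never itself callable.

```python
def trampoline(result):
    """Keep invoking returned thunks until a non-callable value comes back."""
    while callable(result):
        result = result()
    return result


def sum_to(n, acc=0):
    """Tail-recursive sum of 1..n, rewritten to return a thunk instead of recursing."""
    if n == 0:
        return acc
    return lambda: sum_to(n - 1, acc + n)


print(trampoline(sum_to(100_000)))  # 5000050000, far beyond the default recursion limit
```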

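For the target and source languages, look for:

  • big and specific ideas
  • tiny and common shared ideas

From this will emerge:

  • things that are easy to translate
  • things that are difficult to translate

You may also be able to tell which parts will be translated into fast code and which into slow code.

There is also the question of the stdlib, or of any library, but there is no clear answer; it depends on your goals.

Idiomatic code and readable generated code also have solutions...

Targeting a platform like PHP is easier than targeting browsers, because you can provide C implementations of slow and/or critical paths.

Given that your first project is translating Python to PHP, at least for the subset of PHP3 that I know, customizing veloce.py is your best bet. If you can implement veloce.py for PHP, you will probably be able to run the compliant mode... Moreover, if you can translate PHP into the subset of PHP that a php_veloce.py could generate, that means you can translate PHP into the subset of Python that veloce.py consumes, which means you can translate PHP to JavaScript. Just saying...

You can also have a look at these libraries:

You might also be interested in this blog post (and its comments): https://www.rfk.id.au/blog/entry/pypy-js-poc-jit/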

I will second @EliBendersky point of view regarding using ast.parse instead of parser (which I did not know about before). I also warmly recommend you to review his blog. I used ast.parse to do Python->JavaScript translator (@https://bitbucket.org/amirouche/pythonium). I've come up with Pythonium design by somewhat reviewing other implementations and trying them on my own. I forked Pythonium from https://github.com/PythonJS/PythonJS which I also started, It's actually a complete rewrite . The overall design is inspired from PyPy and http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf paper.

Everything I tried, from beginning to the best solution, even if it looks like Pythonium marketing it really isn't (don't hesitate to tell me if something doesn't seem correct to the netiquette):

  • Implement Python semantic in Plain Old JavaScript using prototype inheritance: AFAIK it's impossible to implement Python multiple inheritance using JS prototype object system. I did try to do it using other tricks later (cf. getattribute). As far as I know there is no implementation of Python multiple inheritance in JavaScript, the best that exists is Single inhertance + mixins and I'm not sure they handle diamond inheritance. Kind of similar to Skulpt but without google clojure.

  • I tried with Google clojure, just like Skulpt (compiler) instead of actually reading Skulpt code #fail. Anyway because of JS prototype based object system still impossible. Creating binding was very very difficult, you need to write JavaScript and a lot of boilerplate code (cf. https://github.com/skulpt/skulpt/issues/50 where I am the ghost). At that time there was no clear way to integrate the binding in the build system. I think that Skulpt is a library and you just have to include your .py files in the html to be executed, no compilation phase required to be done by the developer.

  • Tried pyjaco (compiler) but creating bindings (calling Javascript code from Python code) was very difficult, there was too much boilerplate code to create every time. Now I think pyjaco is the one that more near Pythonium. pyjaco is written in Python (ast.parse too) but a lot is written in JavaScript and it use prototype inheritance.

I never actually succeed at running Pyjamas #fail and never tried to read the code #fail again. But in my mind PyJamas was doing API->API tranlation (or framework to framework) and not Python to JavaScript translation. The JavaScript framework consume data that is already in the page or data from the server. Python code is only "plumbing". After that I discovered that pyjamas was actually a real python->js translator.

Still I think it's possible to do API->API (or framework->framework) translation and that's basicly what I do in Pythonium but at lower level. Probably Pyjamas use the same algorithm as Pythonium...

Then I discovered brython fully written in Javascript like Skulpt, no need for compilation and lot of fluff... but written in JavaScript.

Since the initial line written in the course of this project, I knew about PyPy, even the JavaScript backend for PyPy. Yep, you can, if you find it, directly generate a Python interpreter in JavaScript from PyPy. People say, it was a disaster. I read no where why. But I think the reason is that the intermediate language they use to implement the interpreter, RPython, is a subset of Python tailored to be translated to C (and maybe asm). Ira Baxter says you always make assumptions when you build something and probably you fine tune it to be the best at what it's meant to do in the case of PyPy: Python->C translation. Those assumptions might not be relevant in another context worse they can infere overhead otherwise said direct translation will most likely always be better.

Having the interpreter written in Python sounded like a (very) good idea. But I was more interested in a compiler for performance reasons also it's actually more easy to compile Python to JavaScript than interpret it.

I started PythonJS with the idea of putting together a subset of Python that I could easily translate to JavaScript. At first I didn't even bother to implement OO system because of past experience. The subset of Python that I achieved to translate to JavaScript are:

  • function with full parameters semantic both in definition and calling. This is the part I am most proud of.
  • while/if/elif/else
  • Python types were converted to JavaScript types (there is no python types of any kind)
  • for could iterate over Javascript arrays only (for a in array)
  • Transparent access to JavaScript: if you write Array in the Python code it will be translated to Array in javascript. This is the biggest achievement in terms of usability over its competitors.
  • You can pass function defined in Python source to javascript functions. Default arguments will be taken into account.
  • It add has special function called new which is translated to JavaScript new e.g: new(Python)(1, 2, spam, "egg") is translated to "new Python(1, 2, spam, "egg").
  • "var" are automatically handled by the translator. (very nice finding from Brett (PythonJS contributor).
  • global keyword
  • closures
  • lambdas
  • list comprehensions
  • imports are supported via requirejs
  • single class inheritance + mixin via classyjs

This seems like a lot but actually very narrow compared to full blown semantic of Python. It's really JavaScript with a Python syntax.

The generated JS is perfect ie. there is no overhead, it can not be improved in terms of performance by further editing it. If you can improve the generated code, you can do it from the Python source file too. Also, the compiler did not rely on any JS tricks that you can find in .js written by http://superherojs.com/, so it's very readable.

The direct descendant of this part of PythonJS is the Pythonium Veloce mode. The full implementation can be found @ https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master 793 SLOC + around 100 SLOC of shared code with the other translator.

An adapted version of pystones.py can be translated in Veloce mode cf. https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pystone/?at=master

After having setup basic Python->JavaScript translation I choosed another path to translate full Python to JavaScript. The way of glib doing object oriented class based code except the target language is JS so you have access to arrays, map-like objects and many other tricks and all that part was written in Python. IIRC there is no javascript code written by in Pythonium translator. Getting single inheritance is not difficult here are the difficult parts making Pythonium fully compliant with Python:

  • spam.egg in Python is always translated to getattribute(spam, "egg") I did not profile this in particular but I think that where it loose a lot of time and I'm not sure I can improve upon it with asm.js or anything else.
  • method resolution order: even with the algorithm written in Python, translating it to Python Veloce compatible code was a big endeavour.
  • getattributre: the actual getattribute resolution algorithm is kind of tricky and it still doesn't support data descriptors
  • metaclass class based: I know where to plug the code, but still...
  • last bu not least: some_callable(...) is always transalted to "call(some_callable)". AFAIK the translator doesn't use inference at all, so every time you do a call you need to check which kind of object it is to call it they way it's meant to be called.

This part is factored in https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/runtime.py?at=master It's written in Python compatible with Python Veloce.

The actual compliant translator https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/compliant.py?at=master doesn't generate JavaScript code directly and most importantly doesn't do ast->ast transformation. I tried the ast->ast thing and ast even if nicer than cst is not nice to work with even with ast.NodeTransformer and more importantly I don't need to do ast->ast.

Doing python ast to python ast in my case at least would maybe be a performance improvement since I sometime inspect the content of a block before generating the code associated with it, for instance:

  • var/global: to be able to var something I must know what I need to and not to var. Instead of generating a block tracking which variable are created in a given block and inserting it on top of the generated function block I just look for revelant variable assignation when I enter the block before actually visiting the child node to generate the associated code.
  • yield, generators have, as of yet, a special syntax in JS, so I need to know which Python function is a generator when I want to write the "var my_generator = function"

So I don't really visit each node once for each phase of the translation.

The overall process can be described as:

Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code

Python builtins are written in Python code (!), IIRC there is a few restrictions related to bootstraping types, but you have access to everything that can translate Pythonium in compliant mode. Have a look at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/builtins/?at=master

Reading JS code generated from pythonium compliant can be understood but source maps will greatly help.

The valuable advice I can give you in the light of this experience are kind old farts:

  • extensively review the subject both in literature and existing projects closed source or free. When I reviewed the different existing projects I should have given it way more time and motivation.
  • ask questions! If I knew beforehand that PyPy backend was useless because of the overhead due to C/Javascript semantic mismatch. I would maybe had Pythonium idea way before 6 month ago maybe 3 years ago.
  • know what you want to do, have a target. For this project I had different objectives: pratice a bit a javascript, learn more of Python and be able to write Python code that would run in the browser (more and that below).
  • failure is experience
  • a small step is a step
  • start small
  • dream big
  • do demos
  • iterate

With Python Veloce mode only, I'm very happy! But along the way I discovered that what I was really looking for was liberating me and others from Javascript but more importantly being able to create in a comfortable way. This lead me to Scheme, DSL, Models and eventually domain specific models (cf. http://dsmforum.org/).

About what Ira Baxter response:

The estimations are not helpful at all. I took me more or less 6 month of free time for both PythonJS and Pythonium. So I can expect more from full time 6 month. I think we all know what 100 man-year in an enterprise context can mean and not mean at all...

When someone says something is hard, or more often impossible, I answer that "it only takes time to find a solution for a problem that is impossible". In other words, nothing is impossible unless it's proven impossible, in this case with a mathematical proof...

If it's not proven impossible, then there is room for imagination:

  • finding a proof that it's impossible

and

  • if it is impossible, there may be an "inferior" problem that has a solution.

or

  • if it's not impossible, finding a solution

It's not just optimistic thinking. When I started Python->JavaScript, everybody was saying it was impossible. PyPy: impossible. Metaclasses: too hard. Etc. I think the only revolution PyPy brings over the Scheme->C paper (which is 25 years old) is automatic JIT generation (based on hints written in the RPython interpreter, I think).

Most people who say that a thing is "hard" or "impossible" don't give their reasons. C++ is hard to parse? I know that, yet there are (free) C++ parsers. The devil is in the details? I know that. Saying it's impossible, and nothing more, is not helpful. It's even worse than unhelpful: it's discouraging, and some people mean to discourage others. I heard about this question via https://stackoverflow.com/questions/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus.

What would be perfection for you? That's how you define your next goal and maybe reach the overall goal.

"I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation."

I see no pattern that cannot be translated from one language to another, at least in a less-than-perfect way. Since language-to-language translation is possible, you'd better aim for that first. I think that, according to http://en.wikipedia.org/wiki/Graph_isomorphism_problem, translation between two computer languages is a tree or DAG isomorphism. And we already know that both languages are Turing complete, so...

Framework->Framework translation, which I find easier to visualize as API->API translation, might still be something to keep in mind as a way to improve the generated code. E.g. Prolog has a very specific syntax, but you can still do Prolog-like computation by describing the same graph in Python... If I were to implement a Prolog-to-Python translator, I wouldn't implement unification in Python but in a C library, and I would come up with a "Python syntax" that is very readable for a Pythonista. In the end, syntax is only "painting" to which we give a meaning (that's why I started Scheme).

The devil is in the details of the language, and I'm not talking about the syntax. Some concepts used in the language, like the getattribute hook, you can live without, but required VM features like tail-recursion optimisation can be difficult to deal with. You don't have to care if the initial program doesn't use tail recursion, and even if the target language has no tail-call optimisation, you can emulate it using greenlets/an event loop.
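A closely related and simpler way to emulate tail calls in a target without tail-call optimisation is a trampoline. Note that this is a swapped-in technique, not the greenlet/event-loop approach mentioned above, and not how any particular translator does it. Each tail call is returned as a value and a driver loop executes it, so the stack stays flat. Python itself has no tail-call optimisation, so it serves as the stand-in target here; the names `TailCall`, `trampoline`, `sum_to`, `is_even` and `is_odd` are illustrative.

```python
class TailCall:
    """A pending call that the trampoline runs instead of nesting a stack frame."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(fn, *args):
    """Call fn, then keep executing returned TailCall objects in a loop.

    Each tail call returns to this loop before the next one starts, so the
    Python call stack stays at constant depth however deep the "recursion" goes.
    Limitation: a function cannot return a TailCall object as an ordinary value.
    """
    result = fn(*args)
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result


def sum_to(n, acc=0):
    # The tail call is returned as data instead of being made directly.
    if n == 0:
        return acc
    return TailCall(sum_to, n - 1, acc + n)


def is_even(n):
    # Mutual recursion works too: each function hands off to the other.
    return True if n == 0 else TailCall(is_odd, n - 1)


def is_odd(n):
    return False if n == 0 else TailCall(is_even, n - 1)


print(trampoline(sum_to, 100_000))   # 5000050000
print(trampoline(is_even, 100_001))  # False
```

A directly recursive `sum_to(100_000)` would exceed CPython's default recursion limit of 1000; the trampolined version does not, because no call ever nests more than two frames deep. A translator could emit this shape mechanically for every call in tail position.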

For target and source languages, look for:

  • Big and specific ideas
  • Tiny and common shared ideas

From this will emerge:

  • Things that are easy to translate
  • Things that are difficult to translate
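As a rough first pass, the comparison above is a set operation: ideas shared by both languages tend to translate one-to-one, while ideas that exist only in the source must be emulated or rewritten. The feature inventories below are hypothetical, hand-picked and far from complete, and they assume PHP 5.5+ (for generators).

```python
# Hypothetical, hand-picked feature inventories; a real survey would be much
# longer and version-specific (PHP 5.5+ is assumed for generators).
PYTHON_FEATURES = {
    "classes", "exceptions", "generators",
    "list comprehensions", "multiple inheritance", "metaclasses",
}
PHP_FEATURES = {"classes", "exceptions", "generators", "traits", "goto"}


def classify(source, target):
    """Split the source language's features into likely-easy and likely-hard."""
    easy = source & target   # shared ideas: roughly one-to-one translation
    hard = source - target   # source-specific ideas: must be emulated or rewritten
    return easy, hard


easy, hard = classify(PYTHON_FEATURES, PHP_FEATURES)
print("easy:", sorted(easy))  # ['classes', 'exceptions', 'generators']
print("hard:", sorted(hard))  # ['list comprehensions', 'metaclasses', 'multiple inheritance']
```

A shared name does not guarantee shared semantics, so this is only a first pass. For example, PHP closures capture the variables listed in `use` by value unless they are taken by reference, whereas Python closures see later rebindings of the enclosing variables.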

You will probably also be able to tell which parts will translate to fast code and which to slow code.

There is also the question of the stdlib, or of any library, but there is no clear answer: it depends on your goals.

Idiomatic or readable generated code also has solutions...

Targeting a platform like PHP is much easier than targeting browsers, since you can provide C implementations of slow and/or critical paths.

Given that your first project is translating Python to PHP, at least for the PHP3 subset I know of, customising veloce.py is your best bet. If you can implement veloce.py for PHP, then you will probably be able to run the compliant mode... Also, if you can translate PHP to the subset of PHP that you can generate with php_veloce.py, it means that you can translate PHP to the subset of Python that veloce.py can consume, which would mean that you can translate PHP to JavaScript. Just saying...

You can also have a look at those libraries:

You might also be interested in this blog post (and its comments): https://www.rfk.id.au/blog/entry/pypy-js-poc-jit/

撩起发的微风 2024-09-20 19:11:49

You could take a look at the Vala compiler, which translates Vala (a C#-like language) into C.
