Programming language properties that facilitate refactoring?

Posted on 2024-07-23 10:42:38

What are common traits/properties of programming languages that facilitate (simplify) the development of widely automated source code analysis and re-engineering (transformation) tools?

I am mostly thinking in terms of programming language features that make it easier to develop static analysis and refactoring tools (e.g., compare Java with C++: the former has better support for refactoring).

In other words, if a programming language were explicitly designed from the beginning to support automated static analysis and refactoring, what characteristics should it have?

For example, for Ada, there's ASIS:

The Ada Semantic Interface Specification (ASIS) is a layered, open architecture providing vendor-independent access to the Ada Library Environment. It allows for the static analysis of Ada programs and libraries.
ASIS, the Ada Semantic Interface Specification, is a library that gives applications access to the complete syntactic and semantic structure of an Ada compilation unit. This library is typically used by tools that need to perform some sort of static analysis on an Ada program.

ASIS information: ASIS provides a standard way for tools to extract data that are best collected by an Ada compiler or other source code analyzer. Tools which use ASIS are themselves written in Ada, and can be very easily ported between Ada compilers which support ASIS. Using ASIS, developers can produce powerful code analysis tools with a high degree of portability. They can also save the considerable expense of implementing the algorithms that extract semantic information from the source program. For example, ASIS tools already exist that generate source-code metrics, check a program's conformance to coding styles or restrictions, make cross-references, and globally analyze programs for validation and verification.

Also see the ASIS FAQ.

Can you think of other programming languages that provide a similarly comprehensive and complete interface to working with source code specifically for analysis/transformation purposes?

I am thinking about specific implementation techniques that provide the low-level hooks, for example core library functions that let a program inspect an AST or ASG at runtime.
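As an illustration of the kind of hook meant here, Python ships an `ast` module in its standard library that parses source text into a tree a program can walk at runtime. The sketch below is only an example of the idea, not a claim that it matches ASIS in scope; the sample source and the helper names `collect_calls` and `defined_functions` are made up for illustration.

```python
import ast

# Hypothetical sample program to analyze (not from the question).
SOURCE = """
def area(width, height):
    return width * height

total = area(2, 3) + area(4, 5)
"""


def collect_calls(source):
    """Return (function name, line number) for every call made by plain name."""
    tree = ast.parse(source)
    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.append((node.func.id, node.lineno))
    return calls


def defined_functions(source):
    """Return the names of all functions defined in the source."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]


calls = collect_calls(SOURCE)
functions = defined_functions(SOURCE)
print(functions, calls)
```

This gives syntactic structure only. Resolving which definition a name refers to (the semantic half of what ASIS offers) would need extra work on top of the tree.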

Comments (7)

春夜浅 2024-07-30 10:42:39

Reflection built into the language/type system. This makes static analysis and refactoring much less painful.

This is part of why Java and .NET tools are so commonplace and nice. Reflection lets tools understand the dependencies in source code quickly and reliably, which helps with static analysis of that code.

In addition, you get the ability to do analysis of your compiled code, as well.
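A small sketch of what reflection gives a tool, written in Python as a stand-in for the Java and .NET reflection APIs mentioned above (those APIs differ in detail). The classes `Shape` and `Rectangle` and the helper `describe` are hypothetical; only the `inspect` calls are real library functions.

```python
import inspect


class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def describe(cls):
    """Collect base classes and method signatures of a class via reflection."""
    return {
        # Skip the class itself (first entry) and `object` (last entry).
        "bases": [base.__name__ for base in cls.__mro__[1:-1]],
        "methods": {
            name: str(inspect.signature(fn))
            for name, fn in inspect.getmembers(cls, inspect.isfunction)
        },
    }


info = describe(Rectangle)
print(info)
```

The tool never parses source text here: it asks the loaded program about itself, which is also why the same approach works on compiled code.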

故事还在继续 2024-07-30 10:42:39

There is a language built on the "code is data" paradigm: as far as the language is concerned, every line of code is just data. This makes refactoring as basic an action as a primitive data operation. The name of this language is Lisp. ;)
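Python is not homoiconic the way Lisp is, but its standard `ast` module can approximate the idea: once the code is a data structure, a rename is just a tree edit. A minimal sketch, assuming Python 3.9 or later for `ast.unparse`; the `RenameVariable` class and `rename` helper are illustrative names, and the rewrite deliberately ignores scoping and function parameters.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every use of a plain variable name (no scope analysis)."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        if node.id == self.old:
            renamed = ast.Name(id=self.new, ctx=node.ctx)
            return ast.copy_location(renamed, node)
        return node


def rename(source, old, new):
    """Parse source, rename a variable in the tree, and print it back."""
    tree = ast.parse(source)
    tree = RenameVariable(old, new).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


result = rename("total = price * qty\nprint(total)", "qty", "quantity")
print(result)
```

A production refactoring would also have to decide which occurrences of a name refer to the same binding, which is where the scoping and typing properties discussed in the other answers come in.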

Seriously speaking, "a language for programming" and "a language for machines" are two different requirements. A perfect language for analysis could be a nightmare for the programmer. Moreover, a language designed for some particular analysis might not be a programming language at all. (Last week I came across a language for pointer analysis; it has no textual representation and only two executable statements.)

And again: first you have to define the task, and then solve it. For example, if the task is "I want to write safe programs, e.g. I want to be sure that I will never try to mix integral and character operands", then you need a language with static types. If it is "I need to know at runtime what I can do with external libraries", reflection is your choice. If it is "I need a universal programming language for interchange, transformation and analysis", then most likely this is not what you really want.

青衫儰鉨ミ守葔 2024-07-30 10:42:39

For refactoring: self-similarity

The ability to accept code transplants without intrusive alteration or bizarre reinterpretation. Examples:

  • Extracting a snippet of C++ into a new procedure, using reference parameters to give it modifying access to variables.
  • Python, Javascript and Lua methods really are just functions that have a 'self' parameter. *
  • In any number of languages, a function that creates/populates a struct can be (more or less trivially) converted to a constructor.

Counterexamples...

  • Ruby (modules, classes): methods, lambdas, blocks and raw blocks. The differences in semantics are bewildering, to say the least (which is all I feel qualified to say for sure).

For the (to my mind) wildly different case of automatic mangling I'm a lot less sure, but the freedom from side effects offered by functional programming languages is really the key. (OK, so how could we offer the same thing in a language for the rest of us?)

* Python is almost like that. (I forget what the gotcha is; probably something to do with whether the method was defined in the class or grafted on at runtime.)
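The Python footnote can be made concrete. The sketch below shows that a method call is equivalent to calling the function with an explicit `self`, and then shows one candidate for the half-remembered gotcha: a function grafted onto an instance (rather than onto the class) is not bound to it. Whether this is the gotcha the answer had in mind is an assumption; `Counter` and `reset` are made-up names.

```python
import types


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


def reset(self):
    """A plain function written in method style."""
    self.value = 0
    return self.value


c = Counter()
via_method = c.increment()           # ordinary method call
via_function = Counter.increment(c)  # same function, explicit self

# Grafted onto the class: looked up through the class, so self is bound.
Counter.reset = reset
after_class_graft = c.reset()

# Grafted onto the instance: stored as a plain function, self is NOT bound.
c.reset_instance = reset
try:
    c.reset_instance()
    instance_graft_bound = True
except TypeError:
    instance_graft_bound = False

# Binding it explicitly restores method behaviour.
c.reset_instance = types.MethodType(reset, c)
c.increment()
after_bound = c.reset_instance()
```

For a refactoring tool, this means "move function into class" is nearly mechanical in Python, while "attach function to object" changes call semantics.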

青巷忧颜 2024-07-30 10:42:39

IMO the most important property is that the language is completely specified and deterministic. For example, in C the behaviour of the following code is not defined by the language specification:

x = x++ + ++x;

If the code's behaviour is undefined, yet it compiles and does something, there is no safe way to change it automatically (i.e. refactor it) in a way that preserves that something.

The next important property is that the language doesn't allow access to variables (fields) from outside their scope. In C, for example, pointers make it possible to access any variable's value simply by "guessing" its address. In a language like that, there are cases where it is not possible to tell where in the code a certain variable's value is read or changed. Again, there is no safe way to automatically refactor a program that might do something like that.

眼眸 2024-07-30 10:42:38

The biggest has to be static typing. This gives tools much more insight into what the code is doing. Without it, refactoring becomes many times more difficult.

未央 2024-07-30 10:42:38

I think this is still a largely unexplored problem. The notion of "language design for tooling" seems to only have entered the fringes of the mainstream recently, though I think research in this area is more than two decades old. I agree with two of the other answers, namely that "static typing" and "self-similarity" are useful properties of a language to make refactoring support easier.

梦幻的味道 2024-07-30 10:42:38

It is true that the particular programming language can make analysis easier.
If you want the easiest-to-analyze languages, pick a purely functional
one.

But in practice nobody programs in purely functional languages.
(The Haskell guys are going to jump up and down when they see
this, but seriously, Haskell is used only extremely rarely).

What makes a programming language analyzable is infrastructure
designed to support analysis. Ada's ASIS, above, is a great example.
Don't be distracted by the fact that ASIS was written for Ada, or is
written in Ada; what counts is that somebody serious wanted
to analyze Ada and invested the effort to build Ada analysis
machinery.

I believe that the right cure is to build general analysis infrastructure
and amortize it across lots of languages. While
we're at it, we should build general transformation infrastructure,
too, because once you have an analysis, you'll want to use
it to effect change. (Doctor visits don't end with diagnosis;
they end with cures). And I've bet my career on it.

The result is an engine I think ideal for analysis,
refactoring, reengineering, etc:
our DMS Software Engineering Toolkit.

It has generic parsing, tree building, prettyprinting,
tree manipulation, source-to-source rewriting, attribute
grammar evaluation, and control and data flow analysis.
It has production-quality front ends for a number of widely used dialects
of C and C++, for Java, C#, COBOL, and PHP, and even
for Verilog and VHDL (many other languages too,
but not quite at that level).

To give you some sense of its utility, it was used
to convert JOVIAL code for the B-2 bomber into C...
without us ever having seen the source code.
See http://www.semdesigns.com/Products/Services/NorthropGrummanB2.html

Now, assuming one has analysis infrastructure, what language
features help?

Static types help by limiting the set of possible values a variable can take,
but only by adding a limited single-argument predicate, e.g., "X is an integer".
I think what helps more are assertions in the code, because they capture
predicates with more than one argument. These establish relationships between state variables that often cannot be found by inspecting
the code (problem- or domain-specific information such as "X > Y+3").
The analysis infrastructure (and frankly, the programmers who read the code)
can ideally take advantage of such additional facts to provide a more
effective analysis.

Such assertions are commonly coded with special keywords such as "assert",
"pre(condition)" and "post(condition)", which are inspired, with good reason,
by the theorem-proving literature.

But even if you don't have assertions in your language, they are
easy to encode anyway: just write an if statement whose condition is the negation of the assertion and whose body either calls an idiom indicating
impossibility or violates the language semantics (e.g., dereferences an obviously null pointer),
such as "if (x>0) fail();"

So what's really needed isn't assertions in the language, but programmers
who are willing to write them. Alas, that seems to be sadly lacking.
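The two styles described above, keyword assertions and the hand-encoded "if negation then fail" idiom, can be sketched side by side. This is an illustrative Python example, not code from DMS or any tool named in this answer; `reserve` and `fail` are hypothetical names.

```python
def fail(message):
    """Hypothetical 'this cannot happen' idiom for code without assert."""
    raise AssertionError(message)


def reserve(capacity, booked, requested):
    """Book `requested` seats and return the new booked count."""
    # Keyword assertion as a precondition: a single-variable fact.
    assert requested > 0, "requested must be positive"

    # Encoded assertion: the if-condition is the NEGATION of what must hold.
    # The fact relates several state variables (booked + requested <= capacity),
    # which a static type such as "int" cannot express.
    if booked + requested > capacity:
        fail("booking would exceed capacity")

    booked += requested

    # Keyword assertion as a postcondition.
    assert 0 < booked <= capacity
    return booked


print(reserve(10, 4, 3))
```

One Python-specific caveat: `assert` statements are skipped when the interpreter runs with `-O`, whereas the encoded `if ... fail()` form always executes.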
