Good algorithm for generating call graphs?

Posted 2024-10-20 12:43:43

I am writing some code to generate call graphs for a particular intermediate representation (IR) by statically scanning the IR code, without executing it. The IR itself is not too complex, and I have a good understanding of what function call sequences look like, so all I need to do is trace the calls. I am currently doing it the obvious way:

  • Keep track of where we are
  • If we encounter a function call, branch to that location, execute and come back
  • While branching put an edge between the caller and the callee
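
The three steps above can be sketched as a worklist scan. This is a minimal sketch, not the asker's actual code: the IR layout (a dict mapping each function name to a list of `(opcode, operand)` tuples), the `"call"` opcode, and the name `build_call_graph` are all assumptions made for illustration. Each function body is scanned once, which also keeps recursive calls from looping forever.

```python
from collections import defaultdict, deque


def build_call_graph(functions, entry):
    """Scan IR bodies reachable from `entry` and record caller -> callee edges.

    `functions` maps a function name to a list of (opcode, operand) tuples.
    This toy IR layout and the "call" opcode are hypothetical.
    """
    graph = defaultdict(set)
    seen = {entry}              # functions already queued for scanning
    worklist = deque([entry])
    while worklist:
        caller = worklist.popleft()
        # Functions without a known body (e.g. externals) contribute no edges.
        for opcode, operand in functions.get(caller, []):
            if opcode != "call":
                continue
            graph[caller].add(operand)      # edge: caller -> callee
            if operand not in seen:         # scan each callee body only once
                seen.add(operand)
                worklist.append(operand)
    return {name: sorted(graph[name]) for name in seen}


# Hypothetical IR: "parse" is recursive, "unused" is never reached from "main".
ir = {
    "main": [("push", 1), ("call", "parse"), ("call", "run")],
    "parse": [("call", "parse"), ("ret", None)],
    "run": [("call", "log"), ("ret", None)],
    "unused": [("call", "log")],
}
call_graph = build_call_graph(ir, "main")
```

Starting from a single entry point only finds reachable functions. If every function should appear in the graph, the same loop can be seeded with all keys of `functions` instead.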

I am satisfied with the results so far, but I want to make sure that I am not reinventing the wheel and running into corner cases I have not thought of. Are there any accepted good algorithms (and/or design patterns) that do this efficiently?

UPDATE:
The IR code is a bytecode disassembly from a homebrewed Java-like language and looks like the Jasmin specification.

Comments (2)

一片旧的回忆 2024-10-27 12:43:43

From an academic perspective, here are some considerations:

  • Do you care about being conservative / correct? For example, suppose the code you're analyzing contains a call through a function pointer. If you're just generating documentation, then it's not necessary to deal with this. If you're doing a code optimization that might go wrong, you will need to assume that 'call through pointer' means 'could be anything.'

  • Beware of exceptional execution paths. Your IR may or may not abstract these away from you, but keep in mind that many operations can raise language-level exceptions or trigger hardware interrupts. Again, how much this matters depends on what you want to do with the call graph later.

  • Consider how you'll deal with cycles (e.g. recursion, mutual recursion). This may affect how you write code for traversing the graphs later on (i.e., they will need some sort of 'visited' set to avoid traversing cycles forever).
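
To illustrate the last point, here is a minimal sketch of a traversal that uses a 'visited' set so that recursion and mutual recursion do not cause an endless loop. The graph shape (a dict mapping each function name to a list of callees) and the name `reachable` are assumptions for illustration.

```python
def reachable(call_graph, start):
    """Return every function reachable from `start`, including `start` itself.

    `call_graph` maps a function name to an iterable of callee names
    (a hypothetical representation chosen for this sketch).
    """
    visited = set()
    stack = [start]
    while stack:
        fn = stack.pop()
        if fn in visited:       # already handled: this is what breaks cycles
            continue
        visited.add(fn)
        stack.extend(call_graph.get(fn, ()))
    return visited


# "a" and "b" are mutually recursive, "c" calls itself, "d" is never called.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["c"], "d": ["a"]}
result = reachable(graph, "a")
```

An explicit stack is used instead of recursion so that deep call chains in the analyzed program cannot exhaust the analyzer's own call stack.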

Cheers.

Update March 6:

Based on extra information added to the original post:

  • Be careful about virtual method invocations. Keep in mind that, in general, it is unknowable which method will execute. You may have to assume that the call will go to any of the subclasses of a particular class. The standard example goes a bit like this: suppose you have an ArrayList<A>, and you have class B extends A. Based on a random number generator, you will add instances of A and B to the list. Now you call x.foo() for all x in the list, where foo() is a virtual method in A with an override in B. So, by just looking at the source code, there is no way of knowing whether the loop calls A.foo, B.foo, or both at run time.
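
One conservative way to handle this is class hierarchy analysis: treat a virtual call on a receiver of static type `A` as possibly reaching the implementation selected by `A` or by any of its subclasses. This technique is not named in the answer above; it is offered as one common option. The sketch below assumes single inheritance, and the `parents` and `declared` tables and the name `possible_targets` are hypothetical.

```python
def is_subtype(parents, cls, ancestor):
    """True if `cls` is `ancestor` or inherits from it (single inheritance assumed)."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = parents.get(cls)
    return False


def possible_targets(parents, declared, static_type, method):
    """Conservatively list the methods a virtual call may dispatch to.

    `parents` maps class -> superclass; `declared` maps class -> set of
    method names defined directly in that class. Both are hypothetical inputs.
    """
    targets = set()
    for cls in declared:
        if not is_subtype(parents, cls, static_type):
            continue
        # Walk up from the possible runtime class to the nearest definition.
        owner = cls
        while owner is not None and method not in declared.get(owner, ()):
            owner = parents.get(owner)
        if owner is not None:
            targets.add(f"{owner}.{method}")
    return targets


# The example from the text: B extends A and overrides foo(); C adds nothing.
parents = {"B": "A", "C": "B"}
declared = {"A": {"foo"}, "B": {"foo"}, "C": set()}
targets_for_a = possible_targets(parents, declared, "A", "foo")
```

For the call `x.foo()` with `x` of static type `A`, the call graph would get edges to both `A.foo` and `B.foo`. This over-approximates what happens at run time, which is the safe direction if the graph feeds an optimization.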
╰◇生如夏花灿烂 2024-10-27 12:43:43

I don't know the algorithm, but pycallgraph does a decent job. It is worth checking out the source for it. It is not long and should be good for checking out existing design patterns.
