Good algorithm for generating call graphs?
I am writing some code to generate call graphs for a particular intermediate representation (IR) by statically scanning the IR code, without executing it. The IR code itself is not too complex, and I have a good understanding of what function call sequences look like, so all I need to do is trace the calls. I am currently doing it the obvious way:
- Keep track of where we are
- If we encounter a function call, branch to that location, process it, and come back
- While branching, add an edge between the caller and the callee
I am satisfied with the results so far, but I want to make sure that I am not reinventing the wheel here and running into corner cases. Are there any accepted good algorithms (and/or design patterns) that do this efficiently?
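For reference, the steps listed above can be sketched roughly as follows. This is a minimal sketch, not the actual scanner: the toy IR (a dict of instruction strings) and the `call <name>` syntax are assumptions made for illustration, and `build_call_graph` is a hypothetical name. It also uses a worklist so that each function body is scanned only once, instead of re-entering a callee every time it is called.

```python
from collections import deque

# Hypothetical toy IR: function name -> list of instruction strings.
# "call <name>" is an assumed direct-call syntax, not the real IR format.
IR = {
    "main": ["push 1", "call helper", "call log", "ret"],
    "helper": ["call log", "call helper", "ret"],  # self-recursive
    "log": ["ret"],
    "unused": ["call log", "ret"],  # never reached from main
}


def build_call_graph(ir, entry):
    """Return {caller: set(callees)} for every function reachable from entry."""
    edges = {}
    seen = {entry}              # functions already queued for scanning
    worklist = deque([entry])
    while worklist:
        caller = worklist.popleft()
        callees = edges.setdefault(caller, set())
        # Functions missing from the IR (e.g. externals) get an empty body.
        for instr in ir.get(caller, []):
            op, _, operand = instr.partition(" ")
            if op != "call":
                continue
            callees.add(operand)          # edge: caller -> callee
            if operand not in seen:       # scan each function only once
                seen.add(operand)
                worklist.append(operand)
    return edges


graph = build_call_graph(IR, "main")
```

Because of the `seen` set, recursion in the scanned code (such as `helper` calling itself) does not make the scanner loop, and functions that are never called from the entry point do not appear in the result.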
UPDATE:
The IR code is a bytecode disassembly from a homebrew Java-like language and looks like the Jasmin specification.
Comments (2)
From an academic perspective, here are some considerations:
Do you care about being conservative / correct? For example, suppose the code you're analyzing contains a call through a function pointer. If you're just generating documentation, then it's not necessary to deal with this. If you're doing a code optimization that might go wrong, you will need to assume that 'call through pointer' means 'could be anything.'
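To make the two choices concrete, here is a small sketch. It assumes a made-up toy IR in which `call <name>` is a direct call and `calli` is a call through a pointer; both opcodes and the `call_edges` function are hypothetical. In conservative mode an indirect call gets an edge to every known function; otherwise it is ignored.

```python
# Hypothetical toy IR: "call <name>" is direct, "calli" is an indirect call.
IR = {
    "main": ["call a", "calli", "ret"],
    "a": ["ret"],
    "b": ["ret"],
}


def call_edges(ir, conservative):
    """Return {caller: set(callees)}; indirect calls are handled per mode."""
    all_functions = set(ir)
    edges = {name: set() for name in ir}
    for caller, body in ir.items():
        for instr in body:
            op, _, operand = instr.partition(" ")
            if op == "call":
                edges[caller].add(operand)
            elif op == "calli" and conservative:
                # 'Could be anything': assume any known function is a target.
                edges[caller] |= all_functions
    return edges


doc_graph = call_edges(IR, conservative=False)
safe_graph = call_edges(IR, conservative=True)
```

A real analysis would usually narrow the conservative set (for example, to functions whose address is actually taken), but that refinement depends on details of the IR that are not given here.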
Beware of exceptional execution paths. Your IR may or may not abstract these away from you, but keep in mind that many operations can trigger language-level exceptions as well as hardware interrupts. Again, how much this matters depends on what you want to do with the call graph later.
Consider how you'll deal with cycles (e.g. recursion, mutual recursion). This may affect how you write code for traversing the graphs later on (i.e., they will need some sort of 'visited' set to avoid traversing cycles forever).
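As an illustration of the 'visited' set, here is a minimal sketch of a traversal that terminates on a graph with mutual recursion. The graph shape (a dict from caller to a set of callees) and the function names are assumptions made for the example.

```python
# Hypothetical call graph with mutual recursion between even and odd.
GRAPH = {
    "main": {"even"},
    "even": {"odd"},
    "odd": {"even"},
    "dead": {"main"},
}


def reachable(graph, start):
    """Return the set of functions reachable from start, cycles included."""
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already handled; this is what breaks the cycle
        visited.add(node)
        stack.extend(graph.get(node, ()))
    return visited


live = reachable(GRAPH, "main")
```

Without the `visited` check, the `even`/`odd` cycle would keep pushing nodes onto the stack forever.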
Cheers.
Update March 6:
Based on extra information added to the original post:
Suppose you have an `ArrayList<A>`, and you have `class B extends A`. Based on a random number generator, you add instances of `A` and `B` to the list. Now you call `x.foo()` for all `x` in the list, where `foo()` is a virtual method in `A` with an override in `B`. So, by just looking at the source code, there is no way of knowing whether the loop calls `A.foo`, `B.foo`, or both at run time.
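One common conservative way to handle this, usually called class hierarchy analysis, is to add an edge to every implementation that the static type or any of its subclasses could dispatch to. The sketch below is only an illustration of that idea, not something taken from the original post: the `PARENT` and `METHODS` tables and all function names are hypothetical, and class `C` is added to show a subclass that inherits `foo` without overriding it.

```python
# Hypothetical class hierarchy: B and C extend A; only A and B define foo.
PARENT = {"A": None, "B": "A", "C": "A"}
METHODS = {"A": {"foo"}, "B": {"foo"}, "C": set()}


def subclasses_including(cls):
    """Return cls plus every class that transitively extends it."""
    result = []
    for candidate in PARENT:
        k = candidate
        while k is not None:
            if k == cls:
                result.append(candidate)
                break
            k = PARENT[k]
    return result


def defining_class(cls, method):
    """Walk up from cls to the nearest class that defines method."""
    k = cls
    while k is not None:
        if method in METHODS.get(k, ()):
            return k
        k = PARENT[k]
    return None


def possible_targets(static_type, method):
    """All implementations a virtual call on static_type might reach."""
    targets = set()
    for cls in subclasses_including(static_type):
        owner = defining_class(cls, method)
        if owner is not None:
            targets.add(f"{owner}.{method}")
    return targets


targets = possible_targets("A", "foo")
```

For the `x.foo()` call in the example, the call graph would get edges to both `A.foo` and `B.foo`, which over-approximates what any single run does but never misses a target.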
I don't know of a specific algorithm, but pycallgraph does a decent job. Its source is worth checking out: it is not long, and it should be useful for seeing existing design patterns.