什么是 AST、CFG、CLANG,我们如何在死代码去除算法中使用它们?
我即将与我们的团队一起使用 C 语言为在线活动编写一个死代码删除算法。
要求是......
- 读取一个C程序源文件,其中有多种形式的死代码。
- 我们的输出应该是一个文件,它没有任何死代码。
在上网时,我们遇到了 SO 链接...
在看到这些链接之前,我们有基本思想... 使用普通文件流逐行读取输入 C 文件并存储在字符串数组中。 然后分析这些字符串并确定非常基本的死代码,例如 if(0) 和 if(1) 等。 并创建一个堆栈,用于维护括号。等等……
但这有一个很大的问题,这个想法将导致我们对字符串操作做更多的事情,而不是删除死代码。
但看到这些链接后... 我们了解到 Clang 库、抽象语法树、控制流图等...
但我们对这些库和概念非常陌生。 我们知道它们是用来解析C代码的。
因此,我们需要一些关于 AST、CFG 的基本想法和一些基本指导,解释我们如何使用 在我们的代码中...
我们可以将 clang 库作为普通库(如 math.h)包含在内吗?
我们在哪里可以下载该库?
我们可以在 Windows 中使用那些 Clang 库吗?
I am about to write a dead-code removal algorithm using C language for an online event with our team.
The requirements are.....
- To read a C program source file,Which has many forms of dead-codes.
- And our output should be a file, Which is free from all dead-codes.
While surfing the internet, we came across the SO links...
How can I know which parts in the code are never used?
Dead code detection in legacy C/C++ project
Before seeing these links,we had the basic idea...
Reading the input C file, line by line using normal file stream and store in an string array.
Then to analyze those strings and determine the very basic dead codes like if(0) and if(1) etc..
And making a stack, for maintaining the parenthesis. And so more...
But this has a great problem, that this idea will lead us to do more with string operations rather than removing dead-codes.
But After seeing these link...
We came to know about
Clang library,Abstract Syntax Tree,Control-Flow-Graph etc...
But we are very newbie to those libraries and those concepts.
We came to know that they are used to parse the C code.
Hence we need some basic ideas about these AST,CFG and some basic guidance, explaining how can we use
that in our code...
Can we include that clang library as a normal library like math.h?
Where can we download that library?
Does we can use those Clang libraries in windows?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我可以向您解释控制流图的概念,但我对库本身并不熟悉。
这个概念很简单。将任何连续的代码行(即没有
if
、goto
或函数调用或标签)想象为图的一个节点。每个 goto 或函数调用都会创建从当前节点到 goto 标签所在节点或其正在调用的函数的定向链接。请记住,函数本身可以是一个图,而不是一个简单的节点,因为它内部可能有 if 或其他函数调用。每个函数调用还会创建从函数的叶节点(函数返回的位置)到包含函数调用后代码的节点的定向链接。 (这可以创建很多从函数图传出的链接,因为可以在代码的许多部分调用该函数)同样,如果您有一个
if
,则您有两个来自当前节点的方向链接到if
语句的if
部分和else
部分(除非您检测到if(0)
或if(1)
就像你说的,在这种情况下只有一个链接到正确的位置)图表的根是
main
的入口点。现在,要找到死代码,您必须做的就是从根位置简单地遍历图形(例如使用 DFS 或 BFS),并最终查看哪些节点未被访问。这向您显示了死代码,即代码中无论您的程序采取什么方向,都不会到达这些位置的位置。如果您想自己实现,可以采用递归方法(类似于解析代码,但更简单)。例如,如果您看到
if
,您可以说:I can explain to you the concept of control flow graphs, but I am not familiar with the library itself.
The concept is simple. Imagine any sequential lines of code (that is without
if
,goto
or function call or labels) as one node of a graph. Everygoto
or function call creates a directional link from the current node to the node where thegoto
label is or the function it is calling. Remember that a function itself could be a graph and not a simple node, because it may haveif
s or other function calls inside. Each function call also creates a directional link from leaf nodes of the function (where the functionreturn
s) to the node containing the codes right after the function call. (That can create a lot of links outgoing from the function graph because the function can be called in many parts of the code)Likewise, if you have an
if
, you have two direction links from the current node to both theif
part and theelse
part of theif
statement (unless you detectif(0)
orif(1)
like you said in which case there is only one link to the proper location)The root of your graph is the entry point of
main
. Now what you must do to find dead code is to simply traverse the graph from the root position (using DFS or BFS for example) and in the end see which nodes were NOT visited. This shows you the dead codes, that is places in the code that no matter what direction your program takes, it won't reach those locations.If you want to implement this yourself, you can take a recursive approach (similar to parsing the code but simpler). For example if you see an
if
you say:您可以查看由 控制和数据流图示例“http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html”rel="nofollow">DMS 软件重新工程工具包。
我们已经使用 DMS 的数据流分析机制及其 C 前端,包括指向分析,如果您确实想在大型 C 系统中找到死函数,那么这是实际必需的。
You can see examples of control and data flow graphs extracted by the DMS Software Reengineering Toolkit.
We have done this on very large applications (26 million lines of C) using DMS's data flow analysis machinery and its C Front End, including a points-to analysis, which is a practical necessity if you really want to find dead functions in a large C system.