我的更简单的死代码删除器
我正在以一种非常简单的方式刺激死代码去除器。
为此,我的想法是,
第 1 步:逐行读取输入的 C 程序并将其存储在双向链表或数组中。(因为删除和插入比文件操作更容易)。
疑问:我的做法正确吗?如果是这样,如何最小化每次遍历链表的次数。
第 2 步: 读取字符串的分析将并行完成,并创建表来维护变量名称及其详细信息、函数及其调用等。
第3步:将对变量表中的每个条目进行搜索,并将变量替换为其当时的值(原样)。 (例如)
i=0;
if(i==3) will be replaced by if(0==3).
但是在这样的情况下
get(a);
i=a;
if(i){}
,“i”将不会被替换,因为它取决于另一个变量。 'a' 不会被替换,因为它取决于用户输入。
疑问:如果用户输入是, if(5*5+6){打印你好;} , 这肯定是不必要的检查。我如何解决这个表达式以将代码简化为 { 打印你好; }
第 4 步: 将搜索字符串 if(0),while(0) 等,并使用堆栈删除操作块。 if(0){//这将被删除*/}
第 5 步:(例如) function foo(){/**/} ... if(0) foo(); ...,一旦删除了所有无效代码,就会检查函数表中的 foo() 条目,以获取代码中引用它的次数。如果它是 0,则必须使用相同的堆栈方法删除该函数。
第 6 步:在其余函数中,return 语句(如果有)下方的行将被删除,“}”除外。此删除操作会一直进行到函数结束为止。使用堆栈来标识函数的结束。
第 7 步: 我假设我的无死角代码现在已经准备好了。将链表或数组存储在输出文件中。
我的问题是.. 1.我的想法是否有意义?或者它可以实施吗?如何 我可以改进这个算法吗?
2.当我尝试实现这个想法时,我必须更多地处理字符串 操作而不是删除死代码。有什么方法可以减少 该算法中的字符串操作。
I am doing a stimulation of dead-code remover in a very simpler manner.
For that my Idea is to,
Step 1: Read the input C-Program line by line and store it in a doubly linked-list or Array.(Since deletion and insertion will be easier than in file operations).
Doubt:Is my approach correct? If so, How to minimize traversing a Linked-List each time.
Step 2: Analyzing of the read strings will be done in parallel, and tables are created to maintain variables names and their details, functions and their calls,etc.,
Step 3: Searching will be done for each entries in the variable table, and the variables will be replaced by its that time's value(as it has).
(E.g.)
i=0;
if(i==3) will be replaced by if(0==3).
But on situation like..
get(a);
i=a;
if(i){}
here,'i' will not be replaced since it depends on another variable. 'a' will not be replaced since it depends on user input.
Doubt: if user input is,
if(5*5+6){print hello;} ,
it surely will be unnecessary check. How can i solve this expression to simplify the code as
{
print hello;
}
Step 4: Strings will be searched for if(0),while(0) etc., and using stack, the action block is removed. if(0){//this will be removed*/}
Step 5:(E.g) function foo(){/**/} ... if(0) foo(); ..., Once all the dead codes are removed, foo()'s entry in the function table is checked to get no.of.times it gets referred in the code. If it is 0, that function has to be removed using the same stack method.
Step 6: In the remaining functions, the lines below the return statements (if any) are removed except the '}'. This removal is done till the end of the function. The end of the function is identified using stack.
Step 7: And I will assume that my dead-free code is ready now. Store the linked-list or array in an output file.
My Questions are..
1.Whether my idea will be meaningful? or will it be implementable? How
can I improve this algorithm?2.While i am trying to implement this idea, I have to deal more with string
manipulations rather than removing dead-codes. Is any way to reduce
string manipulations in this algorithm.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
不要这样做。 C 是一种自由形式的语言,尝试逐行处理它会导致支持 C 的一个子集,而该子集受到如此可笑的限制,以至于它不支持 C 语言。当之无愧的名字。
您需要做的是编写一个合适的解析器。有大量关于这方面的文献。找出您的学校在其编译器构建课程中使用哪本教科书,然后完成该课程 - 或者直接参加该课程!只有当您掌握了解析器后,您才应该开始考虑语义。然后在抽象语法树而不是字符串上进行工作。或者,找到一个已经编写并测试过的 C 解析器,您可以重用它(但您仍然需要学习相当多的知识才能将其与您自己的处理集成)。
如果您最终自己编写了解析器,并且这只是为了您自己的启发,请考虑使用比 C 更简单的语言作为您的主题。尽管作为语言的核心,C 语言相当紧凑,但正确获取声明语法的所有细节却出人意料地棘手,并且可能会分散您对真正感兴趣的内容的注意力。预处理器的存在本身就是一个问题这使得设计有意义的源到源转换变得非常困难。
顺便说一句,您绘制的转换在行业中被称为“常量传播”,或者(在更雄心勃勃的变体中,当函数和循环体具有不同的常量输入时,它们将克隆函数和循环体)“部分评估”。谷歌搜索这些术语可能会很有趣。
Do not do it this way. C is a free-form language, and trying to process it line-by-line will result in supporting a subset of C that is so ridiculously restricted that it doesn't deserve the name.
What you need to do is to write a proper parser. There is copious literature about that out there. Find out which textbook your school uses for its compiler-construction course, and work through that -- or just take the course! Only when you've got the parser down should you even begin to consider semantics. Then do your work on abstract syntax trees instead of strings. Alternatively, find an already written and tested parser for C that you can reuse (but you'll still need to learn quite a bit in order to integrate it with your own processing).
If you end up writing the parser yourself, and it's only for your own edification, consider using a simpler language than C as your subject. Even though C at is core is fairly compact as languages go, getting all details of the declaration syntax right is surprisingly tricky, and will probably detract you from what you're actually interested in. And the presence of the preprocessor is an issue in itself which can make it very difficult to design meaningful source-to-source transformations.
By the way, the transformations you sketch are known in the trade as "constant propagation", or (in a more ambitious variants that will clone functions and loop bodies when they have differing constant inputs) "partial evaluation". Googling those terms may be interesting.