Source-to-source manipulations
I need to do some source-to-source manipulations in the Linux kernel. I tried to use Clang for this purpose, but there is a problem: Clang preprocesses the source code, i.e. it expands macros and includes. As a result, Clang sometimes produces C code that is broken in the context of the Linux kernel. I can't maintain all the changes manually, since I expect thousands of changes per file.
I tried ANTLR, but the publicly available grammars are incomplete and not suitable for projects such as the Linux kernel.
So my question is the following: is there any way to perform source-to-source manipulations on C code without preprocessing it?
Assume the following code:
#define AAA 1
void f1(int a){
    if(a == AAA)
        printf("hello");
}
After applying the source-to-source manipulation, I want to get this:
#define AAA 1
void f1(int a){
    if(functionCall(a == AAA))
        printf("hello");
}
But Clang, for instance, produces the following code, which does not fit my requirements because it expands the macro AAA:
#define AAA 1
void f1(int a){
    if(functionCall(a == 1))
        printf("hello");
}
I hope I was clear enough.
Edit
The above code is only an example. The source-to-source manipulations I want to do are not restricted to if() statement substitution; they also include inserting a unary operator in front of an expression, replacing an arithmetic expression with its positive or negative value, etc.
Solution
I found one solution myself. I use gcc to produce preprocessed source code and then apply Clang to it. That way I don't have any issues with macro expansion and includes, since that job is done by gcc. Thanks for the answers!
Comments (5)
An idea would be to replace all occurrences of
with
You can do this easily using, e.g., the sed tool.
If you have a finite collection of patterns to be replaced you can write a sed script to perform the substitution.
Would this solve your problem?
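For the if() case from the question, a purely textual rewrite along these lines could look roughly like the sketch below. It is written in C rather than sed so that nested parentheses in the condition can be matched by counting depth. The function name wrap_if_conditions and its parameters are made up for illustration. It is a naive sketch, not a robust tool: it does not skip comments, string literals or character literals, and it would also wrap a directive written as "#if (X)".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c can be part of a C identifier. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Copies src to dst, rewriting every `if (cond)` as `if (wrapper(cond))`.
 * Nothing is preprocessed, so macros such as AAA are copied verbatim.
 * Returns 0 on success, -1 if dst is too small or parentheses are unbalanced.
 * Limitation: comments and string/character literals are NOT skipped.
 */
int wrap_if_conditions(const char *src, const char *wrapper,
                       char *dst, size_t cap) {
    assert(src && wrapper && dst);
    if (cap == 0) return -1;
    size_t n = strlen(src), wlen = strlen(wrapper), out = 0, i = 0;
    while (i < n) {
        /* Detect `if` as a whole token (not part of `printf`, `ifdef`, ...). */
        int is_if = src[i] == 'i' && i + 1 < n && src[i + 1] == 'f' &&
                    (i == 0 || !is_ident_char(src[i - 1])) &&
                    !is_ident_char(src[i + 2]);
        size_t open = i + 2;
        if (is_if) {
            /* The keyword must be followed by '(' after optional whitespace. */
            while (open < n && isspace((unsigned char)src[open])) open++;
            if (open >= n || src[open] != '(') is_if = 0;
        }
        if (!is_if) {
            /* Ordinary character: copy it unchanged. */
            if (out + 1 >= cap) return -1;
            dst[out++] = src[i++];
            continue;
        }
        /* Find the matching ')' by counting parenthesis depth. */
        size_t close = open + 1;
        int depth = 1;
        while (close < n && depth > 0) {
            if (src[close] == '(') depth++;
            else if (src[close] == ')') depth--;
            if (depth > 0) close++;
        }
        if (depth != 0) return -1;
        size_t head = open - i + 1;      /* "if(" including any whitespace */
        size_t cond = close - open - 1;  /* text between the parentheses */
        if (out + head + wlen + cond + 3 >= cap) return -1;
        memcpy(dst + out, src + i, head);        out += head;
        memcpy(dst + out, wrapper, wlen);        out += wlen;
        dst[out++] = '(';
        memcpy(dst + out, src + open + 1, cond); out += cond;
        dst[out++] = ')';
        dst[out++] = ')';
        i = close + 1;
    }
    dst[out] = '\0';
    return 0;
}
```

Called with the wrapper name "functionCall" on the text of the example file, this leaves "#define AAA 1" untouched and turns "if(a == AAA)" into "if(functionCall(a == AAA))". The other transformations mentioned in the edit (unary operators, arithmetic expressions) need real expression parsing and are out of reach for this kind of text matching.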
Handling the preprocessor is one of the most difficult problems in applying transformations to C (and C++) code.
Our DMS Software Reengineering Toolkit with its C Front End comes relatively close to doing this. DMS can parse C source code, preserving most preprocessor conditionals, macro definitions and uses.
It does so by allowing preprocessor actions in "well-structured" places. Examples: #defines are allowed where declarations or statements can occur; macro calls and conditionals are allowed as replacements for many of the nonterminals in the language (e.g., function head, expression, statement, declarations) and in many non-structured places where people commonly put them (e.g., an "if (...) {" line wrapped between "#if foo" and "#endif"). It parses the source code and preprocessor directives as if they were part of one language (they ARE, it's called "C"), and builds corresponding ASTs, which can be transformed and will regenerate correctly with the captured preprocessor directives. [This level of capability handles OP's example perfectly.]
Some directives are poorly placed (both in the syntax sense, e.g., across multiple fragments of the language, and in the "you've got to be kidding" understandability sense). DMS handles these by expanding them away, with some guidance from the engineer ("always expand this macro"). A less satisfactory approach is to hand-convert the unstructured preprocessor conditionals/macro calls into structured ones; this is a bit painful but more workable than one might expect, since the bad cases occur considerably less often than the good ones.
To do better than this, one needs to have symbol tables and flow analysis that take into account the preprocessor conditions, and capture all the preprocessor conditionals. We've done some experimental work with DMS to capture conditional declarations in the symbol table (seems to work fine), and we're just starting work on a scheme for the latter.
Not easy being green.
Clang maintains extremely accurate information about the original source code.
Most notably, the SourceManager is able to tell whether a given token has been expanded from a macro or written as is, and Chandler Carruth recently implemented macro diagnostics that can display the actual macro expansion stack (at the various stages of expansion), tracing back to the actual written code (3.0).
Therefore, it is possible to use the generated AST and then rewrite the source code with all its macros still in place. You would have to query virtually every node to know whether it comes from a macro expansion and, if it does, retrieve the original code of the expansion, but it still seems possible.
So I guess you should have all you need :) (And hope so because I won't be able to help much more :p)
I would advise using the Rose framework. The source is available on GitHub.
You may consider http://coccinelle.lip6.fr/ : it provides a nice semantic patching framework.