Parsing a C++ source file after preprocessing
I am trying to analyze C++ files using my custom-made parser (written in C++). Before parsing, I would like to get rid of all #define directives. I want the source file to still be compilable after preprocessing, so the best way seems to be to run the C preprocessor on the file.
cpp myfile.cpp temp.cpp
// or
g++ -E myfile.cpp > temp.cpp
[New suggestions are welcome.]
However, this loses the original lines and their line numbers, because the output also contains everything pulled in from the headers, and I want to retain the line numbers. So the way out I have decided on is:
- Add a special symbol before every line in the source file (except preprocessor directives)
- Run the preprocessor
- Extract the lines with that special symbol and analyze them
For example, a typical source file will look like:
#include<iostream>
#include"xyz.h"
int x;
#define SOME value
/*
** This is a test file
*/
typedef char* cp;

void myFunc (int* i, ABC<int, X<double> > o)
{
 //...
}

class B {
};
After adding the symbol, it will look like this:
#include<iostream>
#include"xyz.h"
@3@int x;
#define SOME value
@5@/*
@6@** This is a test file
@7@*/
@8@typedef char* cp;
@9@
@10@void myFunc (int* i, ABC<int, X<double> > o)
@11@{
@12@ //...
@13@}
@14@
@15@class B {
@16@};
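The tagging step illustrated above could be sketched roughly as follows. This is only a minimal sketch: the names is_directive and tag_lines are hypothetical, and it assumes that a directive is any line whose first non-blank character is #, and that a directive ending in a backslash continues onto the next line (so the continuation lines of a multi-line #define are left untagged).

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: true if the first non-blank character of the line is '#'.
bool is_directive(const std::string& line) {
    std::size_t pos = line.find_first_not_of(" \t");
    return pos != std::string::npos && line[pos] == '#';
}

// Prefix every non-directive line with @N@, where N is its 1-based line number.
// Directive lines (and their backslash continuations) are copied unchanged.
std::string tag_lines(const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    bool continued = false;  // previous line was a directive ending in '\'
    for (std::size_t n = 1; std::getline(in, line); ++n) {
        bool directive = continued || is_directive(line);
        if (!directive) {
            out << '@' << n << '@';
        }
        out << line << '\n';
        continued = directive && !line.empty() && line.back() == '\\';
    }
    return out.str();
}
```

Blank lines are tagged too, which matches the @9@ and @14@ lines in the example. The sketch does not try to detect a # that merely happens to start a line inside a block comment or a raw string literal.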
Once all the macros and comments are removed, I will be left with thousands of lines, of which a few hundred will be the original source code.
Is this approach correct? Am I missing any corner cases?
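For the extraction step, a minimal sketch might look like the following. TaggedLine and parse_tagged_line are hypothetical names, and the sketch assumes the preprocessor passes the @N@ prefix through unchanged, apart from possible leading blanks.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Result of inspecting one line of preprocessor output.
struct TaggedLine {
    bool tagged;           // true if the line started with @N@
    unsigned long number;  // N, the original line number
    std::string text;      // the rest of the line after the tag
};

// Recognize a line of the form "@N@rest" (leading blanks allowed).
TaggedLine parse_tagged_line(const std::string& line) {
    TaggedLine result{false, 0, ""};
    std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line[i] != '@') return result;
    std::size_t start = ++i;
    while (i < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[i]))) {
        ++i;
    }
    // Require at least one digit followed by the closing '@'.
    if (i == start || i >= line.size() || line[i] != '@') return result;
    result.number = std::stoul(line.substr(start, i - start));
    result.text = line.substr(i + 1);
    result.tagged = true;
    return result;
}
```

One case this does not cover: tags placed on lines inside a block comment (such as @6@ and @7@ in the example) are removed together with the comment, so those line numbers will not appear in the output at all.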
Comments (2)
You realize that g++ -E adds some lines of its own to its output, which indicate line numbers in the original file? You'll find lines like
# 2 "foo.cc"
which indicate that you're looking at line 2 of file foo.cc. These lines are inserted whenever the regular sequence of lines is disrupted.
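As a rough illustration of how those marker lines could be used instead of custom symbols, here is a minimal sketch. LineMarker, parse_linemarker and lines_from_file are hypothetical names. The sketch assumes the marker layout GCC documents (a #, a space, the line number, a space, the quoted file name, then optional numeric flags) and does not unescape backslashes inside the file name.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct LineMarker {
    bool valid;
    unsigned long line;  // line number of the line that follows the marker
    std::string file;    // file name as written between the quotes
};

// Parse a marker such as:  # 2 "foo.cc" 1
LineMarker parse_linemarker(const std::string& text) {
    LineMarker m{false, 0, ""};
    if (text.size() < 3 || text[0] != '#' || text[1] != ' ') return m;
    std::size_t start = 2;
    std::size_t i = start;
    while (i < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[i]))) {
        ++i;
    }
    if (i == start || text.compare(i, 2, " \"") != 0) return m;
    std::size_t open = i + 1;
    std::size_t close = text.rfind('"');  // trailing flags contain no quotes
    if (close <= open) return m;
    m.line = std::stoul(text.substr(start, i - start));
    m.file = text.substr(open + 1, close - open - 1);
    m.valid = true;
    return m;
}

// Keep only the output lines attributed to main_file, each paired with its
// original line number.
std::vector<std::pair<unsigned long, std::string>>
lines_from_file(const std::string& preprocessed, const std::string& main_file) {
    std::vector<std::pair<unsigned long, std::string>> result;
    std::istringstream in(preprocessed);
    std::string text;
    std::string current_file;
    unsigned long current_line = 0;
    while (std::getline(in, text)) {
        LineMarker m = parse_linemarker(text);
        if (m.valid) {
            current_file = m.file;
            current_line = m.line;
            continue;
        }
        if (current_file == main_file) {
            result.emplace_back(current_line, text);
        }
        ++current_line;
    }
    return result;
}
```

With this approach the source file does not need to be modified before preprocessing. The file name passed as main_file has to match the spelling the preprocessor prints in its markers.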
The imake program that used to come with the X11 sources used a faintly similar system, marking the ends of lines with @@ so that it could post-process them properly. The output from gcc -E usually includes #line directives; you could perhaps use those instead of your symbols.