Parsing a C++ source file after preprocessing

Posted on 2024-11-04 17:13:37

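I am trying to analyze C++ files with a custom parser (written in C++). Before parsing, I would like to get rid of all #define directives, and I want the source file to remain compilable after preprocessing. The best way seems to be running the C preprocessor on the file.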

cpp myfile.cpp temp.cpp
// or
g++ -E myfile.cpp > templ.cpp

[New suggestions are welcome.]

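The problem is that the original lines and their line numbers are lost, because the output also contains everything pulled in from the headers, and I want to retain the line numbers. The workaround I have settled on is: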

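Add a special symbol before every line in the source file (except preprocessor directives)
Run the preprocessor
Extract the lines that carry the special symbol and analyze them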
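
The first step could be sketched roughly as follows. This is only a minimal illustration, not a finished tool: `isDirective` and `tagLines` are hypothetical names, the marker format `@N@` is the one used in the example in this question, and the sketch assumes that a directive is any line whose first non-blank character is `#`, plus the continuation lines of a directive that ends with a backslash.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: true if the first non-blank character of the line is '#'.
bool isDirective(const std::string& line) {
    std::size_t pos = line.find_first_not_of(" \t");
    return pos != std::string::npos && line[pos] == '#';
}

// Hypothetical helper: prefixes every non-directive line with @N@,
// where N is the 1-based line number in the original source.
std::string tagLines(const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    std::size_t lineNo = 0;
    bool continued = false;  // previous directive line ended with a backslash
    while (std::getline(in, line)) {
        ++lineNo;
        bool directive = continued || isDirective(line);
        // A directive ending in '\' continues on the next physical line.
        continued = directive && !line.empty() && line.back() == '\\';
        if (!directive) {
            out << '@' << lineNo << '@';
        }
        out << line << '\n';
    }
    return out.str();
}
```

The tagged text would then be written to a temporary file and passed to the preprocessor instead of the original file.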

For example, a typical source file looks like this:

#include<iostream>
#include"xyz.h"
int x;    
#define SOME value
/*
**  This is a test file
*/
typedef char* cp;

void myFunc (int* i, ABC<int, X<double> > o)
{
  //...
}

class B {
};

After adding the symbol, it looks like this:

#include<iostream>
#include"xyz.h"
@3@int x;    
#define SOME value
@5@/*
@6@**  This is a test file
@7@*/
@8@typedef char* cp;
@9@
@10@void myFunc (int* i, ABC<int, X<double> > o)
@11@{
@12@  //...
@13@}
@14@
@15@class B {
@16@};

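Once all the macros and comments have been removed, I will be left with thousands of lines, of which a few hundred will be the original source code.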
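
The extraction step might then look something like the sketch below. `extractTagged` is a hypothetical name, and the code assumes that the `@N@` marker survives preprocessing unchanged at the very start of the output line, which I have not verified for every preprocessor.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: scans preprocessed output and returns
// (original line number, remaining text) for every line starting with @N@.
std::vector<std::pair<std::size_t, std::string>>
extractTagged(const std::string& preprocessed) {
    std::vector<std::pair<std::size_t, std::string>> result;
    std::istringstream in(preprocessed);
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() < 3 || line[0] != '@') continue;
        std::size_t i = 1;
        std::size_t number = 0;
        // Read the decimal line number between the two '@' characters.
        while (i < line.size() &&
               std::isdigit(static_cast<unsigned char>(line[i]))) {
            number = number * 10 + static_cast<std::size_t>(line[i] - '0');
            ++i;
        }
        // Require at least one digit followed by the closing '@'.
        if (i == 1 || i >= line.size() || line[i] != '@') continue;
        result.emplace_back(number, line.substr(i + 1));
    }
    return result;
}
```

Lines coming from headers carry no marker and are skipped, so only the lines of the original file, paired with their original line numbers, reach the parser.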

Is this approach correct? Am I missing any corner cases?

