Parsing C++ is extremely hard because the grammar is undecidable. To quote Yossi Kreinin:
Outstandingly complicated grammar
"Outstandingly" should be interpreted literally, because all popular languages have context-free (or "nearly" context-free) grammars, while C++ has undecidable grammar. If you like compilers and parsers, you probably know what this means. If you're not into this kind of thing, there's a simple example showing the problem with parsing C++: is AA BB(CC); an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement - the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.
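To make the quoted AA BB(CC); example concrete, here is a minimal Python sketch of the decision a parser faces. The function name classify, the known_types set and the regular expression are made up for this illustration and are not taken from any real C++ front end. A real parser also has to track scopes, templates, typedefs and much more.

```python
import re

# Matches only the shape `AA BB(CC);` - a deliberately tiny toy, not a C++ grammar.
STATEMENT = re.compile(r"^\s*(\w+)\s+(\w+)\s*\(\s*(\w+)\s*\)\s*;\s*$")


def classify(statement, known_types):
    """Decide what `AA BB(CC);` means, given the type names declared earlier.

    known_types stands in for the symbol table a real front end would have
    built from the code that precedes the statement (hypothetical helper).
    """
    match = STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    type_name, _declared_name, argument = match.groups()
    if type_name not in known_types:
        raise ValueError(f"{type_name!r} does not name a type")
    # If CC names a type, BB is a function taking a CC and returning an AA.
    if argument in known_types:
        return "function declaration"
    # Otherwise CC is an expression and BB is an AA object initialised from it.
    return "object definition"
```

Calling classify("AA BB(CC);", {"AA", "CC"}) yields a function declaration, while classify("AA BB(CC);", {"AA"}) yields an object definition. The statement text is identical in both calls; only the context differs.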
You can look at Clang, the C/C++ front end built on LLVM.
It supports C++ fully now.
The ANTLR parser generator has a grammar for C/C++ as well as the preprocessor. I've never used it so I can't say how complete its parsing of C++ is going to be. ANTLR itself has been a useful tool for me on a couple of occasions for parsing much simpler languages.
Depending on your problem, GCCXML might be your answer.
Basically, it parses the source using GCC and then gives you an easily digestible XML dump of the parse tree.
With GCCXML you are done once and for all.
pycparser is a complete parser for C (C99) written in Python. It has a fully configurable AST backend, so it can be used as a basis for any kind of language processing you might need.
It doesn't support C++, though. Granted, C++ is much harder to parse than C.
Update (2012): at this time the answer, without any doubt, would be Clang. It's modular, supports the full C++ language (with many C++11 features) and has a relatively friendly code base. It also has a C API for bindings to high-level languages (e.g. Python).
Have a look at how doxygen works; the full source code is available and it's flex-based.
A misleading candidate is GOLD, which is a free Windows-based parser toolkit explicitly for creating translators. Their list of supported languages refers to the languages in which one can implement parsers, not the list of supported parse grammars.
They only have grammars for C and C#, no C++.
Parsing C++ is a very complex challenge.
There's the Boost/Spirit framework, and a couple of years ago they did play with the idea of implementing a C++ parser, but it's far from complete.
Fully and properly parsing ISO C++ is far from trivial, and there have in fact been many related efforts. But it is an inherently complex job that isn't easily accomplished without writing a full compiler front end that understands all of C++ and the preprocessor. A preprocessor implementation called "wave" is available from the Spirit folks.
That said, you might want to have a look at pork/oink (elsa-based), a C++ parser toolkit specifically meant for source code transformation. It is being used by the Mozilla project for large-scale static source code analysis and automated code rewriting. The most interesting part is that it supports not only most of C++ but also the preprocessor itself!
On the other hand, there is indeed one proprietary solution available: the EDG front end, which can be used for pretty much all C++-related efforts.
Personally, I would check out the elsa-based pork/oink suite used at Mozilla. Apart from that, the FSF has now approved work on gcc plugins using the runtime library license, so I'd assume that things are going to change rapidly once people can easily leverage the gcc-based C++ parser for such purposes using binary plugins.
So, in a nutshell: if you have the bucks, EDG; if you need something free/open source now, elsa/oink are fairly promising; if you have some time, you might want to use gcc for your project.
Another option, just for C code, is cscout.
The grammar for C++ is notoriously hairy. There's a good thread at Lambda about it, but the gist is that the C++ grammar can require an arbitrary amount of lookahead.
For the kind of thing I imagine you might be doing, I'd think about hacking either GNU CC or Splint. GNU CC in particular separates out the language generation part pretty thoroughly, so you might be best off building a new g++ backend.
Actually, PUMA and AspectC++ are both still actively maintained and updated. I was looking into using AspectC++ and was wondering about the lack of updates myself. I e-mailed the author, who said that both AspectC++ and PUMA are still being developed. You can get the source code through SVN at https://svn.aspectc.org/repos/ or get regular binary builds at http://akut.aspectc.org. As with a lot of excellent C++ projects these days, the author doesn't have time to keep up with web page maintenance. Makes sense if you've got a full-time job and a life.
How about something easier to comprehend, like tiny-C or Small C?
Elsa beats everything else I know hands down for C++ parsing, even though it is not 100% compliant. I'm a fan. There's a module that prints out C++, so that may be a good starting point for your toy project.
See our C++ Front End for a full-featured C++ parser: builds ASTs, symbol tables, does name and type resolution. You can even parse and retain the preprocessor directives. The C++ front end is built on top of our DMS Software Reengineering Toolkit, which allows you to use that information to carry out arbitrary source code changes using source-to-source transformations.
DMS is the ideal engine for implementing such a translator.
Having said that, I don't see much point in your imagined task; I don't see much value in trying to replace C++, and you'll find building a complete translator an enormous amount of work, especially if your target is a "toy" language. And there is likely little point in parsing C++ using a robust parser, if its only purpose is to produce an isomorphic version of C++ that is easier to parse (wait, we postulated a robust C++ already!).
EDIT May 2012: DMS's C++ front end now handles GCC3/GCC4/C++11, Microsoft VisualC 2005/2010. Robustly.
EDIT Feb 2015: Now handles C++14 in GCC and MS dialects.
EDIT August 2015: Now parses and captures both the code and the preprocessor directives in a unified tree.
EDIT May 2020: Has been doing C++17 for the past few years. C++20 in process.
A while back I attempted to write a tool that would automatically generate unit tests for C files.
For preprocessing I put the files through GCC. The output is ugly, but you can easily trace each line of the preprocessed file back to its place in the original code. For your needs, though, you might need something else.
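The tracing mentioned above relies on the linemarkers that GCC's preprocessor (gcc -E) writes into its output: lines of the form # 12 "file.c", optionally followed by flags. A rough sketch of mapping preprocessed lines back to their origin might look like this. The function name is invented for the example, and the sketch ignores escaped characters in file names.

```python
import re

# A GCC linemarker looks like:  # 12 "path/to/file.c" 1 3
# The trailing flags are optional and are ignored in this sketch.
LINEMARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"')


def map_preprocessed_lines(preprocessed_text):
    """Return (file, line, text) for every non-marker line of `gcc -E` output.

    Hypothetical helper: a linemarker says that the line following it came
    from the given line of the given file, and each later line counts up
    from there until the next marker.
    """
    current_file, current_line = None, 0
    mapped = []
    for text in preprocessed_text.splitlines():
        marker = LINEMARKER.match(text)
        if marker:
            current_line = int(marker.group(1))
            current_file = marker.group(2)
            continue
        mapped.append((current_file, current_line, text))
        current_line += 1
    return mapped
```

Feeding it the output of gcc -E on a source file gives, for each line of preprocessed code, the file and line number it originally came from, which is enough to report positions in terms of the unpreprocessed source.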
I used Metre as the base for a C parser. It is open source and uses lex and yacc. This made it easy to get up and running in a short time without fully understanding lex & yacc.
I also wrote a C app since the lex & yacc solution could not help me trace functionality across functions and parse the structure of the entire function in one pass. It became unmaintainable in a short time and was abandoned.
What about using a tool like GNU cflow, which can analyse the code and produce call graphs? Here's what the Open Group (man page) has to say about cflow. The GNU version of cflow comes with source and is open source as well ...
Hope this helps,
Best regards,
Tom.