Resolving typedefs in C and C++

Posted 2024-08-19 08:21:38

I'm trying to automatically resolve typedefs in arbitrary C++ or C projects.

Because some of the typedefs are defined in system header files (for example uint32), I'm currently trying to achieve this by running the gcc preprocessor on my code files and then scanning the preprocessed files for typedefs. I should then be able to replace the typedefs in the project's code files.
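Below is a minimal C++ sketch of the scanning step. It assumes the preprocessed text (for example the output of gcc -E) is already in a string. The function name scan_typedefs and the regular expression are illustrative and are not part of any existing tool. The sketch only recognises simple single-name typedefs. It ignores scopes, function-pointer and array typedefs, and typedefs with struct bodies.

```cpp
#include <map>
#include <regex>
#include <string>

// Collect simple `typedef <type> <name>;` declarations from preprocessed
// source. Assumptions: one name per typedef, no braces or parentheses in the
// declaration (struct bodies, function pointers and arrays are skipped), and
// no notion of scope: a later typedef of the same name overwrites an earlier one.
std::map<std::string, std::string> scan_typedefs(const std::string& source) {
    // \b keeps identifiers such as "mytypedef", or the tail of a word, from matching.
    static const std::regex typedef_re(R"(\btypedef\s+([^;{}()]+?)\s*\b(\w+)\s*;)");
    static const std::regex spaces(R"(\s+)");
    std::map<std::string, std::string> aliases;
    for (std::sregex_iterator it(source.begin(), source.end(), typedef_re), end;
         it != end; ++it) {
        // Normalise whitespace, since preprocessed output may split declarations.
        aliases[(*it)[2].str()] = std::regex_replace((*it)[1].str(), spaces, " ");
    }
    return aliases;
}
```

Each entry records only one level of aliasing. For example, my_id maps to uint32, not to unsigned int, so chains still have to be followed before the map is used to rewrite names.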

I'm wondering whether there is another, perhaps simpler, way that I'm missing. Can you think of one?

The reason I want to do this: I'm extracting code metrics from the C/C++ projects with different tools. The metrics are method-based. After extracting the metrics, I have to merge the data produced by the different tools. The problem is that one of the tools resolves typedefs and the others don't. If typedefs are used for the parameter types of methods, I end up with metrics mapped to different method names that actually refer to the same method in the source code.

Consider this method in the source code: int test(uint32 par1, int par2)
After running my tools, some of my metrics are mapped to a method named int test(uint32 par1, int par2) and others to int test(unsigned int par1, int par2).
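Because the goal is only to merge the metrics, one option is to normalise the method names themselves rather than rewrite the source. The sketch below uses a hypothetical helper, normalize_signature. It assumes a flat, scope-free map from typedef name to underlying type, such as one collected from the preprocessed headers. It replaces identifiers token by token and repeats until chains of typedefs are fully resolved.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: rewrite every identifier in a method signature that
// appears in `aliases` with its underlying type. Passes repeat until nothing
// changes, so chains such as my_id -> uint32 -> unsigned int resolve fully.
// Assumption: `aliases` is one flat, scope-free map.
std::string normalize_signature(std::string sig,
                                const std::map<std::string, std::string>& aliases) {
    for (int pass = 0; pass < 32; ++pass) {  // cap passes in case of cycles
        std::string out;
        bool changed = false;
        std::size_t i = 0;
        while (i < sig.size()) {
            unsigned char c = sig[i];
            if (std::isalpha(c) || c == '_') {
                // Read one whole identifier.
                std::size_t j = i;
                while (j < sig.size() &&
                       (std::isalnum(static_cast<unsigned char>(sig[j])) || sig[j] == '_'))
                    ++j;
                std::string word = sig.substr(i, j - i);
                auto hit = aliases.find(word);
                if (hit != aliases.end()) {
                    out += hit->second;  // typedef name -> underlying type
                    changed = true;
                } else {
                    out += word;
                }
                i = j;
            } else {
                out += sig[i++];  // punctuation and spaces are copied as-is
            }
        }
        sig = out;
        if (!changed) break;
    }
    return sig;
}
```

With a map containing uint32 → unsigned int, both spellings from the example normalise to the same key, int test(unsigned int par1, int par2). One limitation of working on strings is that a parameter name that happens to match a typedef name would also be replaced.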

Comments (3)

辞慾 2024-08-26 08:21:38

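If you do not care about figuring out where they are defined, you can use objdump to dump the C++ symbol table. Mangled names encode the underlying types rather than the typedef names, so the demangled symbols show the typedefs already resolved.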

lorien$ objdump --demangle --syms foo

foo:     file format mach-o-i386

SYMBOL TABLE:
00001a24 g       1e SECT   01 0000 .text dyld_stub_binding_helper
00001a38 g       1e SECT   01 0000 .text _dyld_func_lookup
...
00001c7c g       0f SECT   01 0080 .text foo::foo(char const*)
...

This output comes from the following structure definition:

#include <string>

typedef char const* c_string;
struct foo {
    typedef c_string ntcstring;  // a typedef of a typedef
    foo(ntcstring s): buf(s) {}
    std::string buf;
};
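The symbol shows char const* because a typedef only introduces another name for an existing type, and the mangled name encodes that type. A compile-time check makes this explicit for the structure above. The static_asserts are an illustration added here, not part of the original snippet.

```cpp
#include <string>
#include <type_traits>

// Same definitions as the snippet above.
typedef char const* c_string;
struct foo {
    typedef c_string ntcstring;
    foo(ntcstring s): buf(s) {}
    std::string buf;
};

// The whole alias chain names exactly the spelled-out pointer type, so the
// constructor is mangled (and later demangled) as foo::foo(char const*).
static_assert(std::is_same<foo::ntcstring, char const*>::value,
              "typedef chain collapses to char const*");
static_assert(std::is_same<c_string, const char*>::value,
              "const placement does not change the type");
```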

This does require that you compile everything, and it only shows symbols that end up in the resulting executable, so there are a few limitations. The other option is to have the linker dump a symbol map. With Apple's ld, which produced the Mach-O output above, add -Wl,-map,name, where name is the file to generate (see note). This approach does not demangle the names, but with a little work you can reverse engineer the compiler's mangling conventions. The output for the previous snippet will include something like:

0x00001CBE  0x0000005E  [  2] __ZN3fooC2EPKc
0x00001D1C  0x0000001A  [  2] __ZN3fooC1EPKc

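You can decode these using the C++ ABI specification. Once you get comfortable with how this works, the mangling table included with the ABI becomes invaluable. The two symbols differ only in C1 (complete-object constructor) versus C2 (base-object constructor). The derivation for the C2 symbol, ignoring the extra leading underscore that Mach-O adds, is: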

<mangled-name>           ::= '_Z' <encoding>
<encoding>               ::= <name> <bare-function-type>
  <name>                 ::= <nested-name>
    <nested-name>        ::= 'N' <source-name> <ctor-dtor-name> 'E'
      <source-name>      ::= <number> <identifier>
      <ctor-dtor-name>   ::= 'C2' # base object constructor
    <bare-function-type> ::= <type>+
      <type>             ::= 'P' <type> # pointer to
        <type>           ::= <cv-qualifier> <type>
          <cv-qualifier> ::= 'K' # constant
            <type>       ::= 'c' # character
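Instead of decoding by hand, GCC and Clang toolchains also expose the runtime's demangler as abi::__cxa_demangle in <cxxabi.h>. The sketch below assumes libstdc++ or libc++; MSVC does not provide this header. The wrapper name demangle is hypothetical.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Frees the buffer that abi::__cxa_demangle allocates with malloc.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Hypothetical wrapper around the Itanium C++ ABI demangler. Mach-O symbol
// tables and map files add an extra leading underscore ("__ZN..."), which is
// removed before demangling.
std::string demangle(const std::string& symbol) {
    std::string name = symbol;
    if (name.rfind("__Z", 0) == 0) name.erase(0, 1);
    int status = 0;
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status));
    // status 0 means success; on any failure return the original text.
    return (status == 0 && result) ? std::string(result.get()) : symbol;
}
```

Both map-file entries, the C1 and the C2 constructor, demangle to foo::foo(char const*).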

Note: the option for writing a map file differs between linkers, so check your local manual (man ld). Apple's ld takes -map filename. GNU ld takes -Map=filename (-Wl,-Map=filename from the compiler driver), and -M (-Wl,-M) prints the map to stdout, which you can redirect to a file.

浅唱ヾ落雨殇 2024-08-26 08:21:38

You can use Clang (the LLVM C/C++ compiler front-end) to parse code in a way that preserves information on typedefs and even macros. It has a well-designed C++ API for reading the data once the source code has been parsed into an AST (abstract syntax tree). In the AST, a type is available both as written (for example uint32) and in its canonical form with typedefs stripped. http://clang.llvm.org/

If you are instead looking for a ready-made program that does the resolving for you, rather than the Clang programming API, I think you are out of luck: I have never seen such a thing.

傲鸠 2024-08-26 08:21:38

GCC-XML can help with resolving the typedefs, you'd have to follow the type-ids of <Typedef> elements until you resolved them to a <FundamentalType>, <Struct> or <Class> element.
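Once the XML is parsed, following the chain is a simple loop. In the sketch below, the Element struct and the ids are illustrative. It also assumes that each element's type attribute holds the id of the referenced element. Parsing the XML itself is left to whatever XML library you already use, and PointerType or CvQualifiedType elements are not unwrapped.

```cpp
#include <map>
#include <set>
#include <string>

// Minimal in-memory model of the GCC-XML elements involved (an assumption:
// the XML has already been parsed into a map keyed by each element's id).
struct Element {
    std::string kind;  // e.g. "Typedef", "FundamentalType", "Struct", "Class"
    std::string name;
    std::string type;  // id of the referenced element (empty for terminals)
};

// Follow Typedef elements until reaching one that is not a Typedef, such as a
// FundamentalType, Struct or Class, and return its name. Returns an empty
// string if an id is missing or the chain loops back on itself.
std::string resolve_typedef(const std::map<std::string, Element>& elements,
                            std::string id) {
    std::set<std::string> seen;
    while (true) {
        auto it = elements.find(id);
        if (it == elements.end() || !seen.insert(id).second) return "";
        if (it->second.kind != "Typedef") return it->second.name;
        id = it->second.type;
    }
}
```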

To replace the typedefs in your project, though, you face a more fundamental problem: you can't simply search and replace, because you have to respect the scope of names. Think of function-local typedefs, namespace aliases or using directives.
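The following hypothetical example shows why a global search and replace breaks. The same name, handle, means a different type depending on scope, including through a namespace alias, a using directive and a function-local typedef. The names are invented for illustration, and the static_asserts document what the compiler resolves.

```cpp
#include <type_traits>

// One spelling, `handle`, denotes different types in different scopes, so
// no single textual replacement of `handle` is correct for the whole project.
namespace net { typedef int handle; }
namespace gui { typedef void* handle; }

namespace ui = gui;  // namespace alias: ui::handle is gui::handle

namespace app {
    using namespace net;  // using directive: unqualified `handle` is net::handle
    inline handle open_socket() { return 3; }
}

inline bool local_handle_is_long() {
    typedef long handle;  // function-local typedef shadows the others
    return std::is_same<handle, long>::value;
}

static_assert(std::is_same<ui::handle, void*>::value, "alias resolves to gui");
static_assert(std::is_same<decltype(app::open_socket()), int>::value,
              "using directive resolves to net");
static_assert(!std::is_same<net::handle, gui::handle>::value,
              "same spelling, different types");
```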

Depending on what you're actually trying to achieve, there is probably a better way.

Update: Actually, in the given context of fixing metrics data, replacing the type names using GCC-XML should work fine, provided GCC-XML supports your code base.
