How to extract a single function from a source file

Posted 2024-07-28 00:25:20 · 918 words · 12 views · 0 comments

I'm working on a small academic research project on extremely long and complicated functions in the Linux kernel. I'm trying to figure out whether there is a good reason to write functions that are 600 or 800 lines long.

For that purpose, I would like to find a tool that can extract a function from a .c file, so I can run some automated tests on the function.

For example, if I have the function cifs_parse_mount_options() in the file connect.c, I'm looking for a solution that would work roughly like this:

extract /fs/cifs/connect.c cifs_parse_mount_options

and return the 523 lines of code (!) of the function, from the opening brace to the closing brace.

Of course, any way of getting existing software packages such as gcc to do this would be very helpful too.

Thanks,

Udi

EDIT: The answers to "Regex to pull out C function prototype declarations?" convinced me that matching function declarations with a regex is far from trivial.

东京女 2024-08-04 00:31:41

I had a similar need to pull a function out of C code. I found vim (the editor) suited to the job (and a bit easier), because I don't have to write any external tools or rely on unreliable regexes, which can get tedious.

Test code:

$ cat -n c.c
   1 #include <stdio.h>
   2 static int
   3 testme (void)
   4 {
   5     int i=1;
   6 
   7     if (i == 1) {
   8           printf("\nDo something\n");
   9     }
  10     return 0;
  11 }
  12 
  13 int main (int argc, char *argv[])
  14 {
  15     testme();
  16     return 0;
  17 }

Using vim in non-interactive (ex) mode with -es:

Step 1: go to the start of the function with a vim search, +/<function-name> (this assumes the function name is at the start of a line and followed by a space), and print the line number with !echo line(".").

Step 2: move to the next closing brace at the start of a line with +/^} and print its line number the same way.

Step 3: quit with +q.

Step 4: now that we have the start and end line numbers, pipe them to sed in the form <start>,<end>p to dump the entire function (paste first joins the two numbers with a comma before sed is invoked).

Full command:

$ vim -es c.c +/'testme ' +'exec(":!echo ".line("."))'  +'/^}'  +'exec(":!echo ".line("."))'  +q | paste -sd "," - | xargs -I{} sed -n {}p c.c
testme (void)
{
    int i=1;

    if (i == 1) {
          printf("\nDo something\n");
    }
    return 0;
}
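The same two-step heuristic can also be sketched in Python, which makes it easier to run over many files at once. This is a minimal sketch, not part of the original answer: it assumes the header line starts with the function name followed by optional spaces and `(` (slightly looser than the `/testme ` search), and that the body ends at the first `}` in column 0, which holds for kernel-style code but not for every file. The name `extract_by_column0` is hypothetical.

```python
import re
import sys


def extract_by_column0(source: str, name: str) -> str:
    """Return the lines from the function header through the next line that
    starts with '}' (the same heuristic as the vim + sed pipeline)."""
    lines = source.splitlines()
    # Step 1: first line that starts with the name, then optional spaces and '('.
    start_re = re.compile(re.escape(name) + r"\s*\(")
    start = next((i for i, line in enumerate(lines) if start_re.match(line)), None)
    if start is None:
        raise LookupError(f"{name!r} not found at the start of a line")
    # Step 2: next line whose first character is a closing brace.
    # Assumption: the body's closing brace sits in column 0 (kernel style).
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].startswith("}")), None)
    if end is None:
        raise LookupError(f"no '}}' in column 0 after {name!r}")
    # Step 4: the same lines `sed -n <start>,<end>p` would print.
    return "\n".join(lines[start:end + 1])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 <script> <file.c> <function-name>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(extract_by_column0(f.read(), sys.argv[2]))
```

Like the vim version, it starts at the line holding the name, so a return type on the previous line (`static int` in c.c) is not included.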

椵侞 2024-08-04 00:30:48

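You should use something like clang, which actually parses your source code and lets you analyze it. That way it can find functions in several languages and even take macros into account. With regular expressions you don't stand a chance.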

嗫嚅 2024-08-04 00:29:57

The Bash builtin declare appears to provide similar functionality, but I am not sure how it is implemented. In particular, declare -F lists the names of the functions defined in the current environment:

declare -f quote
declare -f quote_readline

declare -f (without a name) prints the definitions of all functions in the current environment:

quote () 
{ 
    local quoted=${1//\'/\'\\\'\'};
    printf "'%s'" "$quoted"
}
quote_readline () 
{ 
    local ret;
    _quote_readline_by_ref "$1" ret;
    printf %s "$ret"
}

Finally, declare -f quote prints only the definition of the quote function.

quote () 
{ 
    local quoted=${1//\'/\'\\\'\'};
    printf "'%s'" "$quoted"
}

Perhaps the underlying machinery can be repurposed to meet your needs.

离笑几人歌 2024-08-04 00:29:14

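If you find it hard to extract the function names:

1> Use ctags (a program) to extract the function names (f = function definitions, p = prototypes):
ctags -x --c-kinds=fp <file path>
2> Once you have the function names, write a simple Perl script that extracts a function's contents when you pass it the function name, as described above.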
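Step 2 can just as well be a small Python script instead of Perl. The sketch below is not from the answer: it assumes the default `ctags -x` column layout (tag name, kind, line number, file, source line), which is what Exuberant and Universal Ctags print by default but which other options can change, and it reuses the column-0 closing-brace rule from the vim answer. All function names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def run_ctags(path: str) -> str:
    """Run `ctags -x --c-kinds=fp` on one file (f = functions, p = prototypes)."""
    return subprocess.run(["ctags", "-x", "--c-kinds=fp", path],
                          capture_output=True, text=True, check=True).stdout


def parse_ctags_xref(xref: str) -> List[Tuple[str, str, int, str]]:
    """Parse xref rows into (name, kind, line number, file) tuples.

    Assumption: the default layout "name kind line file source-line",
    whitespace-separated, with no spaces in the file name."""
    rows = []
    for row in xref.splitlines():
        fields = row.split(None, 4)  # the 5th field (source line) may contain spaces
        if len(fields) >= 4 and fields[2].isdigit():
            rows.append((fields[0], fields[1], int(fields[2]), fields[3]))
    return rows


def extract_with_ctags(source: str, xref: str, name: str) -> str:
    """Cut out `name`, starting at the line ctags reports for its definition."""
    lines = source.splitlines()
    for tag, kind, line_no, _path in parse_ctags_xref(xref):
        if tag != name or kind != "function":  # skip prototypes and other tags
            continue
        # Assumption (as in the vim answer): the body ends at the first
        # line that starts with '}'.
        for i in range(line_no - 1, len(lines)):
            if lines[i].startswith("}"):
                return "\n".join(lines[line_no - 1:i + 1])
        raise LookupError(f"no '}}' in column 0 after line {line_no}")
    raise LookupError(f"ctags reported no function definition named {name!r}")
```

With ctags on the PATH, the two steps combine as `extract_with_ctags(text, run_ctags(path), name)`, where `text` is the contents of the file at `path`.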

深海里的那抹蓝 2024-08-04 00:28:30

indent -kr code -o code.out

gawk -f split.awk code.out

You will have to adapt split.awk a little, since it is somewhat specific to my code and refactoring needs (for example, I have some structs that are not typedefs). It uses gensub(), which is a GNU awk extension, so it has to be run with gawk.

And I'm sure you can make a nicer script :-)

split.awk:
BEGIN   { line=0; FS="";
    out=ARGV[ARGC-1]  ".out";
    var=ARGV[ARGC-1]  ".var";
    ext=ARGV[ARGC-1]  ".ext";
    def=ARGV[ARGC-1]  ".def";
    inc=ARGV[ARGC-1]  ".inc";
    typ=ARGV[ARGC-1]  ".typ";
    # remove output files left over from a previous run
    system ("rm -f " out " " var " " ext " " def " " inc " " typ);
    }
/^[     ]*\/\/.*/   { print "comment :" $0 "\n"; print $0 >> out ; next ;}
/^#define.*/        { print "define :" $0 ; print $0 >>def ; next;}
/^#include.*/       { print "include :" $0 ; print $0 >>inc ; next;}
/^typedef.*{$/      { print "typedef var :" $0 "\n"; decl="typedef";print $0 >> typ;infile="typ";next;}
/^extern.*$/        { print "extern :" $0 "\n"; print $0 >> ext;infile="ext";next;}
/^[^    }].*{$/     { print "init var :" $0 "\n";decl="var";print $0 >> var; infile="vars";
                print $0;
                fout=gensub("^([^    \\*])*[    ]*([a-zA-Z0-9_]*)\\[.*","\\2","g") ".vars";
                     print "var decl : " $0 "in file " fout;
                     print $0 >fout;
                next;
                        }
/^[^    }].*)$/     { print "func  :" $0 "\n";decl="func"; infile="func";
                print $0;
                fout=gensub("^.*[    \\*]([a-zA-Z0-9_]*)[   ]*\\(.*","\\1","g") ".func";
                     print "function : " $0 "in file " fout;
                     print $0 >fout;
                next;
            }
/^}[    ]*$/        { print "end of " decl ":" $0 "\n"; 
                if(infile=="typ") {
                    print $0 >> typ;
                }else if (infile=="ext"){
                    print $0 >> ext;
                }else if (infile=="var") {
                    print $0 >> var;
                }else if ((infile=="func")||(infile=="vars")) {
                    print $0 >> fout; 
                    fflush (fout);
                    close (fout);
                }else if (infile=="def") {
                    print $0 >> def;
                }else if (infile=="inc"){
                    print $0 >> inc;
                }else print $0 >> out;
                next;
            }
/^[a-zA-Z_]/        { print "extern :" $0 "\n"; print $0 >> var;infile="var";next;}
            { print "other :" $0 "\n" ; 
                if(infile=="typ") {
                    print $0 >> typ;
                }else if (infile=="ext"){
                    print $0 >> ext;
                }else if (infile=="var") {
                    print $0 >> var;
                }else if ((infile=="func")||(infile=="vars")){
                    print $0 >> fout;
                }else if (infile=="def") {
                    print $0 >> def;
                }else if (infile=="inc"){
                    print $0 >> inc;
                }else print $0 >> out;
               next;
               }
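The core of split.awk, cutting every function out of `indent -kr` output, can also be sketched in Python. This is a simplified sketch, not a port: it ignores typedefs, variables and the other categories the awk script sorts, and it returns a dict instead of writing `<name>.func` files. Like the script, it assumes K&R layout (header on one line ending in `)`, a lone `{` on the next line, closing `}` in column 0), so headers split over several lines are missed. `split_functions` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumptions taken from split.awk: after `indent -kr`, a function header is a
# line that does not start with whitespace, '}' or '#' and ends with ')';
# the next line is a lone '{'; the body ends at the next line starting with '}'.
HEADER_RE = re.compile(r"^[^\s}#].*\)\s*$")
NAME_RE = re.compile(r"([A-Za-z_]\w*)\s*\(")  # first identifier followed by '('


def split_functions(source: str) -> Dict[str, str]:
    """Return {function name: header line through closing '}' line}."""
    funcs: Dict[str, str] = {}
    pending: Optional[Tuple[str, str]] = None  # header waiting for its '{' line
    name: Optional[str] = None
    body: List[str] = []
    for line in source.splitlines():
        if name is not None:  # inside a function body
            body.append(line)
            if line.startswith("}"):
                funcs[name] = "\n".join(body)
                name, body = None, []
        elif pending is not None and line.strip() == "{":
            name, body = pending[0], [pending[1], line]
            pending = None
        else:
            pending = None
            m = NAME_RE.search(line)
            if m and HEADER_RE.match(line):
                pending = (m.group(1), line)
    return funcs
```

To mimic the script's output files, each value can be written to `name + ".func"`.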

仙女 2024-08-04 00:27:52

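Why not write a small Perl/PHP/Python script, or even a small C++, Java or C# program, to do this?

I don't know of any ready-made tool that does this, but code that parses a text file and extracts a function body from a C++ source file shouldn't take more than 20 lines. The only difficult part is locating the start of the function, and with a regex that should be a relatively simple task. After that, all you need to do is iterate over the rest of the file, keeping track of opening and closing braces; when you reach the closing brace of the function body, you're done.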
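This approach can be sketched in Python, one of the languages the answer suggests. It is a minimal sketch under stated assumptions, not the answerer's code: the start is found with the regex `\bname\s*\(`, and a match only counts as the definition if its parameter list is followed by `{`, so calls and prototypes are skipped. On top of the plain brace counting, it first blanks out comments and string/character literals, since a `'{'` or `"}"` inside the body would otherwise throw the count off. It does not expand macros, follow `#if` branches containing unbalanced braces, or handle old-style K&R parameter declarations; `extract_function` is a hypothetical name.

```python
import re
import sys


def mask_comments_and_literals(src: str) -> str:
    """Blank out comments and string/char literals (newlines are kept),
    so character offsets in the result still match `src`."""
    out = list(src)
    i, n = 0, len(src)
    while i < n:
        if src.startswith("//", i):          # line comment
            j = src.find("\n", i)
            j = n if j == -1 else j
        elif src.startswith("/*", i):        # block comment
            j = src.find("*/", i + 2)
            j = n if j == -1 else j + 2
        elif src[i] in "\"'":                # string or character literal
            quote, j = src[i], i + 1
            while j < n and src[j] != quote and src[j] != "\n":
                j += 2 if src[j] == "\\" else 1  # skip escaped characters
            j = min(j + 1, n)
        else:
            i += 1
            continue
        for k in range(i, j):
            if out[k] != "\n":
                out[k] = " "
        i = j
    return "".join(out)


def extract_function(src: str, name: str) -> str:
    """Return the definition of `name`, from the start of the line holding
    the name through the matching closing brace."""
    masked = mask_comments_and_literals(src)
    for m in re.finditer(r"\b" + re.escape(name) + r"\s*\(", masked):
        # Find the ')' that closes the parameter list.
        depth, i = 0, m.end() - 1
        while i < len(masked):
            if masked[i] == "(":
                depth += 1
            elif masked[i] == ")":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        else:
            continue  # unbalanced parentheses: try the next match
        # A definition continues with '{'; calls and prototypes do not.
        j = i + 1
        while j < len(masked) and masked[j].isspace():
            j += 1
        if j == len(masked) or masked[j] != "{":
            continue
        # Count braces until the one that closes the body.
        depth = 0
        for k in range(j, len(masked)):
            if masked[k] == "{":
                depth += 1
            elif masked[k] == "}":
                depth -= 1
                if depth == 0:
                    line_start = src.rfind("\n", 0, m.start()) + 1
                    return src[line_start:k + 1]
        raise ValueError(f"unbalanced braces in the body of {name!r}")
    raise LookupError(f"no definition of {name!r} found")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 <script> <file.c> <function-name>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(extract_function(f.read(), sys.argv[2]))
```

Saved under a name of your choice (say, a hypothetical extract.py), it is invoked much like the tool in the question: `python3 extract.py /fs/cifs/connect.c cifs_parse_mount_options`.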
