How do I extract the comment from a header file using Python, Perl, or sed?

Posted on 2024-09-02 18:09:22

I have a header file like this:

/*
 * APP 180-2 ALG-254/258/772 implementation
 * Last update: 03/01/2006
 * Issue date:  08/22/2004
 *
 * Copyright (C) 2006 Somebody's Name here
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the project nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#ifndef HEADER_H
#define HEADER_H

/* More comments and C++ code here. */

#endif /* End of file. */

I want to extract the contents of the first C-style comment only, dropping the " *" at the start of each line, to get a file with the following contents:

 APP 180-2 ALG-254/258/772 implementation
 Last update: 03/01/2006
 Issue date:  08/22/2004

 Copyright (C) 2006 Somebody's Name here
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:
 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
 3. Neither the name of the project nor the names of its contributors
    may be used to endorse or promote products derived from this software
    without specific prior written permission.

 THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
 FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.

Please suggest an easy way to do this with Python, Perl, sed, or some other way on Unix. Preferably as a one-liner.

Comments (3)

夏了南城 2024-09-09 18:09:22

This should work for you:

sed -n '/\*\//q; /^\/\*/d; s/^ \* \?//p' <file.h >comment.txt

Here's an explanation: sed (as you may know) is a command that goes through a file applying a list of rules to each line. Each rule consists of a "selector" and commands that are applied to that line only if the selector matches.

The first rule has the selector /\*\//. This is a regular expression selector; it matches any line that contains the characters */. Both characters are backslash-escaped: * is a regexp metacharacter, and / would otherwise end the pattern. (I've assumed that this will only match the closing line of the comment in your case and that this entire line should be deleted.) The command is q, which means "quit": sed just stops. Ordinarily it would print out the line first, but I provided the -n option, which means "don't print unless explicitly instructed to."

The second rule has the selector /^\/\*/ which is again a regexp selector that matches the characters /* at the start of the line. Again, I've assumed this line will not contain part of the comment. The d command tells sed to delete this line and move on.

The final rule has no selector, so it applies to all lines (unless a previous command prevented processing from reaching the final rule). The command in this last rule is a substitution command, s/PATTERN/REPLACEMENT/, which finds text in the line that matches some pattern and replaces it with a replacement text. The pattern here is ^ \* \?, which matches a space, an asterisk, and either 0 or 1 spaces, but only at the beginning of the line (\? is a GNU sed extension to basic regular expressions). The replacement is empty, so sed simply deletes the leading space-asterisk-(space)? sequence. The p is a flag to the substitution command that tells sed to print the line if a substitution was made. It's needed because of the -n option.
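
For readers who find the sed rules easier to follow as ordinary code, here is a rough Python sketch of the same three rules applied line by line. The function name first_comment_lines and the sample text are made up for illustration; this is not a replacement for the one-liner, only a way to see what each rule does.

```python
import re

def first_comment_lines(text):
    """Mirror the three sed rules, one input line at a time."""
    out = []
    for line in text.splitlines():
        if "*/" in line:           # rule 1: /\*\//q   -> stop at the closing line
            break
        if line.startswith("/*"):  # rule 2: /^\/\*/d  -> skip the opening line
            continue
        # rule 3: s/^ \* \?//p -> keep the line only if the substitution matched
        stripped, count = re.subn(r"^ \* ?", "", line)
        if count:
            out.append(stripped)
    return out

# Hypothetical sample input, shaped like the header in the question.
sample = "/*\n * First line\n *\n *    indented\n */\n#define X 1\n/* later */\n"
print("\n".join(first_comment_lines(sample)))
```

As with the sed version, this assumes the opening /* and closing */ sit on lines of their own; a one-line comment such as /* text */ would stop the loop before anything is collected.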

傻比既视感 2024-09-09 18:09:22

Pyparsing includes a built-in pattern for matching comment formats from various languages. Using cStyleComment and scanString to find the first comment in the source file makes the rest just string functions:

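from pyparsing import cStyleComment

# c_source_file holds the path to the header file
with open(c_source_file) as f:
    c_src = f.read()

# scanString yields (tokens, start, end); take the text of the first match
cmt = next(cStyleComment.scanString(c_src))[0][0]
# drop the first three characters (" * ") from every line of the comment
lines = [l[3:] for l in cmt.splitlines()]
print('\n'.join(lines))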

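scanString is a generator that yields matches one at a time, so taking only its first result means only the first comment gets processed. With your sample code, this prints the following (plus an empty first and last line left over from the /* and */ delimiter lines):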

APP 180-2 ALG-254/258/772 implementation 
Last update: 03/01/2006 
Issue date:  08/22/2004 

Copyright (C) 2006 Somebody's Name here 
All rights reserved. 

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions 
are met: 
1. Redistributions of source code must retain the above copyright 
   notice, this list of conditions and the following disclaimer. 
2. Redistributions in binary form must reproduce the above copyright 
   notice, this list of conditions and the following disclaimer in the 
   documentation and/or other materials provided with the distribution. 
3. Neither the name of the project nor the names of its contributors 
   may be used to endorse or promote products derived from this software 
   without specific prior written permission. 

THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND 
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE 
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 
SUCH DAMAGE. 
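
If installing pyparsing is not an option, roughly the same idea can be sketched with the standard-library re module. This is a swapped-in technique, not what pyparsing does internally: a plain regex does not know about string literals, so a "/*" inside a C string would fool it. The function name extract_first_comment and the sample text are hypothetical.

```python
import re

def extract_first_comment(src):
    """Return the body of the first /* ... */ block, one string per line."""
    # Non-greedy match with DOTALL: stops at the first closing */.
    match = re.search(r"/\*(.*?)\*/", src, re.DOTALL)
    if match is None:
        return []
    body = match.group(1).strip("\n")
    # Remove an optional leading " * " or " *" from every line.
    lines = [re.sub(r"^\s*\* ?", "", line) for line in body.splitlines()]
    # The closing " */" line leaves only whitespace behind; drop it.
    while lines and not lines[-1].strip():
        lines.pop()
    return lines

# Hypothetical sample input with two comments; only the first is returned.
sample = "/*\n * A\n *\n * B\n */\nint x; /* second */\n"
print("\n".join(extract_first_comment(sample)))
```

Unlike the l[3:] slice, the regex substitution only strips a prefix that actually contains an asterisk, so lines without the " * " decoration are left alone.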

风向决定发型 2024-09-09 18:09:22

sed -i -r "s/[\/\ ]{1}\*[\/\ ]?//g" YOURFILENAME

This strips the comment markers from your file, keeping the content. Note that it modifies YOURFILENAME in place; if you don't want that, remove -i from the command. It also applies to every comment marker in the file, not only the first comment.
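
To see what this substitution does to a whole file, here is a small Python sketch using approximately the same pattern (the {1} is redundant and omitted; the sample text and the name strip_markers are made up for illustration). It shows that the markers are removed everywhere, including in code that merely looks like a comment marker.

```python
import re

# Approximately the pattern from the sed expression above.
MARKER = re.compile(r"[/ ]\*[/ ]?")

def strip_markers(text):
    """Delete every '/*', ' *', ' */' style marker, wherever it occurs."""
    return MARKER.sub("", text)

# Hypothetical sample: a license comment followed by a line of C code.
sample = "/*\n * License text\n */\nint *p; /* note */\n"
print(strip_markers(sample))
```

In this sample the pointer declaration int *p; loses its " *" as well, so the approach is best suited to files (or extracted fragments) that contain only the comment itself.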
