We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 9 years ago.
All of the tokens in C (and most other programming languages) are "regular". That is, they can be matched by a regular expression.
A regular expression for C strings:
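The regex isn't too hard to understand. Basically, a string literal is a pair of double quotes surrounding a sequence of ordinary characters (anything other than a double quote, a backslash, or a newline) and backslash escape sequences (simple, octal, or hexadecimal).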
This is based on sections 6.1.4 and 6.1.3.4 of the C89/C90 spec. If anything else crept in in C99, this won't catch that, but that shouldn't be hard to fix.
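The pattern itself is not shown above, so here is a minimal sketch of what a regex built from that description could look like. The name `C_STRING` and the exact set of escapes are assumptions based on the C89 escape sequences, not the original expression.

```python
import re

# Hypothetical reconstruction: a C89-style string literal.
C_STRING = re.compile(r'''
    "                       # opening double quote
    (?:
        [^"\\\n]            # ordinary character: no quote, backslash, or newline
      | \\['"?\\abfnrtv]    # simple escape sequence
      | \\[0-7]{1,3}        # octal escape sequence
      | \\x[0-9a-fA-F]+     # hexadecimal escape sequence
    )*
    "                       # closing double quote
''', re.VERBOSE)

sample = r'puts("a \"quoted\" word\n"); x = "";'
for literal in C_STRING.findall(sample):
    print(literal)
```

An unterminated literal (one that hits a newline before its closing quote) does not match at all, which is the behaviour the grammar asks for.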
Here's a Python script that filters a C source file, removing string literals:
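The script is not reproduced here. A minimal sketch of such a filter, assuming the literals should be deleted outright rather than replaced with `""`, might look like this; `strip_strings` and `C_STRING` are illustrative names.

```python
import re

# Assumed C89-style string-literal pattern (compact form).
C_STRING = re.compile(
    r'''"(?:[^"\\\n]|\\['"?\\abfnrtv]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+)*"'''
)

def strip_strings(source: str) -> str:
    """Return the source text with every string literal removed."""
    return C_STRING.sub("", source)

sample = 'printf("hello, %s\\n", name);'
print(strip_strings(sample))  # prints: printf(, name);
```

To use it as a filter, read the whole file (or `sys.stdin`) into one string and write the result of `strip_strings` to standard output.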
EDIT:
It occurred to me after I posted the above that, while it is true that all C tokens are regular, by not tokenizing everything we leave ourselves open to trouble. In particular, if a double quote shows up in what should be another token, we can be led down the garden path. You mentioned that comments have already been stripped, so the only other thing we really need to worry about is character literals (though the approach I'm going to use can easily be extended to handle comments as well). Here's a more robust script that handles character literals:
Essentially, we're finding string and character literal tokens, then leaving the char literals alone and stripping out the string literals. The char literal regex is very similar to the string literal one.
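As a rough sketch of that idea (not the original script): match both kinds of literal in a single pass and decide per match what to keep. The names `TOKEN` and `strip_strings` are hypothetical, and `\\.` is a simplification that accepts any backslash escape rather than only the C89 ones.

```python
import re

# One alternation with a named group per token kind, so each match says what it is.
TOKEN = re.compile(r'''
    (?P<string> " (?: [^"\\\n] | \\. )* " )   # string literal
  | (?P<char>   ' (?: [^'\\\n] | \\. )+ ' )   # character literal
''', re.VERBOSE)

def strip_strings(source: str) -> str:
    """Remove string literals but leave character literals untouched."""
    def replace(match):
        # Keep character literals verbatim; drop string literals.
        return match.group(0) if match.lastgroup == "char" else ""
    return TOKEN.sub(replace, source)

sample = "if (c == '\"') puts(\"quote\");"
print(strip_strings(sample))  # prints: if (c == '"') puts();
```

Because the character literal `'"'` is consumed as a token of its own, the double quote inside it can no longer be mistaken for the start of a string. Adding a third alternative for comments would extend the same scheme.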
You can download the source code to StripCmt (.tar.gz - 5kB). It's trivially small, and shouldn't be too difficult to adapt to stripping strings instead (it's released under the GPL).
You might also want to investigate the official lexical language rules for C strings. I found this very quickly, but it might not be definitive. It defines a string as:
In Python using pyparsing:
Also prints to stdout.
In Ruby:
Prints to standard output.