GREP - Find all occurrences of a string
I am tasked with white-labeling an application so that it contains no references to our company, website, etc. The problem is that I have many different patterns to look for and want to guarantee that every one of them is removed. Since the application was not developed entirely in-house, we cannot simply look for occurrences in messages.properties and be done. We must go through JSPs, Java code, and XML.
I am using grep to filter results like this:
grep -ir SOME_PATTERN . | grep -v import | grep -v '//' | grep -v '/\*' ...
The patterns are escaped when I'm using them on the command line; however, I don't feel this pattern matching is very robust. There could possibly be occurrences that have import in them (unlikely) or even /* (the beginning of a javadoc comment).
All of the text output to the screen must come from a string declaration somewhere or a constants file. So, I can assume I will find something like:
public static final String SOME_CONSTANT = "SOME_PATTERN is currently unavailable";
I would like to find that occurrence as well as:
public static final String SOME_CONSTANT = "
SOME_PATTERN blah blah blah";
Alternatively, if we had an internal crawler / automated tests, I could simply pull back the xhtml from each page and check the source to ensure it was clean.
Comments (2)
To address your concern about missing some occurrences, why not filter progressively:
1. [...] matches as a starting point.
2. [...] for example) to dump probable false positives into a tmp file.
3. [...] matches from your working file (a copy of [1]).
4. [...] file and add any real matches back in.
This might take some time, of course, but it doesn't sound like this is something you want to get wrong...
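As a rough illustration of the progressive filtering idea, here is a minimal Python sketch. It assumes the raw matches come from something like `grep -rni`, so each line looks like `path:lineno:text`; the function name `split_matches` and the heuristics in `FALSE_POSITIVE_HINTS` are hypothetical and would need tuning for a real code base.

```python
import re

# Hypothetical heuristics for "probably not user-visible text".
FALSE_POSITIVE_HINTS = [
    re.compile(r"^\s*import\b"),  # Java import statements
    re.compile(r"^\s*//"),        # line comments
    re.compile(r"^\s*/?\*"),      # block or javadoc comment lines
]


def split_matches(grep_lines):
    """Split "path:lineno:text" lines into (working, suspects)."""
    working, suspects = [], []
    for line in grep_lines:
        # Everything after the second colon is the matched source line.
        # Paths that themselves contain colons would need extra handling.
        parts = line.split(":", 2)
        text = parts[2] if len(parts) == 3 else line
        if any(hint.search(text) for hint in FALSE_POSITIVE_HINTS):
            suspects.append(line)
        else:
            working.append(line)
    return working, suspects
```

The `suspects` list plays the role of the tmp file: nothing is discarded, so it can be reviewed by hand and any real matches moved back into `working`.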
I would use sed, not grep!
Sed is used to perform basic text transformations on an input stream.
Try the
s/regexp/replacement/
command in sed. You can also try awk. It has a -F option for setting the field separator; you can use it with ; to split each line of your files into fields at every ;.
The best solution, however, would be a simple script in Perl or Python.
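For what such a script might look like, here is a minimal Python sketch. The terms in `PATTERNS`, the file types in `EXTENSIONS`, and all function names are placeholders, not anything taken from the question. It searches each file as a whole rather than line by line, so a multi-word term is still found when it is split across a line break, and it only reports hits; it does not try to tell comments from string literals or to replace anything.

```python
import re
from pathlib import Path

# Hypothetical values: substitute your own brand terms and file types.
PATTERNS = ["SOME_PATTERN", "Some Company"]
EXTENSIONS = {".java", ".jsp", ".xml", ".properties"}


def compile_pattern(term):
    """Case-insensitive regex; spaces in the term also match line breaks."""
    words = [re.escape(word) for word in term.split()]
    return re.compile(r"\s+".join(words), re.IGNORECASE)


def find_in_text(text, terms):
    """Yield (line_number, term) for each occurrence of a term in text."""
    for term in terms:
        for match in compile_pattern(term).finditer(text):
            # Line number of the first character of the match.
            yield text.count("\n", 0, match.start()) + 1, term


def scan_tree(root, terms=PATTERNS, extensions=EXTENSIONS):
    """Yield (path, line_number, term) for every hit under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, term in find_in_text(text, terms):
                yield str(path), lineno, term
```

Iterating over `scan_tree(".")` from the project root and printing each tuple gives a list to work through by hand; an empty result for every term is the signal that the tree is clean for those terms.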