GREP - Finding all occurrences of a string

Posted on 2024-08-12 01:35:34

I am tasked with white labeling an application so that it contains no references to our company, website, etc. The problem I am running into is that I have many different patterns to look for and would like to guarantee that all of them are removed. Since the application was not developed entirely in-house, we cannot simply look for occurrences in messages.properties and be done. We must go through JSPs, Java code, and XML.

I am using grep to filter results like this:

grep -ir SOME_PATTERN . | grep -v import | grep -v '//' | grep -v '/\*' ...

The patterns are escaped when I'm using them on the command line; however, I don't feel this pattern matching is very robust. There could possibly be occurrences that have import in them (unlikely) or even /* (the beginning of a javadoc comment).

All of the text output to the screen must come from a string declaration somewhere or a constants file. So, I can assume I will find something like:

public static final String SOME_CONSTANT = "SOME_PATTERN is currently unavailable";

I would like to find that occurrence as well as:

public static final String SOME_CONSTANT = "
SOME_PATTERN blah blah blah";

Alternatively, if we had an internal crawler / automated tests, I could simply pull back the xhtml from each page and check the source to ensure it was clean.

Comments (2)

赠我空喜 2024-08-19 01:35:34

To address your concern about missing some occurrences, why not filter progressively:

  1. Create a text file with all possible
    matches as a starting point.
  2. Use filter X (grep for '^import',
    for example) to dump probable false
    positives into a tmp file.
  3. Use filter X again to remove those
    matches from your working file (a
    copy of [1]).
  4. Do a quick visual pass of the tmp
    file and add any real matches back
    in.
  5. Repeat [2]-[4] with other filters.

This might take some time, of course, but it doesn't sound like this is something you want to get wrong...
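
The steps above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the `split_matches` helper, the `(path, line number, text)` tuple layout, and the sample data are all hypothetical and not from the original post. Each filter is applied to the line text only, so an anchored pattern such as `^\s*import\b` is not thrown off by the `path:` prefix that `grep -r` adds to its output.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical record layout: (file path, line number, line text).
Match = Tuple[str, int, str]


def split_matches(matches: Iterable[Match], filter_regex: str) -> Tuple[List[Match], List[Match]]:
    """Split matches into (kept, set_aside) using one filter.

    Matches whose line text matches filter_regex are set aside as
    probable false positives (step 2); the rest stay in the working
    set (step 3).
    """
    rx = re.compile(filter_regex)
    kept: List[Match] = []
    set_aside: List[Match] = []
    for match in matches:
        if rx.search(match[2]):
            set_aside.append(match)
        else:
            kept.append(match)
    return kept, set_aside


# Hypothetical starting point (step 1): every line that mentions the pattern.
all_matches = [
    ("src/Foo.java", 3, "import com.acme.util.Helper;"),
    ("src/Foo.java", 10, 'String s = "Acme import service is down";'),
    ("src/Bar.java", 7, "// Acme legacy workaround"),
]

# Anchoring the filter keeps line 10, which an unanchored "import" filter would drop.
working, tmp = split_matches(all_matches, r"^\s*import\b")
```

After reviewing `tmp` by eye (step 4), any real match can be appended back to `working` before running the next filter (step 5).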

初懵 2024-08-19 01:35:34

I would use sed, not grep!
sed performs basic text transformations on an input stream.
Try the s/regexp/replacement/ command with sed.

You can also try the awk command. Its -F option sets the field separator; you can use it with ; to split each line of your files on ;.

The best solution, however, would be a simple script in Perl or Python.
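
As a minimal sketch of what such a Python script might look like (the function name `find_occurrences`, the default extension list, and the example patterns are assumptions for illustration, not anything from the question): it walks a source tree, checks only the chosen file types, and reports every line that contains any of the patterns, ignoring case. Because it reports every matching line without filtering, it also catches a pattern that sits on the continuation line of a string declaration.

```python
import os
import re
from typing import Iterator, List, Tuple


def find_occurrences(
    root: str,
    patterns: List[str],
    extensions: Tuple[str, ...] = (".java", ".jsp", ".xml", ".properties"),
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line text) for each line containing a pattern.

    Patterns are treated as literal strings (escaped) and matched
    case-insensitively, similar in spirit to `grep -i -F`.
    """
    rx = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" avoids crashing on files with unexpected encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if rx.search(line):
                        yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical patterns; replace with the real company and site names.
    for path, lineno, text in find_occurrences(".", ["SOME_PATTERN", "example.com"]):
        print(f"{path}:{lineno}: {text}")
```

Keeping the pattern list in one place makes it easier to rerun the same check after each round of edits until the script reports nothing.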
