Optimizing grep (or using AWK) in a shell script

Posted 2024-09-01 11:32:10

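In my shell script, I am trying to search using terms found in a $sourcefile against the same $targetfile over and over.

My $sourcefile is formatted as such:

pattern1
pattern2
etc...

The inefficient loop I have to search with is:

for line in $(< $sourcefile);do
    fgrep $line $targetfile | fgrep "RID" >> $outputfile
done

I understand it would be possible to improve this by either loading the whole $targetfile into memory, or perhaps by using AWK?

Thanks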

会发光的星星闪亮亮i 2024-09-08 11:32:10

Am I missing something, or why not just fgrep -f "$sourcefile" "$targetfile"?
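
The loop in the question also keeps only lines containing the literal string RID, which the one-liner above leaves out. A minimal sketch of the combined pipeline follows, assuming a POSIX `grep` (where `grep -F` is the equivalent of `fgrep`). The sample files and their contents are made up purely for illustration.

```shell
#!/bin/sh
# Hypothetical sample data; the file names and contents are illustrative only.
tmpdir=$(mktemp -d)
sourcefile="$tmpdir/source.txt"
targetfile="$tmpdir/target.txt"
outputfile="$tmpdir/output.txt"

printf '%s\n' 'pattern1' 'pattern2' > "$sourcefile"
printf '%s\n' \
    'RID 100 pattern1 foo' \
    'no id here, pattern1 only' \
    'RID 200 unrelated' \
    'RID 300 pattern2 bar' > "$targetfile"

# -F treats every pattern as a fixed string (what fgrep does);
# -f reads one pattern per line from $sourcefile.
# $targetfile is scanned once instead of once per pattern.
grep -F -f "$sourcefile" "$targetfile" | grep -F 'RID' > "$outputfile"

result=$(cat "$outputfile")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Two differences from the original loop are worth keeping in mind: a line that matches several patterns is written once rather than once per pattern, and the results follow the order of $targetfile instead of being grouped by pattern.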

别靠近我心 2024-09-08 11:32:10

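A sed solution:

sed 's/^\(.*\)$/\/\1\/p/' "$sourcefile" | sed -nf - "$targetfile"

This converts each line of $sourcefile into a sed pattern-matching command:

matchstring

to

/matchstring/p

However, you will need to escape special characters to make this robust.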
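
One possible way to do that escaping is sketched below. It is an assumption about what "robust" should cover, not part of the original answer: it backslash-escapes the basic-regex metacharacters and the `/` delimiter before wrapping each line, and it writes the generated commands to a temporary script file (a hypothetical `patterns.sed`) rather than relying on `sed -f -` reading from standard input. The sample data is invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical sample data; the file names and contents are illustrative only.
tmpdir=$(mktemp -d)
sourcefile="$tmpdir/source.txt"
targetfile="$tmpdir/target.txt"
scriptfile="$tmpdir/patterns.sed"   # hypothetical name for the generated script

printf '%s\n' 'a.c' 'x*y' '1/2' > "$sourcefile"
printf '%s\n' 'RID a.c' 'RID abc' 'RID x*y' 'RID xxy' 'RID 1/2' > "$targetfile"

# 1. Drop empty lines (an empty regex // would mean "reuse the last regex").
# 2. Backslash-escape the BRE metacharacters [ \ . * ^ $ and the / delimiter.
# 3. Wrap what is left as /pattern/p.
sed -e '/^$/d' \
    -e 's|[[\\/.*^$]|\\&|g' \
    -e 's|.*|/&/p|' "$sourcefile" > "$scriptfile"

# -n suppresses default output, so only lines hit by a /pattern/p command print.
result=$(sed -n -f "$scriptfile" "$targetfile")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the escaping in place, `a.c` no longer matches `abc` and `x*y` no longer matches `xxy`. As with the unescaped version, a line that matches more than one pattern is printed once per matching pattern.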

戏剧牡丹亭 2024-09-08 11:32:10

Using awk to read in the sourcefile, then searching in the targetfile (untested):

nawk '
    NR == FNR {patterns[$0]++; next}
    /RID/ {
        for (pattern in patterns) {
            # since fgrep considers patterns as strings not regular expressions, 
            # use string lookup and not pattern matching ("~" operator).
            if (index($0, pattern) > 0) {
                print
                break
            }
        }
    }
' "$sourcefile" "$targetfile" > "$outputfile"

This will also work with gawk.
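
Since the answer is marked as untested, here is a small self-contained check of the same approach. It assumes a POSIX `awk` is on the PATH (used here in place of `nawk`), and the sample patterns and target lines are invented. The pattern `a.c` is chosen on purpose: used as a regular expression with `~` it would also match `abc`, whereas `index()` only finds the literal substring.

```shell
#!/bin/sh
# Hypothetical sample data; the file names and contents are illustrative only.
tmpdir=$(mktemp -d)
sourcefile="$tmpdir/source.txt"
targetfile="$tmpdir/target.txt"

printf '%s\n' 'a.c' 'pattern2' > "$sourcefile"
printf '%s\n' \
    'RID 1 a.c' \
    'RID 2 abc' \
    'RID 3 pattern2 a.c' \
    'pattern2 without the id' > "$targetfile"

result=$(awk '
    # First file: remember every line as a literal search string.
    NR == FNR { patterns[$0]; next }
    # Second file: only lines containing RID are candidates.
    /RID/ {
        for (p in patterns) {
            # index() is a plain substring search, like fgrep.
            if (index($0, p) > 0) {
                print
                break   # print each line once, even if several patterns match
            }
        }
    }
' "$sourcefile" "$targetfile")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

One caveat of the `NR == FNR` idiom: if $sourcefile is empty, the first block never finishes "the first file" and the target lines are read as patterns instead, so it may be worth checking that the source file is non-empty first.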
