Find a string in gzipped files inside a folder

Posted 2024-08-01 13:54:22, 399 characters, 2 views, 0 comments

My current problem is that I have around 10 folders, each containing about 5 gzipped files on average. That makes roughly 50 files to open and look at.

Is there a simpler method to find out if a gzipped file inside a folder has a particular pattern or not?

zcat ABC/myzippedfile1.txt.gz | grep "pattern match"
zcat ABC/myzippedfile2.txt.gz | grep "pattern match"

Instead of writing a script, can I do the same in a single line for all the folders and subfolders?

for f in `ls *.gz`; do echo $f; zcat $f | grep <pattern>; done;

Comments (8)

亣腦蒛氧 2024-08-08 13:54:22

You can use this command:

zgrep "foo" $(find . -name "*.gz")
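One caveat with the command substitution above: the shell splits the output of find on whitespace, so file names containing spaces break, and a very large tree can exceed the argument-length limit. A minimal sketch of a variant that lets find pass the names to zgrep directly; the temporary directory and the file names are made up for illustration, and the trailing "|| true" is only there because zgrep or find may exit non-zero when some file has no match:

```shell
# Throwaway sample tree; all names here are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/my logs"
printf 'foo bar\nbaz\n' | gzip > "$tmp/my logs/a file.txt.gz"
printf 'nothing here\n' | gzip > "$tmp/b.txt.gz"

# "{} +" hands find's results to zgrep as separate arguments, so spaces
# in names survive; -H asks for a file-name prefix on each match.
matches=$(cd "$tmp" && find . -name '*.gz' -exec zgrep -H 'foo' {} + || true)
printf '%s\n' "$matches"
rm -rf "$tmp"
```

With the sample tree, only the file under "my logs" contains "foo", so that is the only line expected in the output.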
戴着白色围巾的女孩 2024-08-08 13:54:22

find . -name "*.gz" | xargs zcat | grep "pattern" should do.
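Piping everything through zcat merges all files into one stream, so grep can no longer tell which file a match came from. A hedged variant that keeps the xargs approach but swaps zcat for zgrep; it assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do), and the sample file names are invented:

```shell
# Throwaway sample tree; all names here are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/ABC" "$tmp/DEF"
printf 'first line\npattern match\n' | gzip > "$tmp/ABC/one.txt.gz"
printf 'unrelated\n' | gzip > "$tmp/DEF/two.txt.gz"

# NUL-separated names keep spaces intact; zgrep -H prefixes each match
# with its file. "|| true" ignores the non-zero status xargs reports
# when some file has no match.
out=$(cd "$tmp" && find . -name '*.gz' -print0 | xargs -0 zgrep -H 'pattern' || true)
printf '%s\n' "$out"
rm -rf "$tmp"
```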

血之狂魔 2024-08-08 13:54:22

zgrep "string" ./*/*

You can use the above command to search for string in the .gz files under dir, where dir has the following subdirectory structure:

/dir
    /childDir1
              /file1.gz
              /file2.gz
    /childDir2
              /file3.gz
              /file4.gz
    /childDir3
              /file5.gz
              /file6.gz

韬韬不绝 2024-08-08 13:54:22

Coming in a bit late on this; I had a similar problem and was able to resolve it using:

zcat -r /some/dir/here | grep "blah"

As detailed here:

http://manpages.ubuntu.com/manpages/quantal/man1/gzip.1.html

However, this does not show the file that a match came from. Because grep reads from a pipe, it reports "(standard input)" instead. zcat does not seem to support outputting a name either.

In terms of performance, this is what we got:

$ alias dropcache="sync && echo 3 > /proc/sys/vm/drop_caches"

$ find 09/01 | wc -l
4208

$ du -chs 09/01
24M

$ dropcache; time zcat -r 09/01 > /dev/null
real    0m3.561s

$ dropcache; time find 09/01 -iname '*.txt.gz' -exec zcat '{}' \; > /dev/null
real    0m38.041s

As you can see, the find -exec zcat method is significantly slower than zcat -r, even with a small volume of files. I was also unable to make zcat output the file name (using -v will apparently output the filename, but not on every single line). It would appear that there isn't currently a tool that provides both the speed and the per-line file names you get from grep (i.e. the -H option).

If you need to identify the name of the file that the result belongs to, then you'll need to either write your own tool (could be done in 50 lines of Python code) or use the slower method. If you do not need to identify the name, then use zcat -r.

Hope this helps.
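As a rough illustration of the "write your own tool" option, here is a sketch in shell rather than Python. It decompresses each file separately and lets awk both filter the lines and prefix them with the file name. It still starts gzip and awk once per file, so it should not be expected to match zcat -r for speed, and it assumes file names without newlines. The directory layout and file contents are invented for the example:

```shell
# Throwaway sample tree; names and contents are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/09/01"
printf 'blah one\nother\n' | gzip > "$tmp/09/01/a.txt.gz"
printf 'other\nmore blah\n' | gzip > "$tmp/09/01/b.txt.gz"

# For every .gz file: decompress to stdout, keep lines matching /blah/,
# and print them as "file:line", similar to grep -H.
hits=$(cd "$tmp" && find 09/01 -name '*.gz' | sort | while IFS= read -r f; do
  gzip -dc -- "$f" | awk -v name="$f" '/blah/ { print name ":" $0 }'
done)
printf '%s\n' "$hits"
rm -rf "$tmp"
```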

木有鱼丸 2024-08-08 13:54:22

Use the find command:

find . -name "*.gz" -exec zcat "{}" + |grep "test"

Or try the recursive option (-r) of zcat.

无可置疑 2024-08-08 13:54:22

Given that zgrep doesn't support -R, I think the solution from "Nietzche-jou" could be the better answer, but I would add the -H option to show the file name, something like this:

find . -name "*.gz" -exec zgrep -H 'PATTERN' \{\} \;

满地尘埃落定 2024-08-08 13:54:22

You don't need zcat here because there is zgrep and zegrep.

If you want to run a command over a directory hierarchy, you use find:

find . -name "*.gz" -exec zgrep ⟨pattern⟩ \{\} \;

Also, “ls *.gz” is unnecessary in the for loop; in the future you should just use “*.gz” directly.
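To illustrate the point about the glob, here is a small sketch of the loop from the question with the pattern expanded by the shell itself instead of by ls. The temporary directory, the file names, and the search string are assumptions made up for the example:

```shell
# Throwaway sample files; names are hypothetical.
tmp=$(mktemp -d)
printf 'hello pattern\n' | gzip > "$tmp/file one.gz"
printf 'no hit\n' | gzip > "$tmp/file2.gz"

# The glob expands to one word per file, so names with spaces stay intact.
found=$(for f in "$tmp"/*.gz; do
  [ -e "$f" ] || continue            # the glob matched nothing
  if zgrep -q 'pattern' "$f"; then   # -q: only the exit status matters
    printf '%s\n' "${f##*/}"         # print the base name of the match
  fi
done)
printf '%s\n' "$found"
rm -rf "$tmp"
```

Note that a plain *.gz only covers one directory; for subfolders the find form above is still needed.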

雄赳赳气昂昂 2024-08-08 13:54:22

zgrep searches gzipped files; it has a -R option for recursion and a -H option to show the file name:

zgrep -R --include='*.gz' -H "pattern match" .

OS-specific commands, as not all arguments work across the board:

Mac 10.5+: zgrep -R --include=\*.gz -H "pattern match" .

Ubuntu 16+: zgrep -i -H "pattern match" *.gz
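Since the recursive flags differ between platforms, a find-based form may be easier to carry across systems. The sketch below also uses -l, which prints only the names of files that contain a match; this fits the original question of whether a file has the pattern at all. The folder and file names mirror the question but the contents are invented, and "|| true" only hides the non-zero status that can be reported when some file has no match:

```shell
# Throwaway sample tree; contents are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/ABC" "$tmp/DEF"
printf 'pattern match\n' | gzip > "$tmp/ABC/myzippedfile1.txt.gz"
printf 'something else\n' | gzip > "$tmp/DEF/myzippedfile2.txt.gz"

# find does the recursion; zgrep -l lists matching files, not lines.
names=$(cd "$tmp" && find . -name '*.gz' -exec zgrep -l 'pattern match' {} + || true)
printf '%s\n' "$names"
rm -rf "$tmp"
```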
