Piping find results to zcat and then to head

Posted 2024-09-11 11:57:11

I'm trying to search for a certain string in a lot of gzipped CSV files. The string is located in the first row, so my thought was to get the first row of each file by combining find, zcat and head. But I can't get them to work together.

$ find . -name "*.gz" -print | xargs zcat -f | head -1
20051114083300,1070074.00,0.00000000
xargs: zcat: terminated by signal 13

Example file:
$ zcat 113.gz | head
20050629171845,1069335.50,-1.00000000
20050629171930,1069315.00,-1.00000000
20050629172015,1069382.50,-1.00000000
 .. and 2 million rows like these ...

Though I solved the problem by writing a bash script, iterating over the files and writing to a temp file, it would be great to know what I did wrong, how to do it, and if there might be other ways to go about it.

Comments (4)

随风而去 2024-09-18 11:57:11

You should find that this will work:

find . -name "*.gz" | while read -r file; do zcat -f "$file" | head -n 1; done
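
For comparison, the same per-file idea can be sketched in Python, reading only the start of each archive and checking the first row for the search string. This is a minimal sketch and not part of the original answer: the names `first_line` and `files_matching` are made up for illustration, and it assumes the first row is UTF-8 text and that unreadable files should simply be skipped.

```python
import gzip
import sys
from pathlib import Path


def first_line(path):
    """Return the first line of a gzip file, without the trailing newline."""
    # gzip decompresses lazily, so only the beginning of the file is read.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.readline().rstrip("\n")


def files_matching(root, needle):
    """Yield (path, first line) for each *.gz under root whose first line contains needle."""
    for path in sorted(Path(root).rglob("*.gz")):
        try:
            line = first_line(path)
        except (OSError, EOFError) as err:
            # Assumed policy: report corrupt or unreadable files and move on.
            print(f"{path}: {err}", file=sys.stderr)
            continue
        if needle in line:
            yield path, line
```

Iterating over `files_matching(".", "20051114")` would then give the paths whose first row contains that string, with no temporary file needed.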

弥枳 2024-09-18 11:57:11

It worked as you asked it to.

head did its job: it printed one line and exited. zcat, still running under xargs, then tried to write to the closed pipe and was killed by SIGPIPE (signal 13). Since its child had died, xargs reported the cause.
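
The mechanism can be reproduced in isolation. The sketch below is an illustration, not the original pipeline: a child process stands in for zcat and writes lines forever, while the parent plays the role of head -1 by reading one line and closing the pipe. The names `WRITER` and `read_one_line_then_close` are made up, and the sketch assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Stand-in for zcat: a child that writes lines forever. Python ignores
# SIGPIPE by default, so the child restores the default action in order
# to behave like an ordinary Unix tool.
WRITER = (
    "import signal, sys\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "while True:\n"
    "    sys.stdout.write('20050629171845,1069335.50,-1.00000000\\n')\n"
)


def read_one_line_then_close():
    """Play the role of head -1: read one line, then close the pipe."""
    proc = subprocess.Popen([sys.executable, "-c", WRITER],
                            stdout=subprocess.PIPE)
    first = proc.stdout.readline()
    proc.stdout.close()   # the reader goes away, as head does when it exits
    status = proc.wait()  # a negative value means "killed by that signal"
    return first, status


first, status = read_one_line_then_close()
print(first.decode().rstrip())
print("killed by SIGPIPE:", status == -signal.SIGPIPE)
```

The writer is not misbehaving here. It is terminated because nobody is left to read its output, which is the same situation xargs reported for zcat.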

To get the desired behaviour, you'd need a find -exec ... construction, or a custom zhead to give to xargs.

Added some junk code I found behind the fridge:

#!/usr/bin/env python3

"""zhead - poor man's zcat file... | head -n
   no argument error checking, prefers to continue in the face of
   IO errors, with diagnostic to stderr

   sample usage: find ... | xargs zhead.py -1"""

import gzip
import sys

# An optional leading "-N" argument sets the number of lines per file.
if sys.argv[1].startswith('-'):
    nlines = int(sys.argv[1][1:])
    start = 2
else:
    nlines = 10
    start = 1

for zfile in sys.argv[start:]:
    try:
        # The with block closes the file even if reading fails.
        with gzip.open(zfile, 'rb') as zin:
            for i in range(nlines):
                line = zin.readline()
                if not line:
                    break
                # Write the raw bytes so no decoding is needed.
                sys.stdout.buffer.write(line)
    except Exception as err:
        print(zfile, err, file=sys.stderr)

It processed 10k files in /usr/share/man in about a minute.

一花一树开 2024-09-18 11:57:11

If you have GNU Parallel http://www.gnu.org/software/parallel/ installed:

find . -name '*.gz' | parallel 'zcat {} | head -n1'

Watch the intro video to GNU Parallel at http://www.youtube.com/watch?v=OpaiGYxkSuQ
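
Where GNU Parallel is not available, a rough analogue of running the per-file job concurrently can be sketched with Python's standard library. This is an assumption-laden sketch rather than an equivalent of parallel: the names `first_line` and `first_lines_parallel` and the default of 4 workers are made up for illustration, and any read error is allowed to propagate.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def first_line(path):
    """Return the first line of a gzip file as raw bytes."""
    with gzip.open(path, "rb") as fh:
        return fh.readline()


def first_lines_parallel(root, workers=4):
    """Map each *.gz path under root to its first line, using a thread pool."""
    paths = sorted(Path(root).rglob("*.gz"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input paths.
        return dict(zip(paths, pool.map(first_line, paths)))
```

Because each task reads only the beginning of its file, the benefit over a plain loop is likely to depend mostly on disk latency.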

浪漫人生路 2024-09-18 11:57:11

zcat -r * 2>/dev/null | awk -vRS= -vFS="\n" '{print $1}'