Strange issue with cut, colrm, awk and sed: unable to cut characters from a piped stream
I have created a script to enumerate all files in a directory and below it. I wanted to add some progress feedback using pv, because I usually run it from the root directory.
The problem is that find always includes fractional seconds in its time output (%TT), and I don't want to record that much detail.
If I write the script to do everything in one pass, I get the right output. But if I use intermediate files so that pv can show an estimate during a "second" pass, the result changes and I do not see why.
This version gives the right result:
#!/bin/bash
find -printf "%11s %TY-%Tm-%Td %TT %p\n" 2> /dev/null |
# - Remove the fractional seconds from the time
# before: 4096 2011-01-19 22:43:51.0000000000 .
# after : 4096 2011-01-19 22:43:51 .
colrm 32 42 |
pv -ltrbN "Enumerating files..." |
# - Sort every thing by filename
sort -k 4
But the sort can take a long time, so I tried something like this to get a little more feedback:
#!/bin/bash
TMPFILE1=$(mktemp)
TMPFILE2=$(mktemp)
# Erase temporary files before quitting
trap "rm $TMPFILE1 $TMPFILE2" EXIT
find -printf "%11s %TY-%Tm-%Td %TT %p\n" 2> /dev/null |
pv -ltrbN "Enumerating files..." > $TMPFILE1
LINE_COUNT="$(wc -l $TMPFILE1)"
#cat $TMPFILE1 | colrm 32 42 | #1
#cat $TMPFILE1 | cut -c1-31,43- | #2
#cut -c1-31,43- $TMPFILE1 | #3
#sed s/.0000000000// $TMPFILE1 | #4
awk -F".0000000000" '{print $1 $2}' $TMPFILE1 | #5
pv -lN "Removing fractional seconds..." -s $LINE_COUNT > $TMPFILE2
echo "Sorting list by filenames..." >&2
cat $TMPFILE2 |
sort -k 4
None of the 5 "solutions" works. The ".0000000000" part is left in the output.
Can someone explain why?
My final solution is to combine the cutting operation with the find and use only one temporary file. Only the sort is done separately.
You can truncate the seconds within the argument to -printf using a field precision specifier (at least with GNU find 4.4.2), which leaves only the eight characters of "HH:MM:SS".
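As a minimal sketch of that idea, assuming the specifier meant here is a precision of 8 applied to %TT (written %.8TT) and that GNU find is in use, the question's format string could become the following. The function name list_files is only a hypothetical wrapper for illustration.

```shell
#!/bin/bash
# Assumption: GNU find. The ".8" precision keeps only the first eight
# characters of the %TT time string, i.e. HH:MM:SS without the fraction.
# list_files is a hypothetical helper name used only for this sketch.
list_files() {
    find "${1:-.}" -printf "%11s %TY-%Tm-%Td %.8TT %p\n" 2> /dev/null
}
```

Called as `list_files . | sort -k 4`, this should make the separate colrm/cut/sed/awk step unnecessary, since the fractional seconds never reach the pipe.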
The rest of my answer is possibly moot:
The reason your #1-5 don't work is that the output of wc includes the filename (and, more importantly, a space before it). Because $LINE_COUNT is expanded unquoted, that space makes pv see the filename from the wc output as an input file. A command line argument takes precedence over stdin. Since that file happens to be the same one that is being fed through the pipe, the output file looks like the unprocessed input file (because it is: the pipeline is ignored).
The fix is to capture only the count, without the filename.
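One common way to do that, shown here as a sketch rather than as the exact command from this answer, is to let wc read the file from standard input so it has no filename to print. The helper name count_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print only the number of lines in the file "$1".
# With the file given on stdin instead of as an argument, wc has no
# filename to report, so its output is just the count.
count_lines() {
    wc -l < "$1"
}
```

In the question's script this would read `LINE_COUNT="$(count_lines "$TMPFILE1")"`, after which `-s $LINE_COUNT` expands to a single number and pv goes back to reading the pipe.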
Some minor improvements are also possible.
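The answer's own snippets are not reproduced here, so the following is only an illustration of the kind of tightening that applies to the question's attempt #4. In `sed s/.0000000000//` the unescaped dot matches any character, so a sketch that escapes it and ties the match to the seconds field could look like this. The name strip_fraction is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not taken from the original answer): remove the
# fractional seconds that follow an HH:MM:SS time on each input line.
# The dot is escaped so it matches a literal '.', and [0-9]* accepts any
# fraction digits, not only zeros. Only the first match per line is changed,
# so similar-looking text inside a filename is left alone.
strip_fraction() {
    sed 's/\(:[0-9][0-9]\)\.[0-9]*/\1/'
}
```

It would slot into the question's second pass as `strip_fraction < "$TMPFILE1" | pv -lN "Removing fractional seconds..." -s "$LINE_COUNT" > "$TMPFILE2"`, assuming LINE_COUNT holds only the number.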
If this is an actual working tool, and not just a toy, then I'd just drop the "progress feedback" altogether... maybe come back to it when it doesn't complicate your life. In the meantime, you've probably spent more time trying to figure out how to give feedback than you will ever spend waiting for your script to return.
If you absolutely MUST give some sort of feedback, then just
echo "Sorting $(wc -l < "$TMPFILE") lines ..."
You'll get a feeling for how long it takes to sort that many lines from experience.
Kiss it, my son, kiss it.