Find files that contain only printable characters in a bash script
I'm trying to write a bash script that looks at a directory full of files and categorises them as either plaintext or binary. A file is plaintext if it ONLY contains plaintext characters, otherwise it is binary. So far I have tried the following permutations of grep:
#!/bin/bash
FILES=`ls`
for i in $FILES
do
    ########GREP SYNTAX###########
    if grep -qv -e[:cntrl:] $i
    ########/GREP SYNTAX##########
    then
        mv $i $i-plaintext.txt
    else
        mv $i $i-binary.txt
    fi
done
In the grep syntax line, I have also tried the same without the -v flag and with the branches of the if statement swapped, as well as both of those combinations with [:alnum:] and [:print:]. All six of these variations produce some files labelled binary which consist solely of plaintext, and some files labelled plaintext which contain at least one non-printable character.
I need to find a way to identify files that only contain printable characters, i.e. A-Z, a-z, 0-9, punctuation, spaces and newlines. All files containing any character that is not in this set should be classified as binary.
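One strict way to test exactly this definition, sketched here for illustration (it is not from the question or the answers, and the function name is_plaintext is hypothetical): in the C locale, delete every allowed byte with tr and count what is left. Because tr does not work line by line, NUL bytes need no special handling, and an empty file counts as plaintext.

```bash
#!/bin/bash
# is_plaintext FILE  (hypothetical helper)
# Exit 0: FILE holds only printable ASCII (space through '~') and newlines.
# Exit 1: FILE holds at least one other byte (tab, CR, NUL, non-ASCII, ...).
# Exit 2: FILE is not a readable regular file.
is_plaintext() {
    local leftover
    [ -f "$1" ] && [ -r "$1" ] || return 2
    # LC_ALL=C makes tr work on single bytes, where [:print:] is exactly
    # the printable ASCII range. Deleting those bytes and newlines leaves
    # only the disallowed bytes; wc -c counts them (NUL bytes included).
    leftover=$(LC_ALL=C tr -d '[:print:]\n' < "$1" | wc -c)
    (( leftover == 0 ))
}
```

In the loop, `if is_plaintext "$f"` would take the place of the grep test. Note that tabs and carriage returns count as binary under this definition, so files with Windows line endings are classified as binary.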
I've been bashing my head against a wall trying to sort this for half a day.
Help!
Thanks in advance,
Rik
You can use grep's -I option, which treats binary files as files without a match, and just use a regex that always matches (like the empty string).
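A minimal sketch of this approach, assuming GNU grep (BSD grep also has -I, but it is not part of POSIX). The function name is_text_grep is hypothetical. grep's idea of "binary" is a heuristic, mainly the presence of NUL bytes, so a file containing a tab or a \001 control byte still counts as text, which is looser than the definition in the question. With -q, grep stops at the first match, so NUL bytes far into a large file may go unnoticed. An empty file has no lines for '' to match, so it is reported as binary.

```bash
#!/bin/bash
# is_text_grep FILE  (hypothetical helper)
# -I: a file grep judges to be binary is treated as having no match.
# '': the empty pattern matches every line of a text file.
# -q: no output; the exit status alone says text (0) or not (1).
is_text_grep() {
    grep -qI '' "$1"
}
```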
First, you can/should loop over the files with a glob instead of putting the output of `ls` in a variable. The chief reason for doing this is to be able to handle filenames that include spaces.
Second, you need to enclose the character class in a second set of brackets (`[[:cntrl:]]`, not `[:cntrl:]`), or grep is going to look at those characters as literals. I would also enclose the pattern in single quotes to protect it from being interpreted by the shell. Instead of using `-v`, negate the `print` class (`[^[:print:]]`) and see if that works for you. And always quote variables when they contain filenames.
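A small sketch of why the `-v` variants misclassify files and what the negated class changes (both function names are hypothetical): `grep -v` succeeds as soon as one line does not match, so a single clean line is enough to label a file plaintext. Searching for one non-printable character, and treating any hit as binary, tests the whole file instead.

```bash
#!/bin/bash
# Flawed: succeeds if ANY line lacks a control character, so one clean
# line is enough to "pass" a file that has control bytes elsewhere.
looks_plain_v() {
    grep -qv '[[:cntrl:]]' "$1"
}

# Better: succeeds if ANY character is outside [[:print:]].
# Double brackets: [:print:] is only a class name inside a bracket expression.
# Single quotes keep the shell from treating the pattern as a glob.
# -a reads binary data as text; -q reports through the exit status only.
# Newlines separate lines, so they never match.
has_nonprintable() {
    grep -qa '[^[:print:]]' "$1"
}
```

Since a match now means "binary", the branches are the reverse of the original if statement.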
To keep `grep` from complaining about binary files, use `-a`.
The variable `i` is often used for an integer or an index. Use `f` or `file`.
Finally:
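The answer's final script did not survive in this copy, so what follows is a sketch that combines the points above; it is not the answerer's code. Three details are additions: the hypothetical function classify_files takes the directory as an optional argument (defaulting to the current directory), it skips anything that is not a regular file, and it sets LC_ALL=C so that `[:print:]` means exactly printable ASCII; in a UTF-8 locale, "é" would otherwise count as printable. With GNU grep, `-a` also matters for correctness: without it, grep may treat non-text bytes such as NUL as line terminators, so a NUL would never match the pattern.

```bash
#!/bin/bash
# classify_files [DIR]  (hypothetical helper)
# Rename each regular file in DIR (default: .) to NAME-plaintext.txt or
# NAME-binary.txt.
classify_files() {
    local dir=${1:-.} file
    # A glob, unlike `ls` output, keeps names with spaces intact.
    for file in "$dir"/*; do
        # Skip directories, and the literal pattern if DIR is empty.
        [ -f "$file" ] || continue
        # LC_ALL=C: one byte = one character; [:print:] = ASCII 0x20-0x7E.
        # -a: search binary data as text; -q: exit status only.
        if LC_ALL=C grep -qa '[^[:print:]]' "$file"; then
            mv -- "$file" "$file-binary.txt"
        else
            mv -- "$file" "$file-plaintext.txt"
        fi
    done
}
```

For example, `classify_files ~/samples` renames the files in that directory; an empty file contains no non-printable character, so it is labelled plaintext.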