Finding files that contain only printable characters in a bash script

Posted 2024-09-24 10:36:04

I'm trying to write a bash script that looks at a directory full of files and categorises them as either plaintext or binary. A file is plaintext if it ONLY contains plaintext characters, otherwise it is binary. So far I have tried the following permutations of grep:

#!/bin/bash
FILES=`ls`
for i in $FILES
do
    ########GREP SYNTAX###########
    if grep -qv -e[:cntrl:] $i
    ########/GREP SYNTAX##########
    then
        mv $i $i-plaintext.txt
    else
        mv $i $i-binary.txt
    fi
done

In the grep syntax line, I have also tried the same without the -v flag and with the branches of the if statement swapped, as well as both combinations of each with [:alnum:] and [:print:]. All six variations label some files binary that consist solely of plaintext, and label some files plaintext that contain at least one non-printable character.

I need to find a way to identify files that contain only printable characters, i.e. A-Z, a-z, 0-9, punctuation, spaces and newlines. Any file containing a character outside this set should be classified as binary.

I've been bashing my head against a wall trying to sort this for half a day.
Help!
Thanks in advance,
Rik


Comments (2)

爱给你人给你 2024-10-01 10:36:05

You can use grep's -I option, which treats binary files as files without a match, and combine it with a regex that always matches (such as the empty string):

if grep -qI -e '' "$i"
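Below is a minimal sketch of this check wrapped in a helper. It assumes GNU grep, where -I means --binary-files=without-match. The name `classify` is hypothetical. One edge case needs explicit handling: an empty file has no lines, so even the empty pattern never matches and grep exits non-zero. The sketch therefore calls empty files plaintext. Also note that GNU grep's binary detection is a heuristic (mainly NUL bytes and, outside the C locale, invalid multibyte sequences). A file containing, say, an ESC character can therefore still count as text, which is looser than the asker's definition.

```shell
#!/bin/bash
# Hypothetical helper: prints "plaintext" or "binary" for one file,
# relying on grep's own binary-file heuristic (-I).
classify() {
    local f=$1
    # An empty file has no lines, so grep -qI '' would fail on it; call it plaintext.
    if [ ! -s "$f" ]; then
        echo plaintext
    # -I: files grep considers binary never match; '' matches any line of a text file.
    elif grep -qI -e '' "$f"; then
        echo plaintext
    else
        echo binary
    fi
}
```

In the loop it could drive the rename directly, e.g. mv "$f" "$f-$(classify "$f").txt".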
灵芸 2024-10-01 10:36:04

First, you can (and should) use

for f in *

instead of putting the output of ls in a variable. The chief reason for doing this is to be able to handle filenames that include spaces.
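A quick illustration of that point, assuming bash and GNU ls (which prints one name per line, unquoted, when its output is not a terminal). The file names `my notes` and `plain` are made up for the demo. Splitting the unquoted ls output breaks `my notes` into two words, while the glob keeps it whole.

```shell
#!/bin/bash
# Demo: `for i in $(ls)` splits names on spaces; `for f in *` does not.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'my notes' plain

# Word-splitting the ls output turns "my notes" into "my" and "notes".
FILES=$(ls)
from_ls=()
for i in $FILES; do from_ls+=("$i"); done

# Globbing yields each filename as one word, spaces included.
from_glob=()
for f in *; do from_glob+=("$f"); done

echo "ls:   ${#from_ls[@]} words"
echo "glob: ${#from_glob[@]} words"
cd - >/dev/null || exit 1
rm -rf "$demo_dir"
```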

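Second, you need to enclose the character class in a second set of brackets. Otherwise grep reads [:cntrl:] as an ordinary bracket expression and looks for the literal characters :, c, n, t, r and l. I would also enclose the pattern in single quotes to protect it from the shell. And rather than using -v, negate the print class and see if that works for you: grep -qv succeeds as soon as any one line fails to match, so it cannot show that no line contains an unwanted character.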

if grep -aq -e '[^[:print:]]' "$f"

And as shown in that line, always quote variables when they contain filenames.

mv "$f" "$f-plaintext.txt"
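The point about -v can be checked with a two-line sample. This is a sketch assuming GNU grep run with LC_ALL=C, so that [:print:] means ASCII space through tilde. The temporary file is made up for the demo. Line 1 is ordinary text and line 2 contains an ESC byte. The -v form calls the file plaintext because line 1 qualifies. The negated class calls it binary because line 2 matches. (The bare [:cntrl:] form is left out of the sketch on purpose: depending on the grep version it is either the literal-character bracket expression described above, or rejected outright with a "character class syntax is [[:space:]], not [:space:]" style error.)

```shell
#!/bin/bash
# Demo: grep -v cannot prove that NO line contains a forbidden character.
sample=$(mktemp)
# Line 1 is plain text; line 2 contains an ESC control character (octal 033).
printf 'plain line\nbad \033 line\n' > "$sample"

# The asker's style: -v succeeds because line 1 has no control character.
if LC_ALL=C grep -qv -e '[[:cntrl:]]' "$sample"; then v_result=plaintext; else v_result=binary; fi

# The negated class succeeds as soon as any line holds a non-printable byte.
if LC_ALL=C grep -aq -e '[^[:print:]]' "$sample"; then neg_result=binary; else neg_result=plaintext; fi

echo "grep -qv '[[:cntrl:]]'   -> $v_result"
echo "grep -aq '[^[:print:]]'  -> $neg_result"
rm -f "$sample"
```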

Use -a so that grep reads every file as text; without it, grep applies its binary-file handling and may complain about binary files.
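Here is a small check of what -a buys for a file with a NUL byte, assuming GNU grep. The GNU grep manual notes that when it handles a file as binary, it may treat non-text bytes as line terminators. In that case a NUL might never reach the pattern. With -a the NUL stays part of the line, and since it is not in [:print:] the negated class catches it. The temporary file is made up for the demo.

```shell
#!/bin/bash
# Demo: with -a, a NUL byte is an ordinary (non-printable) character to grep.
nul_file=$(mktemp)
printf 'abc\0def\n' > "$nul_file"

if LC_ALL=C grep -aq -e '[^[:print:]]' "$nul_file"; then
    nul_result=binary
else
    nul_result=plaintext
fi
echo "$nul_result"
rm -f "$nul_file"
```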

The variable i is often used for an integer or an index. Use f or file.

Finally:

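#!/bin/bash
# Classify every regular file in the current directory as plaintext or binary.
for f in *
do
    # Skip directories and other non-regular files (grep would error on them
    # and they would fall through to the plaintext branch).
    [ -f "$f" ] || continue
    # LC_ALL=C limits [:print:] to ASCII space..tilde; -a reads every file as
    # text; -q stops at the first non-printable character.
    if LC_ALL=C grep -aq -e '[^[:print:]]' "$f"
    then
        mv "$f" "$f-binary.txt"
    else
        mv "$f" "$f-plaintext.txt"
    fi
done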
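Before pointing the script at real data, you may want to try it in a throwaway directory. The sketch below assumes bash and GNU grep. It runs the loop from the final script, with a regular-file check and LC_ALL=C added, inside a subshell so the caller's working directory is untouched. The file names `notes` and `blob` are made up for the test: one holds ordinary text and the other contains control bytes 0x01 and 0x02.

```shell
#!/bin/bash
# Sketch: exercise the classification loop in a scratch directory.
scratch=$(mktemp -d)
printf 'Just ordinary text, with punctuation!\n' > "$scratch/notes"
printf 'header\001\002\n' > "$scratch/blob"

# Subshell: the cd does not leak into the calling shell.
(
    cd "$scratch" || exit 1
    for f in *
    do
        [ -f "$f" ] || continue
        if LC_ALL=C grep -aq -e '[^[:print:]]' "$f"
        then
            mv "$f" "$f-binary.txt"
        else
            mv "$f" "$f-plaintext.txt"
        fi
    done
)
ls "$scratch"
```

Because the glob is expanded once before the loop starts, the renamed files are not picked up and classified a second time.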