Find files containing only printable characters in a bash script

Posted 2024-09-24 10:36:04

I'm trying to write a bash script that looks at a directory full of files and categorises them as either plaintext or binary. A file is plaintext if it ONLY contains plaintext characters, otherwise it is binary. So far I have tried the following permutations of grep:

#!/bin/bash
FILES=`ls`
for i in $FILES
do
    ########GREP SYNTAX###########
    if grep -qv -e[:cntrl:] $i
    ########/GREP SYNTAX##########
    then
        mv $i $i-plaintext.txt
    else
        mv $i $i-binary.txt
    fi
done

In the grep syntax line, I have also tried the same without the -v flag and with the branches of the if statement swapped, as well as both of those combinations with [:alnum:] and [:print:]. All six of these variations produce some files labelled binary which consist solely of plaintext, and some files labelled plaintext which contain at least one non-printable character.

I need to find a way to identify files that only contain printable characters, i.e. A-Z, a-z, 0-9, punctuation, spaces and newlines. All files containing any character that is not in this set should be classified as binary.

I've been bashing my head against a wall trying to sort this for half a day.
Help!
Thanks in advance,
Rik

爱给你人给你 2024-10-01 10:36:05

You can use the -I option of grep, which treats binary files as files without a match, together with a regex that always matches (such as the empty string):

if grep -qI -e '' "$i"
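A minimal sketch of that test wrapped in a function, assuming GNU grep; the helper name `is_text_grep_I` is hypothetical. Keep in mind that GNU grep decides a file is binary from NUL bytes or byte sequences that are invalid in the current locale, so a file with a stray control byte such as 0x01 can still count as text. An empty file has no line for the empty pattern to match, so it lands in the binary branch.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if grep does not consider the
# file binary. The empty pattern matches every line, so on a non-empty
# file only -I (treat binary files as non-matching) can make grep fail.
is_text_grep_I() {
    grep -qI -e '' "$1"
}

for f in *
do
    [ -f "$f" ] || continue      # skip directories and other non-regular files
    if is_text_grep_I "$f"
    then
        echo "plaintext: $f"
    else
        echo "binary:    $f"
    fi
done
```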

灵芸 2024-10-01 10:36:04

First, you can (and should) use

for f in *

instead of putting the output of ls in a variable. The chief reason for doing this is to be able to handle filenames that include spaces.
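To see the difference, here is a small sketch that creates a throwaway directory with `mktemp -d`, puts one file whose name contains a space in it, and counts loop iterations both ways. `count_ls` and `count_glob` are hypothetical names used only for this demonstration, and the default IFS is assumed.

```shell
#!/bin/bash
# Count loop iterations when iterating over unquoted $(ls) output in dir $1.
count_ls() {
    local n=0 f
    # Unquoted $(ls) is split on whitespace, so "a b.txt" becomes "a" and "b.txt".
    for f in $(ls "$1")
    do
        n=$((n + 1))
    done
    echo "$n"
}

# Count loop iterations when iterating over a glob in dir $1.
count_glob() {
    local n=0 f
    # The glob expands to exactly one word per file, spaces included.
    for f in "$1"/*
    do
        n=$((n + 1))
    done
    echo "$n"
}

dir=$(mktemp -d)
touch "$dir/a b.txt"
echo "ls:   $(count_ls "$dir")"     # prints "ls:   2"
echo "glob: $(count_glob "$dir")"   # prints "glob: 1"
rm -rf "$dir"
```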

Second, you need to enclose the character class in a set of brackets, or grep will treat those characters as literals: on its own, [:cntrl:] is a bracket expression matching any of the characters :, c, n, t, r and l. I would also enclose the pattern in single quotes to protect it from being interpreted by the shell. Instead of using -v, negate the print class and see if that works for you.

if grep -aq -e '[^[:print:]]' "$f"

And as shown in that line, always quote variables when they contain filenames.

mv "$f" "$f-plaintext.txt"

To keep grep from complaining about binary files, use -a.

The variable i is often used for an integer or an index. Use f or file.
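The negated class from the second point can be tried on sample strings before touching any files. This sketch reads standard input; `has_nonprintable` is a hypothetical name, and `LC_ALL=C` is an added assumption that limits [:print:] to ASCII 0x20 to 0x7E, matching the set described in the question. Because grep works line by line, newlines themselves never match, while a tab does count as non-printable.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if stdin contains any character outside
# [:print:]. Newlines are line separators to grep, so they never match.
has_nonprintable() {
    LC_ALL=C grep -aq -e '[^[:print:]]'
}

# $'...' is bash ANSI-C quoting: \t is a tab, \a is the bell (0x07).
for sample in 'hello, world' $'tab\there' $'bell\a'
do
    if printf '%s\n' "$sample" | has_nonprintable
    then
        echo "binary"
    else
        echo "plaintext"
    fi
done
# prints: plaintext, binary, binary (one per line)
```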

Finally:

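#!/bin/bash
for f in *
do
    # Skip directories and anything else that is not a regular file.
    [ -f "$f" ] || continue
    # LC_ALL=C limits [:print:] to ASCII 0x20-0x7E; -a reads binary data as text.
    if LC_ALL=C grep -aq -e '[^[:print:]]' "$f"
    then
        mv "$f" "$f-binary.txt"
    else
        mv "$f" "$f-plaintext.txt"
    fi
done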
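As an alternative to grep, here is a sketch that uses `tr` instead: it deletes every allowed byte (printable ASCII plus newline) and checks whether anything is left over. This is a swapped-in technique, not the method from the answer above; it does not depend on grep's line handling or binary-file heuristics, so NUL bytes are caught as well. `classify` is a hypothetical helper that only prints a label, `LC_ALL=C` is assumed so that [:print:] means ASCII 0x20 to 0x7E, and an empty file comes out as plaintext.

```shell
#!/bin/bash
# Hypothetical helper: prints "plaintext" if file $1 contains only
# printable ASCII characters and newlines, otherwise "binary".
classify() {
    local leftover
    # Delete every allowed byte and count the bytes that remain.
    leftover=$(LC_ALL=C tr -d '[:print:]\n' < "$1" | wc -c)
    if [ "$leftover" -eq 0 ]
    then
        echo plaintext
    else
        echo binary
    fi
}

for f in *
do
    [ -f "$f" ] || continue     # skip directories and other non-regular files
    echo "$(classify "$f"): $f"
done
```

To rename rather than print, the echo line could become `mv "$f" "$f-$(classify "$f").txt"`. Run that only once, since a second pass would rename the already-renamed files again.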
