Looping through file extensions to find non-ASCII characters - Python

Posted on 2024-12-15 08:23:15

I wrote a little Python program that looks through a directory (and its subdirectories) for files that contain non-ASCII characters.

I want to improve it. I know that certain files in this "directory" may be ZIP, DTA/OUT, OMX, SFD/SF3, etc. files that ARE SUPPOSED to have non-ASCII characters. So I want to know which of these are there and screen them out, because my ultimate goal is to find files that should not contain non-ASCII characters but do, and remove them (the disk is corrupt, with bad sectors, and holds terabytes of important data).

My thinking is to look further at the files that end up in the "except" portion of a try/except block in Python that looks like this:

try:
    # Assumes Python 3 and that content holds the file's raw bytes (mode 'rb').
    content.decode('ascii')
    output.write(str(counter) + ", " + file + ", ASCII\n")
    print(str(counter) + " ASCII file status logged successfully: " + file)
    counter += 1

except UnicodeDecodeError:
    # Reached when at least one byte is outside the 0x00-0x7F range.
    output.write(str(counter) + ", " + file + ", non-ASCII\n")
    print(str(counter) + " non-ASCII file status logged successfully: " + file)
    counter += 1
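
For context, here is a minimal sketch of how a check like the one above might sit inside a directory walk. It assumes Python 3 and reads each file as raw bytes in chunks, so large files never have to fit in memory. The names `file_is_ascii` and `scan_tree` and the 1 MiB chunk size are hypothetical and not part of my actual program.

```python
import os


def file_is_ascii(path, chunk_size=1024 * 1024):
    """Return True if every byte in the file is in the ASCII range."""
    with open(path, 'rb') as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # End of file reached without finding a non-ASCII byte.
                return True
            try:
                # ASCII is single-byte, so decoding chunk by chunk is safe.
                chunk.decode('ascii')
            except UnicodeDecodeError:
                return False


def scan_tree(root):
    """Yield (path, status) for each file under root.

    status is True for ASCII, False for non-ASCII, and None when the
    file could not be read at all (for example, because of a bad sector).
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield path, file_is_ascii(path)
            except OSError:
                yield path, None
```

Reporting unreadable files separately (`None`) rather than skipping them seems useful on a damaged disk, since those files are probably worth a look too.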

When I started to write the code, I realized that looping through and asking whether each file is '.zip' or '.sfd' or '.omx', etc. would make for a clunky program and take forever.

Is there any way to search a group of file extensions other than one by one? Maybe a file containing these extensions to check against? Or something I haven't thought of? My apologies in advance if this is a stupid question, but there are so many cool functions in Python that I'm sure I'm missing something that can help.
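
To make the "file containing these extensions" idea concrete, here is a rough sketch of what I have in mind, assuming Python 3. It uses a set so that each lookup is a single membership test instead of a chain of comparisons. The names `BINARY_EXTENSIONS`, `load_extensions` and `expects_non_ascii` are hypothetical, and the extension list is just the examples mentioned above.

```python
import os

# Hypothetical default list, taken from the file types mentioned in the post.
BINARY_EXTENSIONS = {'.zip', '.dta', '.out', '.omx', '.sfd', '.sf3'}


def load_extensions(path):
    """Read one extension per line from a text file into a set.

    Entries are lower-cased and given a leading dot if they lack one;
    blank lines are ignored.
    """
    extensions = set()
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            ext = line.strip().lower()
            if not ext:
                continue
            if not ext.startswith('.'):
                ext = '.' + ext
            extensions.add(ext)
    return extensions


def expects_non_ascii(filename, extensions=BINARY_EXTENSIONS):
    """Return True if the file's extension is in the known-binary set."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in extensions
```

For a short, fixed list, `filename.lower().endswith(('.zip', '.sfd', '.omx'))` would also work, since `str.endswith` accepts a tuple of suffixes.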

Cheers.
