Looping through file extensions to find non-ASCII characters - Python

Published 2024-12-15 08:23:15

I wrote a little Python program that looks through a directory (and its subdirectories) for files that contain non-ASCII characters.
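The scan described above can be sketched as follows. This is a minimal sketch, not the original program: it assumes Python 3.7+ (for `bytes.isascii()`), reads each file as raw bytes in chunks so large files need not fit in memory, and the helper names `iter_files` and `has_non_ascii` are hypothetical.

```python
import os


def iter_files(root):
    """Yield the path of every file under root, recursing into subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def has_non_ascii(path, chunk_size=1 << 16):
    """Return True if the file contains any byte outside 7-bit ASCII (0x00-0x7F)."""
    with open(path, "rb") as f:
        # Read fixed-size chunks so multi-GB files are never loaded whole.
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # reached end of file without a non-ASCII byte
            if not chunk.isascii():
                return True
```

For example, `for p in iter_files("/mnt/share"): print(p, has_non_ascii(p))` (the path is hypothetical). On a disk with bad sectors, wrapping the `has_non_ascii` call in `try/except OSError` lets the scan report unreadable files instead of stopping.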

I want to improve it. I know that certain files in this "directory" may be ZIP, DTA/OUT, OMX, SFD/SF3, etc. files that ARE SUPPOSED to have non-ASCII characters. So I want to know whether these are present and screen out the ones that are expected to contain non-ASCII characters, because my ultimate goal is to find files that should not contain non-ASCII characters but do, and remove them (a corrupt disk with bad sectors holding terabytes' worth of important data).

My idea is to look more closely at the files that end up in the "except" branch of a try/except block in Python like this one:

try:
    # content is assumed to be the file's raw bytes (opened with 'rb');
    # decoding as ASCII fails on any byte >= 0x80
    content.decode('ascii')
    status = "ASCII"
except UnicodeDecodeError:
    status = "non-ASCII"

output.write(str(counter) + ", " + file + ", " + status + "\n")
print(str(counter) + " " + status + " file status logged successfully: " + file)
counter += 1
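The manual `counter` and string concatenation above can be replaced by `enumerate` and the standard `csv` module, which also quotes file names that themselves contain commas. A sketch, assuming each file has already been classified into a `(path, status)` pair; `write_report` is a hypothetical helper name, and note that `csv` separates fields with `,` rather than the `", "` used above.

```python
import csv


def write_report(rows, out):
    """Write numbered 'index,path,status' rows to the text stream `out`.

    rows: iterable of (path, status) pairs, e.g. ("docs/a.txt", "ASCII").
    Returns the number of rows written.
    """
    writer = csv.writer(out)
    count = 0
    # enumerate supplies the running number that `counter` tracked by hand.
    for count, (path, status) in enumerate(rows, start=1):
        writer.writerow([count, path, status])
    return count
```

When writing to a real file, open it with `open("report.csv", "w", newline="")` (file name hypothetical), as the `csv` documentation recommends.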

When I started to write the code, I realized that looping through and asking whether the file is '.zip' or '.sfd' or '.omx', etc. would make for a clunky program and take forever.

Is there any way to check against a group of file extensions other than one by one? Maybe a file containing these extensions to check against? Or something I haven't thought of? My apologies in advance if this is a stupid question, but there are so many cool functions in Python that I'm sure I'm missing something that can help.
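One way to avoid testing extensions one at a time is to keep them in a `set` (optionally loaded from a text file, one extension per line) and do a single membership test on the result of `os.path.splitext`. A sketch: the extension list only covers the examples named in the question (treating "DTA/OUT" and "SFD/SF3" as separate extensions is an assumption), and `load_extensions` / `is_expected_binary` are hypothetical helpers.

```python
import os

# Extensions named in the question as legitimately containing non-ASCII data.
# Stored lower-case; lookups below lower-case the file's extension too.
EXPECTED_BINARY = {".zip", ".dta", ".out", ".omx", ".sfd", ".sf3"}


def load_extensions(path):
    """Read one extension per line ('zip' or '.zip'); skip blank lines and '#' comments."""
    exts = set()
    with open(path, encoding="ascii") as f:
        for line in f:
            line = line.strip().lower()
            if line and not line.startswith("#"):
                exts.add(line if line.startswith(".") else "." + line)
    return exts


def is_expected_binary(filename, extensions=EXPECTED_BINARY):
    """Return True if the file's last extension is in the set (one hash lookup)."""
    return os.path.splitext(filename)[1].lower() in extensions
```

Files for which `is_expected_binary` returns True can be skipped before the ASCII check. An equivalent one-liner is `filename.lower().endswith(tuple(EXPECTED_BINARY))`, since `str.endswith` accepts a tuple of suffixes.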

Cheers.

Comment by 吻安, 2024-12-22 08:23:15

I figure that since there aren't any answers, I can go ahead and answer this myself with a partial answer. I basically took a different approach: I looked for a particular kind of file that is expected to be abundant on this share, and I will then do the same for each kind of file. It's kind of hacky, but it will get the job done.
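One way to decide which kinds of file to handle first under this approach is to tally extensions among the files flagged as non-ASCII, then review the most common ones one at a time. A sketch, assuming the input is a list of flagged paths; `tally_extensions` is a hypothetical helper.

```python
import os
from collections import Counter


def tally_extensions(paths):
    """Count files per lower-cased extension; files without one go under '(none)'."""
    counts = Counter()
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        counts[ext or "(none)"] += 1
    return counts
```

`tally_extensions(flagged).most_common(10)` then lists the ten most abundant types, largest first.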
