Making pathlib.glob() and pathlib.rglob() case-insensitive for platform-agnostic applications

Posted on 2025-01-14 10:30:43

I am using pathlib.glob() and pathlib.rglob() to match files in a directory and in its subdirectories, respectively. The target files have both lower-case .txt and upper-case .TXT extensions. The file paths are read from the filesystem as follows:

import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']
suffixes_to_test = ['*.txt', '*.TXT']

for filename in files_to_create:
    filepath = directory / filename
    filepath.touch()
    
for suffix in suffixes_to_test:
    files = [fp.relative_to(directory) for fp in directory.glob(suffix)]
    print(f'{suffix}: {files}')

The majority of the code base was developed on a Windows 10 machine (running Python 3.7.4) and has now been moved to macOS Monterey 12.0.1 (running Python 3.10.1).

On Windows, both files a.txt and b.TXT match each pattern:

*.txt: [WindowsPath('a.txt'), WindowsPath('b.TXT')]
*.TXT: [WindowsPath('a.txt'), WindowsPath('b.TXT')]

In contrast, on macOS only one file matches each pattern:

*.txt: [PosixPath('a.txt')]
*.TXT: [PosixPath('b.TXT')]

Therefore, I assume that the macOS file system might be case-sensitive, whereas the Windows one is not. According to Apple's User Guide, the macOS file system should not be case-sensitive by default but can be configured as such.
Something similar might apply to Linux or Unix file systems, as discussed here and here.

Regardless of the reason for this differing behavior, I need to find a platform-agnostic way to get both upper-case TXT and lower-case txt files.
A rather naive workaround could be something like this:

results = set([fp.relative_to(directory) for suffix in suffixes_to_test for fp in directory.glob(suffix)])

This gives the desired output on both macOS and Windows (macOS output shown):

{PosixPath('b.TXT'), PosixPath('a.txt')}

However, is there a more elegant way? I could not find any option like ignore_case in pathlib's documentation.
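
For readers on newer interpreters: Python 3.12 added a case_sensitive keyword argument to Path.glob() and Path.rglob(), which did not exist in the 3.7 and 3.10 versions mentioned above. The sketch below uses it when available and otherwise falls back to comparing lower-cased names. The function name glob_ignore_case is a hypothetical helper, and the fallback branch is assumed to receive only single-level patterns (no "/" or "**").

```python
import fnmatch
import sys
from pathlib import Path
from typing import List


def glob_ignore_case(directory: Path, pattern: str) -> List[Path]:
    """Match `pattern` against the entries of `directory`, ignoring case.

    Hypothetical helper for illustration only.
    """
    if sys.version_info >= (3, 12):
        # Python 3.12+ accepts case_sensitive directly on Path.glob().
        return sorted(directory.glob(pattern, case_sensitive=False))
    # Older versions: compare lower-cased names with the lower-cased pattern.
    # This fallback only looks at direct children of `directory`.
    lowered = pattern.lower()
    return sorted(
        fp for fp in directory.iterdir()
        if fnmatch.fnmatchcase(fp.name.lower(), lowered)
    )
```

Calling glob_ignore_case(pathlib.Path(), '*.txt') in the directory from the question should then return both a.txt and b.TXT on either platform.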

木森分化 2025-01-21 10:30:43

What about something like:

suffix = '*.[tT][xX][tT]'
files = [fp.relative_to(directory) for fp in directory.glob(suffix)]

It is not a general solution for case-insensitive globbing, but it works well for a limited, specific use case such as your glob for one particular extension.
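
The bracket trick can be generated rather than typed by hand. The following is a minimal sketch, assuming the pattern contains only ASCII letters to expand and no bracket expressions of its own; case_insensitive_pattern is a hypothetical helper name, not part of pathlib.

```python
from pathlib import Path


def case_insensitive_pattern(pattern: str) -> str:
    """Expand every ASCII letter into a two-character class, e.g. 't' -> '[tT]'.

    Assumes `pattern` has no existing '[...]' expressions; letters inside
    such an expression would be expanded incorrectly.
    """
    return ''.join(
        f'[{ch.lower()}{ch.upper()}]' if ch.isascii() and ch.isalpha() else ch
        for ch in pattern
    )


# The pattern from the answer above, built automatically.
assert case_insensitive_pattern('*.txt') == '*.[tT][xX][tT]'
```

The result can be passed straight to glob() or rglob(), for example directory.rglob(case_insensitive_pattern('*.txt')).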

最偏执的依靠 2025-01-21 10:30:43

You could also use the syntax:

import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']

for filename in files_to_create:
    filepath = directory / filename
    filepath.touch()

files = [fp.relative_to(directory) for fp in directory.glob("*.[tT][xX][tT]")]

This would match all case combinations of txt.

七堇年 2025-01-21 10:30:43

For the case where someone just wants to find case-insensitive matches for arbitrary file suffixes, in a Unicode-safe way, the suffix can be tested directly:

from pathlib import Path

search_dir = Path("your/directory")
extensions = [".png", ".jpg"]

filepaths = []
for ext in extensions:
    ext = ext.lower()
    filepaths.extend([
        fp.resolve()  # make sure it's absolute
        for fp in search_dir.glob('*')  # list all
        if fp.is_file() and fp.suffix.lower() == ext
    ])
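
The same idea can be wrapped in a small generator that walks the directory once for all extensions and optionally recurses like rglob(). This is a sketch only; find_by_suffix and its recursive parameter are hypothetical names introduced here.

```python
from pathlib import Path
from typing import Iterable, Iterator


def find_by_suffix(search_dir: Path, extensions: Iterable[str],
                   recursive: bool = False) -> Iterator[Path]:
    """Yield files whose suffix matches any of `extensions`, ignoring case."""
    # Normalize the wanted suffixes once, e.g. {'.png', '.jpg'}.
    wanted = {ext.lower() for ext in extensions}
    candidates = search_dir.rglob('*') if recursive else search_dir.glob('*')
    for fp in candidates:
        if fp.is_file() and fp.suffix.lower() in wanted:
            yield fp
```

Unlike the loop above, each file is visited once regardless of how many extensions are requested, and paths are yielded as found rather than resolved to absolute paths.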

娜些时光，永不杰束 2025-01-21 10:30:43

If you do not want case sensitivity, you could normalize the case to lower or upper.

import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']
suffixes_to_test = ['*.txt']

for filename in files_to_create:
    # Normalize the file name to lower case before creating the file
    filepath = directory / filename.lower()
    filepath.touch()

for suffix in suffixes_to_test:
    # Normalize the pattern to lower case as well
    files = [fp.relative_to(directory) for fp in directory.glob(suffix.lower())]
    print(f'{suffix}: {files}')

This will give different output from your example of desired output, but it might also be what you are asking for.

You might also draw inspiration from answers here: Ignore case in glob() on Linux

If you do want to be case sensitive then you might adapt those answers and use the fnmatch.fnmatchcase method to make the Windows behaviour case sensitive.

Untried, but probably something like:

import fnmatch
import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']
suffixes_to_test = ['*.txt', '*.TXT']

for filename in files_to_create:
    filepath = directory / filename
    filepath.touch()

for suffix in suffixes_to_test:
    # fnmatchcase() always compares case-sensitively, on every platform
    files = [fp.relative_to(directory) for fp in directory.glob('*') if fnmatch.fnmatchcase(fp.name, suffix)]
    print(f'{suffix}: {files}')
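
The opposite direction, forcing case-insensitive matching on every platform, can be sketched with the same module: fnmatch.translate() converts a glob pattern to a regular expression, which can then be compiled with re.IGNORECASE. The helper name match_ignore_case and its recursive parameter are assumptions made for this example, and the pattern is matched against file names only, not full paths.

```python
import fnmatch
import re
from pathlib import Path
from typing import List


def match_ignore_case(directory: Path, pattern: str,
                      recursive: bool = False) -> List[Path]:
    """Return entries whose name matches the glob `pattern`, ignoring case."""
    # Translate the glob into a regex and make it case-insensitive.
    regex = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    candidates = directory.rglob('*') if recursive else directory.iterdir()
    return sorted(fp for fp in candidates if regex.match(fp.name))
```

With the files from the question, match_ignore_case(pathlib.Path(), '*.txt') and match_ignore_case(pathlib.Path(), '*.TXT') should both return a.txt and b.TXT.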
