Making pathlib's Path.glob() and Path.rglob() case-insensitive for platform-agnostic applications
I am using pathlib.Path.glob() and pathlib.Path.rglob() to match files in a directory and in its subdirectories, respectively. The target files are a mix of lower case .txt and upper case .TXT files. The corresponding file paths were read from the file system as follows:
import pathlib

directory = pathlib.Path()  # the current working directory
files_to_create = ['a.txt', 'b.TXT']
suffixes_to_test = ['*.txt', '*.TXT']

# Create one lower-case and one upper-case test file.
for filename in files_to_create:
    filepath = directory / filename
    filepath.touch()

# Print the files matched by each pattern.
for suffix in suffixes_to_test:
    files = [fp.relative_to(directory) for fp in directory.glob(suffix)]
    print(f'{suffix}: {files}')
The majority of the code base was developed on a Windows 10 machine (running Python 3.7.4) and has now been moved to macOS Monterey 12.0.1 (running Python 3.10.1).
On Windows, both files, a.txt and b.TXT, match each pattern:
*.txt: [WindowsPath('a.txt'), WindowsPath('b.TXT')]
*.TXT: [WindowsPath('a.txt'), WindowsPath('b.TXT')]
In contrast, on macOS only one file matches each pattern:
*.txt: [PosixPath('a.txt')]
*.TXT: [PosixPath('b.TXT')]
Therefore, I assume that the macOS file system might be case-sensitive, whereas the Windows one is not. According to Apple's User Guide, the macOS file system should not be case-sensitive by default but can be configured as such.
Something similar might apply to Linux or Unix file systems, as discussed here and here.
Regardless of the reason for this differing behavior, I need a platform-agnostic way to get both upper case TXT and lower case txt files.
A rather naive workaround could be something like this:
results = {fp.relative_to(directory) for suffix in suffixes_to_test for fp in directory.glob(suffix)}
This gives the desired output on both macOS and Windows:
{PosixPath('b.TXT'), PosixPath('a.txt')}
However, is there a more elegant way? I could not find any option like ignore_case in pathlib's documentation.
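For reference, Python 3.12 added a keyword-only `case_sensitive` parameter to `Path.glob()` and `Path.rglob()`. The following is a minimal sketch under that assumption, not a definitive solution. The helper name `glob_ignore_case` is hypothetical, and on older interpreters it simply falls back to the lower/upper union described above, which still misses mixed-case spellings.

```python
import sys
import tempfile
from pathlib import Path


def glob_ignore_case(directory, pattern):
    """Hypothetical helper: match `pattern` in `directory` regardless of case."""
    if sys.version_info >= (3, 12):
        # Python 3.12+ accepts case_sensitive on glob() and rglob().
        matches = set(directory.glob(pattern, case_sensitive=False))
    else:
        # Older interpreters: union of the lower- and upper-case patterns,
        # as in the workaround from the question (misses e.g. "c.Txt").
        matches = set(directory.glob(pattern.lower())) | set(directory.glob(pattern.upper()))
    return sorted(p.relative_to(directory) for p in matches)


# Try it in a throwaway directory so no files are left behind.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for filename in ['a.txt', 'b.TXT']:
        (directory / filename).touch()
    found = glob_ignore_case(directory, '*.txt')
    print(found)
```

The version check keeps the same call site working on both the Windows (3.7) and macOS (3.10) setups mentioned above, at the cost of weaker matching on the older interpreters.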
Comments (4)
What about something like:
It does not generalize to a full "case-insensitive glob", but it works well for a limited, specific use case such as your glob for a single extension.
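A minimal sketch of an extension-specific filter of this kind, assuming the idea is to glob everything and compare the lowercased suffix. The helper name `txt_files` and its `recursive` parameter are hypothetical.

```python
import tempfile
from pathlib import Path


def txt_files(directory, recursive=False):
    """Hypothetical helper: files whose extension is .txt in any casing."""
    candidates = directory.rglob('*') if recursive else directory.glob('*')
    return [p for p in candidates if p.is_file() and p.suffix.lower() == '.txt']


# Try it in a throwaway directory with one subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    (directory / 'sub').mkdir()
    for filename in ['a.txt', 'b.TXT', 'sub/c.Txt', 'd.csv']:
        (directory / filename).touch()
    top_level = sorted(p.name for p in txt_files(directory))
    everything = sorted(p.name for p in txt_files(directory, recursive=True))
    print(top_level)
    print(everything)
```

Because the comparison happens in Python rather than in the glob pattern, the result does not depend on the platform's path flavour.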
You could also use the syntax:
This would match all case combinations of txt.
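As a minimal sketch of that idea, assuming the syntax meant here is a glob character class per letter, such as `*.[tT][xX][tT]`. The helper name `any_case` is hypothetical, and it is only intended for plain ASCII extensions.

```python
import tempfile
from pathlib import Path


def any_case(extension):
    """Hypothetical helper: turn 'txt' into '[tT][xX][tT]'."""
    return ''.join(
        f'[{c.lower()}{c.upper()}]' if c.isalpha() else c
        for c in extension
    )


# Build a pattern that accepts every casing of the extension.
pattern = '*.' + any_case('txt')

# Try it in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for filename in ['a.txt', 'b.TXT', 'c.Txt', 'd.csv']:
        (directory / filename).touch()
    names = sorted(p.name for p in directory.glob(pattern))
    print(pattern)
    print(names)
```

The same pattern can be passed to `rglob()` to cover subdirectories.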
For the case where someone just wants to find case-insensitive matches of an arbitrary file suffix, in a Unicode-safe way, the suffix can be tested directly:
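A minimal sketch of such a suffix test, assuming `str.casefold()` is the Unicode-aware comparison intended. The helper name `has_suffix` is hypothetical.

```python
import tempfile
from pathlib import Path


def has_suffix(path, suffixes):
    """Hypothetical helper: True if path's suffix matches any of `suffixes`, ignoring case."""
    wanted = {s.casefold() for s in suffixes}
    return path.suffix.casefold() in wanted


# Try it in a throwaway directory with several extensions.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for filename in ['a.txt', 'b.TXT', 'notes.MD', 'img.png']:
        (directory / filename).touch()
    matched = sorted(
        p.name for p in directory.rglob('*') if has_suffix(p, ['.txt', '.md'])
    )
    print(matched)
```

Unlike `lower()`, `casefold()` also normalizes characters whose case mapping is not one-to-one, which is what makes the comparison safe for non-ASCII suffixes.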
If you do not want case-sensitive matching, you could convert the names to lower or upper case before comparing.
This will give different output from your example of the desired output, but it might also be what you are asking for.
You might also draw inspiration from answers here: Ignore case in glob() on Linux
If you do want case-sensitive matching, you might adapt those answers and use the fnmatch.fnmatchcase function to make the Windows behaviour case-sensitive. Untried, but probably something like:
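A minimal sketch along those lines, assuming the directory is listed with `iterdir()` and each name is filtered with `fnmatch.fnmatchcase`, which compares case-sensitively on every platform. The helper name `match_names` and its `ignore_case` parameter are hypothetical; lowercasing the pattern would also alter character classes such as `[A-Z]`, so this only suits simple patterns.

```python
import fnmatch
import tempfile
from pathlib import Path


def match_names(directory, pattern, ignore_case=False):
    """Hypothetical helper: filter directory entries by name with fnmatchcase."""
    if ignore_case:
        # Lowercase both sides so the comparison ignores case everywhere.
        return [
            p for p in directory.iterdir()
            if fnmatch.fnmatchcase(p.name.lower(), pattern.lower())
        ]
    # fnmatchcase never normalizes case, so this is strict on Windows too.
    return [p for p in directory.iterdir() if fnmatch.fnmatchcase(p.name, pattern)]


# Try both modes in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for filename in ['a.txt', 'b.TXT']:
        (directory / filename).touch()
    strict = sorted(p.name for p in match_names(directory, '*.txt'))
    relaxed = sorted(p.name for p in match_names(directory, '*.txt', ignore_case=True))
    print(strict)
    print(relaxed)
```

Either mode behaves the same on Windows, macOS, and Linux because the matching no longer goes through the platform's path flavour.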