Detecting filename case mismatches on Windows (preferably using Python)?
I have some XML configuration files that we create in a Windows environment but deploy on Linux. These configuration files reference each other by file path. We've had problems with case sensitivity and trailing spaces before, and I'd like to write a script that checks for these problems. We have Cygwin, if that helps.
Example:
Let's say I want to reference the file foo/bar/baz.xml. I'd do this:
<someTag fileref="foo/bar/baz.xml" />
Now, if by mistake we do this:
<someTag fileref="fOo/baR/baz.Xml " />
It will still work on Windows, but it will fail on Linux.
What I want to do is detect the cases where a file reference in these files doesn't match the real file name with respect to case.
Comments (2)
`os.listdir` on a directory, in all case-preserving filesystems (including those on Windows), returns the actual case of the filenames in the directory you're listing.
So you need to do this check at each level of the path:
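The code that originally followed here did not survive, so this is only a minimal sketch of what a per-level check could look like. The name `one_level_ok` and its parameters are illustrative assumptions, not the answer's own code.

```python
import os


def one_level_ok(parent, name):
    """Return True if `name` appears in `parent` with exactly this case.

    Returns False if only a differently-cased variant exists, and raises
    ValueError if no case variant of `name` exists in `parent` at all.
    """
    entries = os.listdir(parent)  # names with their real on-disk case
    if name in entries:
        return True
    if any(entry.lower() == name.lower() for entry in entries):
        return False
    raise ValueError("No %r in directory %r" % (name, parent))
```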
Here I'm assuming that the complete absence of any case variation of a name is a different kind of error, and I'm using an exception for that. The same check then applies level by level to the whole path (assuming no drive letters or UNC prefixes, which wouldn't translate to Linux anyway).
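As a rough sketch of the whole-path check, assuming references are relative, use `/` as the separator, and are resolved against a `root` directory: the names `all_path_ok`, `one_level_ok`, `root` and `parents` are hypothetical, chosen for illustration only.

```python
import os


def one_level_ok(parent, name):
    """True on exact match, False on case-only mismatch, ValueError if absent."""
    entries = os.listdir(parent)
    if name in entries:
        return True
    if any(entry.lower() == name.lower() for entry in entries):
        return False
    raise ValueError("No %r in directory %r" % (name, parent))


def all_path_ok(path, root="."):
    """Check every component of a '/'-separated relative path under `root`."""
    levels = [part for part in path.split("/") if part]
    # Directory in which each level is looked up: root, root/a, root/a/b, ...
    parents = [root]
    for level in levels:
        parents.append(os.path.join(parents[-1], level))
    # `parents` has one entry more than `levels`; zip silently drops it.
    # all() stops at the first mismatch, so deeper directories are not listed.
    return all(one_level_ok(p, t) for p, t in zip(parents, levels))
```

With a real file at `foo/bar/baz.xml`, `all_path_ok("foo/bar/baz.xml")` returns `True` and `all_path_ok("fOo/baR/baz.xml")` returns `False`. A reference with a trailing space raises `ValueError` in this sketch, because no case variant of the padded name exists on disk.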
You may need to adapt this if, e.g., `foo/bar` is not to be taken to mean that `foo` is in the current directory, but somewhere else; or, of course, if UNC or drive letters are in fact needed (but as I mentioned, translating them to Linux is not trivial anyway ;-).
Implementation notes: I'm taking advantage of the fact that `zip` just drops "extra entries" beyond the length of the shortest of the sequences it's zipping; so I don't need to explicitly slice off the "leaf" (last entry) from `levels` in the first argument, `zip` does it for me. `all` will short-circuit where it can, returning `False` as soon as it detects a false value, so it's just as good as an explicit loop but faster and more concise.
It's hard to judge what exactly your problem is, but if you apply `os.path.normcase` along with `str.strip` before saving your file name, it should solve all your problems.
As I said in a comment, it's not clear how you are ending up with such a mistake. However, it would be trivial to check for an existing file, as long as you have some sensible convention (all file names are lower case, for example).