Are there any potential errors in my makeValidFilename function?
Inspired by "How to make a valid Windows filename from an arbitrary string?", I've written a function that takes an arbitrary string and makes it a valid filename.
My function should technically be an answer to that question, but I want to make sure I've not done anything stupid or overlooked anything before posting it as an answer.
I wrote this as part of tvnamer, a utility which takes TV episode filenames and renames them nicely and consistently, with the episode name pulled from http://www.thetvdb.com. While the source filename must be a valid file, the series name is corrected and the episode name is added, so both could theoretically contain any characters. I'm not so much concerned about security as usability: it's mainly to prevent a file being renamed to `.some.series - [01x01].avi` and "disappearing" (rather than to thwart evil people).
It makes a few assumptions:
- The filesystem supports Unicode filenames. HFS+ and NTFS both do, which will cover a majority of users. There is also a `normalize_unicode` argument to strip out Unicode characters (in tvnamer, this is set via the config XML file)
- The platform is either Darwin or Linux; everything else is treated as Windows
- The filename is intended to be visible (not a dotfile like `.bashrc`); it would be simple enough to modify the code to allow `.abc` format filenames, if desired
Things I've (hopefully) handled:
- Prepend an underscore if the filename starts with `.` (prevents the filenames `.` and `..`, and files from disappearing)
- Remove directory separators: `/` on Linux, and `/` and `:` on OS X
- Remove invalid Windows filename characters `\/:*?"<>|` (when on Windows, or forced with `windows_safe=True`)
- Prepend reserved filenames with an underscore (`COM2` becomes `_COM2`, `NUL` becomes `_NUL`, etc.)
- Optional normalisation of Unicode data, so `å` becomes `a` and non-convertible characters are removed
- Truncation of filenames over 255 characters on Linux/Darwin, and 32 characters on Windows
The code and a bunch of test-cases can be found and fiddled with at http://gist.github.com/256270. The "production" code can be found in tvnamer/utils.py
Are there any errors in this function? Any conditions I've missed?
Comments (1)
One point I've noticed: under NTFS, some files cannot be created in specific directories, e.g. `$Boot` in the root directory.
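A minimal sketch of how that case might be checked, assuming the target directory is known at rename time. The helper `clashes_with_ntfs_metafile` is hypothetical, the set lists the standard NTFS metadata file names, and treating the comparison as case-insensitive is an assumption. `ntpath` is used so the check behaves the same on any platform.

```python
import ntpath

# Standard NTFS metadata files, which live in the root of the volume.
NTFS_METAFILES = {
    "$MFT", "$MFTMIRR", "$LOGFILE", "$VOLUME", "$ATTRDEF", "$BITMAP",
    "$BOOT", "$BADCLUS", "$SECURE", "$UPCASE", "$EXTEND",
}


def clashes_with_ntfs_metafile(directory, filename):
    """Return True if `filename` in `directory` would collide with a metafile.

    Hypothetical helper: only an exact name match in a drive root counts.
    """
    drive, tail = ntpath.splitdrive(ntpath.normpath(directory))
    is_root = drive != "" and tail == "\\"
    return is_root and filename.upper() in NTFS_METAFILES
```

A caller could prepend an underscore when this returns True, in the same way reserved names such as `COM2` are handled.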