Recursively "normalize" filenames
I mean getting rid of special characters in filenames, etc.
I have made a script that can recursively rename files [http://pastebin.com/raw.php?i=kXeHbDQw]:
E.g., before:
THIS i.s my file (1).txt
After running the script:
This-i-s-my-file-1.txt
OK, here it is:
But when I wanted to test it "fully", with filenames like these:
¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÂÃÄÅÆÇÈÊËÌÎÏÐÑÒÔÕ×ØÙUÛUÝÞßàâãäåæçèêëìîïðñòôõ÷øùûýþÿ.txt
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&'()*+,:;<=>?@[\]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£.txt
It fails [http://pastebin.com/raw.php?i=iu8Pwrnr]:
$ sh renamer.sh directorythathasthefiles
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†....and so on
$
So "mv" can't handle special characters... :\
I worked on it for many hours.
Does anyone have a working one? [One that can handle the characters [filenames] in those two lines too?]
Comments (4)
mv handles special characters just fine. Your script doesn't.
In no particular order:
You are using find to find all directories, and ls each directory separately. Why use "for DEPTH in..." if you can do exactly the same with one command? Which makes the arbitrary depth limit unnecessary.
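A minimal sketch of such a single-command traversal, assuming the starting directory is passed as the first argument (the function name list_depth_first is hypothetical). The -depth option makes find report the contents of a directory before the directory itself, which matters once directories get renamed too:

```shell
# Hypothetical helper: list everything below a directory, deepest entries first.
list_depth_first() {
    # -depth: visit the contents of a directory before the directory itself,
    # so renaming a child never invalidates a path that is still to come.
    find "$1" -depth
}
```

There is no depth limit here: find descends as far as the tree goes.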
Don't ever parse the output of ls, especially if you can let find handle that, too.
Make sure it works in the worst possible case:
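One common way to make such a loop robust, sketched here under the assumption of bash and a find that supports -print0 and -mindepth (the function name count_entries is hypothetical, and counting is only a stand-in for the real per-file work):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the entries below "$1", however odd their names.
# Assumes bash, because read -d and <( ) are not POSIX sh features.
count_entries() {
    local n=0 path
    # find -print0 separates names with NUL, the one byte that cannot occur
    # in a filename, so names containing newlines arrive intact.
    # IFS= keeps leading and trailing whitespace; -r keeps backslashes.
    while IFS= read -r -d '' path; do
        n=$((n + 1))
    done < <(find "$1" -mindepth 1 -print0)
    printf '%s\n' "$n"
}
```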
This stops read from eating certain escapes and choking on filenames with newline characters.
You are repeating the entire "ls | replace" cycle for every single character. Don't: it kills performance. Loop over all files once, and just use multiple seds, or multiple replacements in one sed command.
(I was going to suggest sed 'y/áí/ai/', but unfortunately that doesn't seem to work with Unicode. Perhaps perl -CS -Mutf8 -pe 'y/áí/ai/' would.)
You're still thinking in ASCII: "other special chars - ASCII Codes 33.. ..255". Don't.
These days, most systems use Unicode in UTF-8 encoding, which has a much wider range of "special" characters - so big that listing them out one by one becomes pointless. (It is even multibyte - "e" is one byte, while the precomposed "ė" (U+0117) is two bytes.)
True ASCII has 128 characters. What you currently have in mind are the ISO 8859 character sets (sometimes called "ANSI") - in particular, ISO 8859-1. But they go all the way up to 8859-16, and only the "ASCII" part stays the same.
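Since listing every special character is impractical, one option is to invert the rule and replace everything outside a small allowed set, doing all replacements in one sed call. The following is only a sketch: the function name clean_name and the particular rules are assumptions, not the rules of the original script, and names containing newlines are not handled here.

```shell
# Hypothetical helper: clean one filename (not a whole path) in a single sed call.
clean_name() {
    # LC_ALL=C makes sed work on bytes, so every byte outside the allowed
    # set (including each byte of a UTF-8 character) is replaced.
    printf '%s\n' "$1" | LC_ALL=C sed \
        -e 's/[^A-Za-z0-9._-]/-/g' \
        -e 's/--*/-/g' \
        -e 's/^-//' \
        -e 's/-\././g' \
        -e 's/-$//'
}
```

For example, clean_name 'my file (1).txt' prints my-file-1.txt.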
echo -n $(command) is rather useless.
There are much easier ways to find the directory and basename given a path. For example, you can do
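One such way, sketched with POSIX parameter expansion (the example path is hypothetical; the dirname and basename utilities are an alternative and also cope with paths that contain no slash):

```shell
path='./some dir/THIS i.s my file (1).txt'   # hypothetical example path
dir=${path%/*}     # remove the shortest suffix matching "/*": the directory part
base=${path##*/}   # remove the longest prefix matching "*/": the filename part
printf '%s\n' "$dir" "$base"
```

This assumes the path contains at least one slash, which holds for anything find prints below a starting directory.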
Do not use egrep to check for errors. Check the program's return code. (Like you already do with cd.)
And instead of filtering out other errors, do...
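A sketch of checking the return code directly, assuming a small wrapper around mv (the name rename_checked and the message text are hypothetical):

```shell
# Hypothetical helper: rename one file and report failure via mv's exit status.
rename_checked() {
    # "--" ends option parsing, so names beginning with "-" are safe.
    if ! mv -- "$1" "$2"; then
        printf 'could not rename %s\n' "$1" >&2
        return 1
    fi
}
```

Any error message from mv itself still reaches stderr unfiltered, so nothing has to be matched with egrep.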
The ton of sed 's/------------/-/g' calls can be changed to a single regexp:
The [ ]s in tr [foo] [bar] are unnecessary. They just cause tr to replace [ with [, and ] with ].
Seriously?
How about this instead?
And finally, use detox.
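To illustrate the point about tr made above, here is a small sketch of lowercasing without brackets (the function name to_lower is hypothetical, and LC_ALL=C restricts it to ASCII letters):

```shell
# Hypothetical helper: lowercase ASCII letters. The sets are plain ranges;
# surrounding brackets would only add "[" -> "[" and "]" -> "]" mappings.
to_lower() {
    printf '%s\n' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}
```

For example, to_lower 'THIS [File]' prints this [file], with the brackets in the input left untouched.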
Try something like:
Using xargs(1) ensures that each filename is passed as exactly one parameter. awk(1) is used to add the new filename right after the old one.
One more trick: sed -E 's/-+/-/g' will replace each run of consecutive "-" characters with exactly one. (Without -E, sed uses basic regular expressions, in which + is a literal plus sign.)
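A quick sketch of the dash-squeezing trick, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed do); the function name squeeze_dashes is hypothetical:

```shell
# Hypothetical helper: collapse every run of "-" into a single "-".
squeeze_dashes() {
    # -E selects extended regular expressions, where + means "one or more".
    printf '%s\n' "$1" | sed -E 's/-+/-/g'
}
```

For example, squeeze_dashes 'a---b--c' prints a-b-c. The portable basic-regex spelling of the same substitution is 's/--*/-/g'.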
Assuming the rest of your script is right, your problem is that you are using read but you should use read -r. Notice how the backslash disappeared:
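The effect can be reproduced in isolation with a small sketch (the function names read_plain and read_raw and the sample name are hypothetical):

```shell
# Without -r, read treats a backslash as an escape character and drops it.
read_plain() {
    printf '%s\n' "$1" | { read line; printf '%s\n' "$line"; }
}

# With -r the backslash is kept; IFS= also preserves surrounding whitespace.
read_raw() {
    printf '%s\n' "$1" | { IFS= read -r line; printf '%s\n' "$line"; }
}

read_plain '@[\]^_.txt'   # the backslash is lost
read_raw   '@[\]^_.txt'   # the name comes through unchanged
```

The first call prints @[]^_.txt, which is the same kind of mangled name that appears in the mv error messages in the question.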
Ugh...
Some tips to clean up your script:
** Use sed to do translation on multiple characters at once; that'll clean things up and make it easier to manage:
** Rather than renaming the file for each change, run all your filters, then do one move.
** Rather than doing a "ls | read ..." loop, use:
** Separate out your path traversal and renaming logic into two scripts. One script finds the files which need to be renamed, one script handles the normalization of a single file. Once you learn the 'find' command, you'll realize you can toss the first script :)
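Putting these tips together, here is a hedged sketch rather than a finished tool. It assumes a find that supports -mindepth (GNU and BSD find do); the name rename_tree and the allowed character set are hypothetical; and it swaps tr in for sed in the filtering step so that newlines inside names get replaced as well:

```shell
# Hypothetical helper: normalize the name of every entry below "$1".
rename_tree() {
    # -depth lists children before their parent directory, so a parent is
    # renamed only after everything inside it has been handled.
    find "$1" -mindepth 1 -depth -exec sh -c '
        for path do
            dir=${path%/*}
            old=${path##*/}
            # Run every filter first: replace each byte outside the allowed
            # set (newlines included) with "-", then squeeze runs of "-".
            new=$(printf "%s" "$old" | LC_ALL=C tr -c "A-Za-z0-9._-" "-" | tr -s "-")
            [ "$old" = "$new" ] && continue
            # Skip instead of overwriting when the target already exists.
            if [ -e "$dir/$new" ]; then
                printf "skipping %s: target exists\n" "$path" >&2
                continue
            fi
            # One mv per entry; stop at the first failure.
            mv -- "$path" "$dir/$new" || exit 1
        done
    ' sh {} +
}
```

Because find hands the names to the inner shell as arguments, no name is ever parsed from text, which is why no ls or read loop is needed. Try it on a copy of the directory first, since different names can normalize to the same result and are then skipped.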