Hashing Multiple Files

Problem Specification:

Given a directory, I want to iterate through the directory and its non-hidden sub-directories,
and add a whirlpool hash into the names of the non-hidden files.
If the script is re-run, it should replace the old hash with a new one.

<filename>.<extension>   ==>  <filename>.<a-whirlpool-hash>.<extension>

<filename>.<old-hash>.<extension>   ==>  <filename>.<new-hash>.<extension>

Question:

a) How would you do this?

b) Out of all the methods available to you, what makes your method most suitable?

Verdict:

Thanks all, I have chosen SeigeX's answer for its speed and portability.
It is empirically quicker than the other bash variants,
and it worked without alteration on my Mac OS X machine.

Comments (13)

橙幽之幻2024-08-20 13:10:19

Updated to fix:

1. File names with '[' or ']' in their name (really, any character now. See comment)
2. Handling of md5sum when hashing a file with a backslash or newline in its name
3. Functionized hash-checking algo for modularity
4. Refactored hash-checking logic to remove double-negatives

#!/bin/bash
if (($# != 1)) || ! [[ -d "$1" ]]; then
    echo "Usage: $0 /path/to/directory"
    exit 1
fi

is_hash() {
 md5=${1##*.} # strip prefix
 [[ "$md5" == *[^[:xdigit:]]* || ${#md5} -lt 32 ]] && echo "$1" || echo "${1%.*}"
}

while IFS= read -r -d $'\0' file; do
    read hash junk < <(md5sum "$file")
    basename="${file##*/}"
    dirname="${file%/*}"
    pre_ext="${basename%.*}"
    ext="${basename:${#pre_ext}}"

    # File already hashed?
    pre_ext=$(is_hash "$pre_ext")
    ext=$(is_hash "$ext")

    mv "$file" "${dirname}/${pre_ext}.${hash}${ext}" 2> /dev/null
done < <(find "$1" -path "*/.*" -prune -o \( -type f -print0 \))

This code has the following benefits over other entries thus far

  • It is fully compliant with Bash versions 2.0.2 and beyond
  • No superfluous calls to other binaries like sed or grep; uses builtin parameter expansion instead
  • Uses process substitution for 'find' instead of a pipe, so no sub-shell is made this way (see the sketch after this list)
  • Takes the directory to work on as an argument and does a sanity check on it
  • Uses $() rather than `` notation for command substitution, the latter is deprecated
  • Works with files with spaces
  • Works with files with newlines
  • Works with files with multiple extensions
  • Works with files with no extension
  • Does not traverse hidden directories
  • Does NOT skip pre-hashed files, it will recalculate the hash as per the spec
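
A minimal sketch (not part of the original answer; the file layout is hypothetical) of why the process-substitution point matters: with a pipe, the while loop runs in a sub-shell, so variables set inside it are lost afterwards, whereas the "done < <(find ...)" form keeps them in the current shell.

count=0
find . -type f -print0 | while IFS= read -r -d $'\0' f; do ((count++)); done
echo "$count"    # prints 0 -- the increments happened in a sub-shell

count=0
while IFS= read -r -d $'\0' f; do ((count++)); done < <(find . -type f -print0)
echo "$count"    # prints the actual number of files found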

Test Tree

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f
|       |-- g.5236b1ab46088005ed3554940390c8a7.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.5236b1ab46088005ed3554940390c8a7.ext2
|       `-- j.ext1.ext2
|-- c.ext^Mnewline
|   |-- f
|   `-- g.with[or].ext
`-- f^Jnewline.ext

4 directories, 9 files 

Result

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f.d41d8cd98f00b204e9800998ecf8427e
|       |-- g.d41d8cd98f00b204e9800998ecf8427e.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|       `-- j.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|-- c.ext^Mnewline
|   |-- f.d41d8cd98f00b204e9800998ecf8427e
|   `-- g.with[or].d41d8cd98f00b204e9800998ecf8427e.ext
`-- f^Jnewline.d3b07384d113edec49eaa6238ad5ff00.ext

4 directories, 9 files

挽袖吟2024-08-20 13:10:19
#!/bin/bash
find -type f -print0 | while read -d $'\0' file
do
    md5sum=`md5sum "${file}" | sed -r 's/ .*//'`
    filename=`echo "${file}" | sed -r 's/\.[^./]*$//'`
    extension="${file:${#filename}}"
    filename=`echo "${filename}" | sed -r 's/\.md5sum-[^.]+//'`
    if [[ "${file}" != "${filename}.md5sum-${md5sum}${extension}" ]]; then
        echo "Handling file: ${file}"
        mv "${file}" "${filename}.md5sum-${md5sum}${extension}"
    fi
done
  • Tested on files containing spaces like 'a b'
  • Tested on files containing multiple extensions like 'a.b.c'
  • Tested with directories containing spaces and/or dots.
  • Tested on files containing no extension inside directories containing dots, such as 'a.b/c'
  • Updated: Now updates hashes if the file changes.

Key points:

  • Use of print0 piped to while read -d $'\0', to correctly handle spaces in file names.
  • md5sum can be replaced with your favourite hash function. The sed removes the first space and everything after it from the output of md5sum.
  • The base filename is extracted using a regular expression that finds the last period that isn't followed by another slash (so that periods in directory names aren't counted as part of the extension).
  • The extension is found by using a substring with starting index as the length of the base filename (illustrated in the sketch below).
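
A small sketch (my own illustration, not from the answer; the path is made up) of how the base name and extension are split with the sed expression and the substring expansion described above:

file='./a.b/archive.tar.gz'
filename=`echo "$file" | sed -r 's/\.[^./]*$//'`   # strips only the last extension
echo "$filename"                                   # ./a.b/archive.tar
extension="${file:${#filename}}"                   # substring starting at length of $filename
echo "$extension"                                  # .gz
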
魂牵梦绕锁你心扉2024-08-20 13:10:19

The logic of the requirements is complex enough to justify the use of Python instead of bash. It should provide a more readable, extensible, and maintainable solution.

#!/usr/bin/env python
import hashlib, os

def ishash(h, size):
    """Whether `h` looks like hash's hex digest."""
    if len(h) == size: 
        try:
            int(h, 16) # whether h is a hex number
            return True
        except ValueError:
            return False

for root, dirs, files in os.walk("."):
    dirs[:] = [d for d in dirs if not d.startswith(".")] # skip hidden dirs
    for path in (os.path.join(root, f) for f in files if not f.startswith(".")):
        suffix = hash_ = "." + hashlib.md5(open(path).read()).hexdigest()
        hashsize = len(hash_) - 1
        # extract old hash from the name; add/replace the hash if needed
        barepath, ext = os.path.splitext(path) # ext may be empty
        if not ishash(ext[1:], hashsize):
            suffix += ext # add original extension
            barepath, oldhash = os.path.splitext(barepath) 
            if not ishash(oldhash[1:], hashsize):
               suffix = oldhash + suffix # preserve 2nd (not a hash) extension
        else: # ext looks like a hash
            oldhash = ext
        if hash_ != oldhash: # replace old hash by new one
           os.rename(path, barepath+suffix)

Here's a test directory tree. It contains:

  • files without extension inside directories with a dot in their name
  • filename which already has a hash in it (test on idempotency)
  • filename with two extensions
  • newlines in names
$ tree a
a
|-- b
|   `-- c.d
|       |-- f
|       |-- f.ext1.ext2
|       `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
|   `-- f
`-- f^Jnewline.ext1

7 directories, 5 files

Result

$ tree a
a
|-- b
|   `-- c.d
|       |-- f.0bee89b07a248e27c83fc3d5951213c1
|       |-- f.ext1.614dd0e977becb4c6f7fa99e64549b12.ext2
|       `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
|   `-- f.0bee89b07a248e27c83fc3d5951213c1
`-- f^Jnewline.b6fe8bb902ca1b80aaa632b776d77f83.ext1

7 directories, 5 files

The solution works correctly for all cases.


Whirlpool hash is not in Python's stdlib, but there are both pure Python and C extensions that support it e.g., python-mhash.

To install it:

$ sudo apt-get install python-mhash

To use it:

import mhash

print mhash.MHASH(mhash.MHASH_WHIRLPOOL, "text to hash here").hexdigest()

Output:
cbdca4520cc5c131fc3a86109dd23fee2d7ff7be56636d398180178378944a4f41480b938608ae98da7eccbf39a4c79b83a8590c4cb1bace5bc638fc92b3e653


Invoking whirlpooldeep in Python

from subprocess import PIPE, STDOUT, Popen

def getoutput(cmd):
    return Popen(cmd, stdout=PIPE, stderr=STDOUT).communicate()[0]

hash_ = getoutput(["whirlpooldeep", "-q", path]).rstrip()

git can provide leverage for problems that require tracking sets of files based on their hashes.
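
A hedged sketch of that idea (my own, not from the answer; note that git uses SHA-1 rather than whirlpool, and the paths are made up): git already stores every file as a content-addressed blob, so its index maps each tracked filename to a hash without renaming anything.

git init demo && cd demo
echo 'hello' > a.txt
git add a.txt
git ls-files -s         # lists the blob hash recorded for each tracked file
git hash-object a.txt   # prints the same content hash directly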

黯然2024-08-20 13:10:19

I wasn't really happy with my first answer, since as I said there, this problem looks like it's best solved with perl. You already said in one edit of your question that you have perl on the OS X machine you want to run this on, so I gave it a shot.

It's hard to get it all right in bash, i.e. avoiding any quoting problems with odd filenames, and behaving nicely with corner-case filenames.

So here it is in perl, a complete solution to your problem. It runs over all the files/directories listed on its command line.


#!/usr/bin/perl -w
# whirlpool-rename.pl
# 2009 Peter Cordes <[email protected]>.  Share and Enjoy!

use Fcntl;      # for O_BINARY
use File::Find;
use Digest::Whirlpool;

# find callback, called once per directory entry
# $_ is the base name of the file, and we are chdired to that directory.
sub whirlpool_rename {
    print "find: $_\n";
#    my @components = split /\.(?:[[:xdigit:]]{128})?/; # remove .hash while we're at it
    my @components = split /\.(?!\.|$)/, $_, -1; # -1 to not leave out trailing dots

    if (!$components[0] && $_ ne ".") { # hidden file/directory
        $File::Find::prune = 1;
        return;
    }

    # don't follow symlinks or process non-regular-files
    return if (-l $_ || ! -f _);

    my $digest;
    eval {
        sysopen(my $fh, $_, O_RDONLY | O_BINARY) or die "$!";
        $digest = Digest->new( 'Whirlpool' )->addfile($fh);
    };
    if ($@) {  # exception-catching structure from whirlpoolsum, distributed with Digest::Whirlpool.
        warn "whirlpool: couldn't hash $_: $!\n";
        return;
    }

    # strip old hashes from the name.  not done during split only in the interests of readability
    @components = grep { !/^[[:xdigit:]]{128}$/ }  @components;
    if ($#components == 0) {
        push @components, $digest->hexdigest;
    } else {
        my $ext = pop @components;
        push @components, $digest->hexdigest, $ext;
    }

    my $newname = join('.', @components);
    return if $_ eq $newname;
    print "rename  $_ ->  $newname\n";
    if (-e $newname) {
        warn "whirlpool: clobbering $newname\n";
        # maybe unlink $_ and return if $_ is older than $newname?
        # But you'd better check that $newname has the right contents then...
    }
    # This could be link instead of rename, but then you'd have to handle directories, and you can't make hardlinks across filesystems
    rename $_, $newname or warn "whirlpool: couldn't rename $_ -> $newname:  $!\n";
}


#main
$ARGV[0] = "." if !@ARGV;  # default to current directory
find({wanted => \&whirlpool_rename, no_chdir => 0}, @ARGV );

Advantages:

  • actually uses whirlpool, so you can use this exact program directly (after installing libperl-digest-whirlpool). Easy to change to any digest function you want, because instead of different programs with different output formats, you have the perl Digest common interface.

  • implements all other requirements: ignore hidden files (and files under hidden directories).

  • able to handle any possible filename without error or security problem. (Several people got this right in their shell scripts).

  • follows best practices for traversing a directory tree, by chdiring down into each directory (like my previous answer, with find -execdir). This avoids problems with PATH_MAX, and with directories being renamed while you're running.

  • clever handling of filenames that end with a dot: foo..txt... -> foo..hash.txt...

  • Handles old filenames containing hashes already without renaming them and then renaming them back. (It strips any sequence of 128 hex digits that's surrounded by "." characters.) In the everything-correct case, no disk write activity happens, just reads of every file. Your current solution runs mv twice in the already-correctly-named case, causing directory metadata writes. And being slower, because that's two processes that have to be execced.

  • efficient. No programs are fork/execed, while most of the solutions that would actually work ended up having to sed something per-file.
    Digest::Whirlpool is implemented with a natively-compiled shared lib, so it's not slow pure-perl. This should be faster than running a program on every file, esp. for small files.

  • Perl supports UTF-8 strings, so filenames with non-ascii characters shouldn't be a problem. (not sure if any multi-byte sequences in UTF-8 could include the byte that means ASCII '.' on its own. If that is possible, then you need UTF-8 aware string handling. sed doesn't know UTF-8. Bash's glob expressions may.)

  • easily extensible. When you go to put this into a real program, and you want to handle more corner cases, you can do so quite easily. e.g. decide what to do when you want to rename a file but the hash-named filename already exists.

  • good error reporting. Most shell scripts have this, though, by passing along errors from the progs they run.

清晰传感2024-08-20 13:10:19
find . -type f -print | while read file
do
    hash=`$hashcommand "$file"`
    filename=${file%.*}
    extension=${file##*.}
    mv $file "$filename.$hash.$extension"
done
非要怀念2024-08-20 13:10:19

You might want to store the results in one file, like in

find . -type f -exec md5sum {} \; > MD5SUMS

If you really want one file per hash:

find . -type f | while read f; do g=`md5sum $f` > $f.md5; done

or even

find . -type f | while read f; do g=`md5sum $f | awk '{print $1}'`; echo "$g $f"> $f-$g.md5; done
全部不再2024-08-20 13:10:19

Here's my take on it, in bash. Features: skips non-regular files; correctly deals with files with weird characters (i.e. spaces) in their names; deals with extensionless filenames; skips already-hashed files, so it can be run repeatedly (although if files are modified between runs, it adds the new hash rather than replacing the old one). I wrote it using md5 -q as the hash function; you should be able to replace this with anything else, as long as it only outputs the hash, not something like filename => hash.

find -x . -type f -print0 | while IFS="" read -r -d $'\000' file; do
    hash="$(md5 -q "$file")" # replace with your favorite hash function
    [[ "$file" == *."$hash" ]] && continue # skip files that already end in their hash
    dirname="$(dirname "$file")"
    basename="$(basename "$file")"
    base="${basename%.*}"
    [[ "$base" == *."$hash" ]] && continue # skip files that already end in hash + extension
    if [[ "$basename" == "$base" ]]; then
            extension=""
    else
            extension=".${basename##*.}"
    fi
    mv "$file" "$dirname/$base.$hash$extension"
done
蘑菇王子2024-08-20 13:10:19

In sh or bash, two versions. One limits itself to files with extensions...

hash () {
  #openssl md5 t.sh | sed -e 's/.* //'
  whirlpool "$f"
}

find . -type f -a -name '*.*' | while read f; do
  # remove the echo to run this for real
  echo mv "$f" "${f%.*}.whirlpool-`hash "$f"`.${f##*.}"
done

Testing...

...
mv ./bash-4.0/signames.h ./bash-4.0/signames.whirlpool-d71b117a822394a5b273ea6c0e3f4dc045b1098326d39864564f1046ab7bd9296d5533894626288265a1f70638ee3ecce1f6a22739b389ff7cb1fa48c76fa166.h
...

And this more complex version processes all plain files, with or without extensions, with or without spaces and odd characters, etc, etc...

hash () {
  #openssl md5 t.sh | sed -e 's/.* //'
  whirlpool "$f"
}

find . -type f | while read f; do
  name=${f##*/}
  case "$name" in
    *.*) extension=".${name##*.}" ;;
    *)   extension=   ;;
  esac
  # remove the echo to run this for real
  echo mv "$f" "${f%/*}/${name%.*}.whirlpool-`hash "$f"`$extension"
done
淡淡的优雅2024-08-20 13:10:19

whirlpool isn't a very common hash. You'll probably have to install a program to compute it. e.g. Debian/Ubuntu include a "whirlpool" package. The program prints the hash of one file by itself. apt-cache search whirlpool shows that some other packages support it, including the interesting md5deep.
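
For example, a hedged sketch of getting a bare whirlpool hash on Debian/Ubuntu (package names as described above; "somefile" is a placeholder):

sudo apt-get install whirlpool md5deep   # whirlpooldeep ships with the md5deep suite
whirlpool somefile                       # prints the hash of one file
whirlpooldeep -q somefile                # -q is used later in this thread to print just the hash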

Some of the earlier answers will fail on filenames with spaces in them. If this is the case, but your files don't have any newlines in the filename, then you can safely use \n as a delimiter.


oldifs="$IFS"
IFS="
"
for i in $(find -type f); do echo "$i";done
#output
# ./base
# ./base2
# ./normal.ext
# ./trick.e "xt
# ./foo bar.dir ext/trick' (name "- }$foo.ext{}.ext2
IFS="$oldifs"

try without setting IFS to see why it matters.
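
A quick sketch of the failure mode that exercise points at (my own example, assuming an otherwise empty test directory; 'foo bar.ext' is a made-up name): with the default IFS, the unquoted $(find ...) is split on spaces as well as newlines.

touch 'foo bar.ext'
for i in $(find . -type f); do echo "[$i]"; done
# [./foo]
# [bar.ext]          <- one file name became two words
IFS=$'\n'
for i in $(find . -type f); do echo "[$i]"; done
# [./foo bar.ext]    <- splitting now happens only on newlines
unset IFS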

I was going to try something with IFS="."; find -print0 | while read -a array, to split on "." characters, but I normally never use array variables. There's no easy way that I see in the man page to insert the hash as the second-last array index, and push down the last element (the file extension, if it had one.) Any time bash array variables look interesting, I know it's time to do what I'm doing in perl instead! See the gotchas for using read:
http://tldp.org/LDP/abs/html/gotchas.html#BADREAD0

I decided to use another technique I like: find -exec sh -c. It's the safest, since you're not parsing filenames.

This should do the trick:


find -regextype posix-extended -type f -not -regex '.*\.[a-fA-F0-9]{128}.*'  \
-execdir bash -c 'for i in "${@#./}";do 
 hash=$(whirlpool "$i");
 ext=".${i##*.}"; base="${i%.*}";
 [ "$base" = "$i" ] && ext="";
 newname="$base.$hash$ext";
 echo "ext:$ext  $i -> $newname";
 false mv --no-clobber "$i" "$newname";done' \
dummy {} +
# take out the "false" before the mv, and optionally take out the echo.
# false ignores its arguments, so it's there so you can
# run this to see what will happen without actually renaming your files.

-execdir bash -c 'cmd' dummy {} + has the dummy arg there because the first arg after the command becomes $0 in the shell's positional parameters, not part of "$@" that for loops over. I use execdir instead of exec so I don't have to deal with directory names (or the possibility of exceeding PATH_MAX for nested dirs with long names, when the actual filenames are all short enough.)
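
A quick demonstration of the $0-versus-"$@" behaviour behind the dummy argument (my own example):

bash -c 'echo "\$0 is: $0"; echo "\$@ is: $*"' dummy file1 file2
# $0 is: dummy
# $@ is: file1 file2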

-not -regex prevents this from being applied twice to the same file. Although whirlpool is an extremely long hash, and mv says File name too long if I run it twice without that check. (on an XFS filesystem.)

Files with no extension get basename.hash. I had to check specially to avoid appending a trailing ., or getting the basename as the extension. ${@#./} strips out the leading ./ that find puts in front of every filename, so there is no "." in the whole string for files with no extension.

mv --no-clobber may be a GNU extension. If you don't have GNU mv, do something else if you want to avoid deleting existing files (e.g. you run this once, some of the same file are added to the directory with their old names; you run it again.) OTOH, if you want that behaviour, just take it out.

My solution should work even when filenames contain a newline (they can, you know!), or any other possible character. It would be faster and easier in perl, but you asked for shell.

wallenborn's solution for making one file with all the checksums (instead of renaming the original) is pretty good, but inefficient. Don't run md5sum once per file, run it on as many files at once as will fit on its command line:

find dir -type f -print0 | xargs -0 md5sum > dir.md5
or with GNU find, xargs is built in (note the + instead of ';')
find dir -type f -exec md5sum {} + > dir.md5

if you just use find -print | xargs -d'\n', you will be screwed up by file names with quote marks in them, so be careful. If you don't know what files you might someday run this script on, always try to use print0 or -exec. This is esp. true if filenames are supplied by untrusted users (i.e. could be an attack vector on your server.)

绿萝2024-08-20 13:10:19

In response to your updated question:

If anyone can comment on how I can avoid looking in hidden directories with my BASH Script, it would be much appreciated.

You can avoid hidden directories with find by using

find -name '.?*' -prune -o \( -type f -print0 \)

Note that -name '.*' -prune would prune "." itself and stop without doing anything, hence the '.?*' pattern. :/
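
A small sketch (my own test; the demo paths are made up) showing the pruning in action:

mkdir -p demo/.hidden demo/visible
touch demo/.hidden/secret demo/visible/file demo/.dotfile
find demo -name '.?*' -prune -o \( -type f -print0 \) | xargs -0 -n1 echo
# demo/visible/file        <- the hidden directory and the dot-file are both pruned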

I'd still recommend my Perl version, though. I updated it... You may still need to install Digest::Whirlpool from CPAN, though.

美羊羊2024-08-20 13:10:19

Hm, interesting problem.

Try the following (the mktest function is just for testing -- TDD for bash! :)

Edit:

  • Added support for whirlpool hashes.
  • code cleanup
  • better quoting of filenames
  • changed array-syntax for test part-- should now work with most korn-like shells. Note that pdksh does not support :-based parameter expansion (or rather
    it means something else)

Note also that when in md5-mode it fails for filenames with whirlpool-like hashes, and
possibly vice-versa.

#!/usr/bin/env bash

#Tested with:
# GNU bash, version 4.0.28(1)-release (x86_64-pc-linux-gnu)
# ksh (AT&T Research) 93s+ 2008-01-31
# mksh @(#)MIRBSD KSH R39 2009/08/01 Debian 39.1-4
# Does not work with pdksh, dash

DEFAULT_SUM="md5"

#Takes a parameter, as root path
# as well as an optional parameter, the hash function to use (md5 or wp for whirlpool).
main()
{
  case $2 in
    "wp")
      export SUM="wp"
      ;;
    "md5")
      export SUM="md5"
      ;;
    *)
      export SUM=$DEFAULT_SUM
      ;;
  esac

  # For all visible files in all visible subfolders, move the file
  # to a name including the correct hash:
  find $1 -type f -not -regex '.*/\..*' -exec $0 hashmove '{}' \;
}

# Given a file named in $1 with full path, calculate its hash.
# Output the filename, with the hash inserted before the extension
# (if any) -- or: replace an existing hash with the new one,
# if a hash already exists.
hashname_md5()
{
  pathname="$1"
  full_hash=`md5sum "$pathname"`
  hash=${full_hash:0:32}
  filename=`basename "$pathname"`
  prefix=${filename%%.*}
  suffix=${filename#$prefix}

  #If the suffix starts with something that looks like an md5sum,
  #remove it:
  suffix=`echo $suffix|sed -r 's/\.[a-z0-9]{32}//'`

  echo "$prefix.$hash$suffix"
}

# Same as hashname_md5 -- but uses whirlpool hash.
hashname_wp()
{
  pathname="$1"
  hash=`whirlpool "$pathname"`
  filename=`basename "$pathname"`
  prefix=${filename%%.*}
  suffix=${filename#$prefix}

  #If the suffix starts with something that looks like an md5sum,
  #remove it:
  suffix=`echo $suffix|sed -r 's/\.[a-z0-9]{128}//'`

  echo "$prefix.$hash$suffix"
}


#Given a filepath $1, move/rename it to a name including the filehash.
# Try to replace an existing hash, and do not move a file if no update is
# needed.
hashmove()
{
  pathname="$1"
  filename=`basename "$pathname"`
  path="${pathname%%/$filename}"

  case $SUM in
    "wp")
      hashname=`hashname_wp "$pathname"`
      ;;
    "md5")
      hashname=`hashname_md5 "$pathname"`
      ;;
    *)
      echo "Unknown hash requested"
      exit 1
      ;;
  esac

  if [[ "$filename" != "$hashname" ]]
  then
      echo "renaming: $pathname => $path/$hashname"
      mv "$pathname" "$path/$hashname"
  else
    echo "$pathname up to date"
  fi
}

# Create some test data under /tmp
mktest()
{
  root_dir=$(tempfile)
  rm "$root_dir"
  mkdir "$root_dir"
  i=0
  test_files[$((i++))]='test'
  test_files[$((i++))]='testfile, no extention or spaces'

  test_files[$((i++))]='.hidden'
  test_files[$((i++))]='a hidden file'

  test_files[$((i++))]='test space'
  test_files[$((i++))]='testfile, no extention, spaces in name'

  test_files[$((i++))]='test.txt'
  test_files[$((i++))]='testfile, extention, no spaces in name'

  test_files[$((i++))]='test.ab8e460eac3599549cfaa23a848635aa.txt'
  test_files[$((i++))]='testfile, With (wrong) md5sum, no spaces in name'

  test_files[$((i++))]='test spaced.ab8e460eac3599549cfaa23a848635aa.txt'
  test_files[$((i++))]='testfile, With (wrong) md5sum, spaces in name'

  test_files[$((i++))]='test.8072ec03e95a26bb07d6e163c93593283fee032db7265a29e2430004eefda22ce096be3fa189e8988c6ad77a3154af76f582d7e84e3f319b798d369352a63c3d.txt'
  test_files[$((i++))]='testfile, With (wrong) whirlpoolhash, no spaces in name'

  test_files[$((i++))]='test spaced.8072ec03e95a26bb07d6e163c93593283fee032db7265a29e2430004eefda22ce096be3fa189e8988c6ad77a3154af76f582d7e84e3f319b798d369352a63c3d.txt'
  test_files[$((i++))]='testfile, With (wrong) whirlpoolhash, spaces in name'

  test_files[$((i++))]='test space.txt'
  test_files[$((i++))]='testfile, extention, spaces in name'

  test_files[$((i++))]='test   multi-space  .txt'
  test_files[$((i++))]='testfile, extention, multiple consequtive spaces in name'

  test_files[$((i++))]='test space.h'
  test_files[$((i++))]='testfile, short extention, spaces in name'

  test_files[$((i++))]='test space.reallylong'
  test_files[$((i++))]='testfile, long extention, spaces in name'

  test_files[$((i++))]='test space.reallyreallyreallylong.tst'
  test_files[$((i++))]='testfile, long extention, double extention,
                        might look like hash, spaces in name'

  test_files[$((i++))]='utf8test1 - æeiaæå.txt'
  test_files[$((i++))]='testfile, extention, utf8 characters, spaces in name'

  test_files[$((i++))]='utf8test1 - 漢字.txt'
  test_files[$((i++))]='testfile, extention, Japanese utf8 characters, spaces in name'

  for s in . sub1 sub2 sub1/sub3 .hidden_dir
  do

     #note -p not needed as we create dirs top-down
     #fails for "." -- but the hack allows us to use a single loop
     #for creating testdata in all dirs
     mkdir $root_dir/$s
     dir=$root_dir/$s

     i=0
     while [[ $i -lt ${#test_files[*]} ]]
     do
       filename=${test_files[$((i++))]}
       echo ${test_files[$((i++))]} > "$dir/$filename"
     done
   done

   echo "$root_dir"
}

# Run test, given a hash-type as first argument
runtest()
{
  sum=$1

  root_dir=$(mktest)

  echo "created dir: $root_dir"
  echo "Running first test with hashtype $sum:"
  echo
  main $root_dir $sum
  echo
  echo "Running second test:"
  echo
  main $root_dir $sum
  echo "Updating all files:"

  find $root_dir -type f | while read f
  do
    echo "more content" >> "$f"
  done

  echo
  echo "Running final test:"
  echo
  main $root_dir $sum
  #cleanup:
  rm -r $root_dir
}

# Test md5 and whirlpool hashes on generated data.
runtests()
{
  runtest md5
  runtest wp
}

# In order to be able to call the script recursively, without splitting off
# functions to separate files:
case "$1" in
  'test')
    runtests
  ;;
  'hashname')
    hashname "$2"
  ;;
  'hashmove')
    hashmove "$2"
  ;;
  'run')
    main "$2" "$3"
  ;;
  *)
    echo "Use with: $0 test - or if you just want to try it on a folder:"
    echo "  $0 run path (implies md5)"
    echo "  $0 run md5 path"
    echo "  $0 run wp path"
  ;;
esac
数理化全能战士2024-08-20 13:10:19

using zsh:

$ ls
a.txt
b.txt
c.txt

The magic:

$ FILES=**/*(.) 
$ # */ stupid syntax coloring thinks this is a comment
$ for f in $FILES; do hash=`md5sum $f | cut -f1 -d" "`; mv $f "$f:r.$hash.$f:e"; done
$ ls
a.60b725f10c9c85c70d97880dfe8191b3.txt
b.3b5d5c3712955042212316173ccf37be.txt
c.2cd6ee2c70b0bde53fbe6cac3c8b8bb1.txt

Happy deconstruction!

Edit: added files in subdirectories and quotes around mv argument

瀟灑尐姊2024-08-20 13:10:19

Ruby:

#!/usr/bin/env ruby
require 'digest/md5'

Dir.glob('**/*') do |f|
  next unless File.file? f
  next if /\.md5sum-[0-9a-f]{32}/ =~ f
  md5sum = Digest::MD5.file f
  newname = "%s/%s.md5sum-%s%s" %
    [File.dirname(f), File.basename(f,'.*'), md5sum, File.extname(f)]
  File.rename f, newname
end

Handles filenames that have spaces, no extension, and that have already been hashed.

Ignores hidden files and directories — add File::FNM_DOTMATCH as the second argument of glob if that's desired.
