How to find duplicate files in the same directory on Linux whose names differ only in case?

Posted on 2024-08-18 20:08:38

How can I return a list of files that are duplicates by name, i.e. files in the same directory whose names are identical except for case?

I don't care about the contents of the files. I just need to know the location and name of any file that has a same-named duplicate.

Example duplicates:

/www/images/taxi.jpg
/www/images/Taxi.jpg

Ideally I need to search all files recursively from a base directory. In the example above that would be /www/.

11 Answers

铃予 2024-08-25 20:08:38

The other answer is great, but instead of the "rather monstrous" Perl script, I suggest

perl -pe 's!([^/]+)$!lc $1!e'

Which will lowercase just the filename part of the path.
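For instance, piping one of the question's example paths through it should show the effect:

$ echo '/www/images/Taxi.jpg' | perl -pe 's!([^/]+)$!lc $1!e'
/www/images/taxi.jpg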

Edit 1: In fact the entire problem can be solved with:

find . | perl -ne 's!([^/]+)$!lc $1!e; print if 1 == $seen{$_}++'

Edit 3: I found a solution using sed, sort and uniq that will also print out the duplicates, but it only works if there is no whitespace in the filenames:

find . |sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,'|sort|uniq -D -f 1|cut -f 1
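To sketch the mechanics with the question's /www example: sed rewrites each line to "original-path<TAB>lowercased-path", sort brings matching second fields together, uniq -D -f 1 (which skips the first field when comparing) prints every line of each duplicate group, and cut -f 1 recovers the original spelling. Roughly (in the C locale):

$ cd /www && find . | sed 's,\(.*\)/\(.*\)$,\1/\2\t\1/\L\2,' | sort | uniq -D -f 1 | cut -f 1
./images/Taxi.jpg
./images/taxi.jpg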

Edit 2: And here is a longer script that will print out the names. It takes a list of paths on stdin, as given by find. Not so elegant, but still:

#!/usr/bin/perl -w

use strict;
use warnings;

my %dup_series_per_dir;
while (<>) {
    my ($dir, $file) = m!(.*/)?([^/]+?)$!;
    push @{$dup_series_per_dir{$dir||'./'}{lc $file}}, $file;
}

for my $dir (sort keys %dup_series_per_dir) {
    my @all_dup_series_in_dir = grep { @{$_} > 1 } values %{$dup_series_per_dir{$dir}};
    for my $one_dup_series (@all_dup_series_in_dir) {
        print "$dir\{" . join(',', sort @{$one_dup_series}) . "}\n";
    }
}
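
A minimal usage sketch, assuming the script above is saved as finddups.pl (a hypothetical name) and made executable:

$ find /www -type f | ./finddups.pl
/www/images/{Taxi.jpg,taxi.jpg}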
依 靠 2024-08-25 20:08:38

Try:

ls -1 | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "

Simple, really :-) Aren't pipelines wonderful beasts?

The ls -1 gives you the files one per line, the tr '[A-Z]' '[a-z]' converts all uppercase to lowercase, the sort sorts them (surprisingly enough), uniq -c removes subsequent occurrences of duplicate lines whilst giving you a count as well and, finally, the grep -v " 1 " strips out those lines where the count was one.

When I run this in a directory with one "duplicate" (I copied qq to qQ), I get:

2 qq

For the "this directory and every subdirectory" version, just replace ls -1 with find . or find DIRNAME if you want a specific directory starting point (DIRNAME is the directory name you want to use).

This returns (for me):

2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3
2 ./.gconf/system/gstreamer/0.10/audio/profiles/mp3/%gconf.xml
2 ./.gnome2/accels/blackjack
2 ./qq

which are caused by:

pax> ls -1d .gnome2/accels/[bB]* .gconf/system/gstreamer/0.10/audio/profiles/[mM]* [qQ]?
.gconf/system/gstreamer/0.10/audio/profiles/mp3
.gconf/system/gstreamer/0.10/audio/profiles/MP3
.gnome2/accels/blackjack
.gnome2/accels/Blackjack
qq
qQ

Update:

Actually, on further reflection, the tr will lowercase all components of the path so that both of

/a/b/c
/a/B/c

will be considered duplicates even though they're in different directories.

If you only want duplicates within a single directory to show as a match, you can use the (rather monstrous):

perl -ne '
    chomp;
    @flds = split (/\//);
    $lstf = $flds[-1];
    $lstf =~ tr/A-Z/a-z/;
    for ($i = 0; $i < $#flds; $i++) {
        print "$flds[$i]/";
    };
    print "$lstf\n";'

in place of:

tr '[A-Z]' '[a-z]'

What it does is to only lowercase the final portion of the pathname rather than the whole thing. In addition, if you only want regular files (no directories, FIFOs and so forth), use find -type f to restrict what's returned.
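
Assembled end to end, a sketch of the per-directory pipeline built from the pieces above:

find . -type f | perl -ne '
    chomp;
    @flds = split (/\//);
    $lstf = $flds[-1];
    $lstf =~ tr/A-Z/a-z/;
    for ($i = 0; $i < $#flds; $i++) {
        print "$flds[$i]/";
    };
    print "$lstf\n";' | sort | uniq -c | grep -v " 1 "

For the question's example, run from /www, this should report something like "2 ./images/taxi.jpg".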

音栖息无 2024-08-25 20:08:38

I believe

ls | sort -f | uniq -i -d

is simpler, faster, and will give the same result
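
A quick sanity check, assuming a directory containing Taxi.jpg, taxi.jpg and an unrelated bus.png (exact ordering within a group can vary with locale):

$ ls | sort -f | uniq -i -d
Taxi.jpg

uniq -i -d prints one representative line per case-insensitive duplicate group.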

一笔一画续写前缘 2024-08-25 20:08:38

Following up on mpez0's response: to detect duplicates recursively, just replace "ls" with "find .".
The only problem I see with this is that if a whole directory is duplicated, you get one entry for each file inside it, so some human judgment is needed to interpret the output (see the sketch below for one way to condense it).

But anyway, you're not automatically deleting these files, are you?

find . | sort -f | uniq -i -d
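
As one way to condense that output, this sketch (assuming paths without embedded newlines) strips the filename from each reported duplicate and lists just the affected directories:

find . | sort -f | uniq -i -d | sed 's,/[^/]*$,,' | sort -u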
像你 2024-08-25 20:08:38

This is a nice little command-line app called findsn that you get if you compile fslint (the deb package does not include it).

It will find any files with the same name, it's lightning fast, and it can handle differences in case.

/findsn --help
find (files) with duplicate or conflicting names.
Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]

If no arguments are supplied the $PATH is searched for any redundant
or conflicting files.

-A  reports all aliases (soft and hard links) to files.
    If no path(s) specified then the $PATH is searched.

If only path(s) specified then they are checked for duplicate named
files. You can qualify this with -C to ignore case in this search.
Qualifying with -c is more restrictive as only files (or directories)
in the same directory whose names differ only in case are reported.
I.E. -c will flag files & directories that will conflict if transfered
to a case insensitive file system. Note if -c or -C specified and
no path(s) specified the current directory is assumed.

美人骨 2024-08-25 20:08:38

Here is an example of how to find all duplicate jar files:

find . -type f -name "*.jar" -printf "%f\n" | sort -f | uniq -i -d

Replace *.jar with whatever duplicate file type you are looking for.
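
-printf is specific to GNU find; on systems without it, a rough equivalent strips the directory part with awk instead:

find . -type f -name "*.jar" | awk -F/ '{print $NF}' | sort -f | uniq -i -d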

总以为 2024-08-25 20:08:38

Here's a script that worked for me (I am not the author). The original and discussion can be found here:
http://www.daemonforums.org/showthread.php?t=4661

#! /bin/sh

# find duplicated files in directory tree
# comparing by file NAME, SIZE or MD5 checksum
# --------------------------------------------
# LICENSE(s): BSD / CDDL
# --------------------------------------------
# vermaden [AT] interia [DOT] pl
# http://strony.toya.net.pl/~vermaden/links.htm

__usage() {
  echo "usage: $( basename ${0} ) OPTION DIRECTORY"
  echo "  OPTIONS: -n   check by name (fast)"
  echo "           -s   check by size (medium)"
  echo "           -m   check by md5  (slow)"
  echo "           -N   same as '-n' but with delete instructions printed"
  echo "           -S   same as '-s' but with delete instructions printed"
  echo "           -M   same as '-m' but with delete instructions printed"
  echo "  EXAMPLE: $( basename ${0} ) -s /mnt"
  exit 1
  }

__prefix() {
  case $( id -u ) in
    (0) PREFIX="rm -rf" ;;
    (*) case $( uname ) in
          (SunOS) PREFIX="pfexec rm -rf" ;;
          (*)     PREFIX="sudo rm -rf"   ;;
        esac
        ;;
  esac
  }

__crossplatform() {
  case $( uname ) in
    (FreeBSD)
      MD5="md5 -r"
      STAT="stat -f %z"
      ;;
    (Linux)
      MD5="md5sum"
      STAT="stat -c %s"
      ;;
    (SunOS)
      echo "INFO: supported systems: FreeBSD Linux"
      echo
      echo "Porting to Solaris/OpenSolaris"
      echo "  -- provide values for MD5/STAT in '$( basename ${0} ):__crossplatform()'"
      echo "  -- use digest(1) instead for md5 sum calculation"
      echo "       $ digest -a md5 file"
      echo "  -- pfexec(1) is already used in '$( basename ${0} ):__prefix()'"
      echo
      exit 1
      ;;
    (*)
      echo "INFO: supported systems: FreeBSD Linux"
      exit 1
      ;;
  esac
  }

__md5() {
  __crossplatform
  :> ${DUPLICATES_FILE}
  DATA=$( find "${1}" -type f -exec ${MD5} {} ';' | sort -n )
  echo "${DATA}" \
    | awk '{print $1}' \
    | uniq -c \
    | while read LINE
      do
        COUNT=$( echo ${LINE} | awk '{print $1}' )
        [ ${COUNT} -eq 1 ] && continue
        SUM=$( echo ${LINE} | awk '{print $2}' )
        echo "${DATA}" | grep ${SUM} >> ${DUPLICATES_FILE}
      done

  echo "${DATA}" \
    | awk '{print $1}' \
    | sort -n \
    | uniq -c \
    | while read LINE
      do
        COUNT=$( echo ${LINE} | awk '{print $1}' )
        [ ${COUNT} -eq 1 ] && continue
        SUM=$( echo ${LINE} | awk '{print $2}' )
        echo "count: ${COUNT} | md5: ${SUM}"
        grep ${SUM} ${DUPLICATES_FILE} \
          | cut -d ' ' -f 2-10000 2> /dev/null \
          | while read LINE
            do
              if [ -n "${PREFIX}" ]
              then
                echo "  ${PREFIX} \"${LINE}\""
              else
                echo "  ${LINE}"
              fi
            done
        echo
      done
  rm -rf ${DUPLICATES_FILE}
  }

__size() {
  __crossplatform
  find "${1}" -type f -exec ${STAT} {} ';' \
    | sort -n \
    | uniq -c \
    | while read LINE
      do
        COUNT=$( echo ${LINE} | awk '{print $1}' )
        [ ${COUNT} -eq 1 ] && continue
        SIZE=$( echo ${LINE} | awk '{print $2}' )
        SIZE_KB=$( echo ${SIZE} / 1024 | bc )
        echo "count: ${COUNT} | size: ${SIZE_KB}KB (${SIZE} bytes)"
        if [ -n "${PREFIX}" ]
        then
          find ${1} -type f -size ${SIZE}c -exec echo "  ${PREFIX} \"{}\"" ';'
        else
          # find ${1} -type f -size ${SIZE}c -exec echo "  {}  " ';'  -exec du -h "  {}" ';'
          find ${1} -type f -size ${SIZE}c -exec echo "  {}  " ';'
        fi
        echo
      done
  }

__file() {
  __crossplatform
  find "${1}" -type f \
    | xargs -n 1 basename 2> /dev/null \
    | tr '[A-Z]' '[a-z]' \
    | sort -n \
    | uniq -c \
    | sort -n -r \
    | while read LINE
      do
        COUNT=$( echo ${LINE} | awk '{print $1}' )
        [ ${COUNT} -eq 1 ] && break
        FILE=$( echo ${LINE} | cut -d ' ' -f 2-10000 2> /dev/null )
        echo "count: ${COUNT} | file: ${FILE}"
        FILE=$( echo ${FILE} | sed -e s/'\['/'\\\['/g -e s/'\]'/'\\\]'/g )
        if [ -n "${PREFIX}" ]
        then
          find ${1} -iname "${FILE}" -exec echo "  ${PREFIX} \"{}\"" ';'
        else
          find ${1} -iname "${FILE}" -exec echo "  {}" ';'
        fi
        echo
      done 
  }

# main()

[ ${#} -ne 2  ] && __usage
[ ! -d "${2}" ] && __usage

DUPLICATES_FILE="/tmp/$( basename ${0} )_DUPLICATES_FILE.tmp"

case ${1} in
  (-n)           __file "${2}" ;;
  (-m)           __md5  "${2}" ;;
  (-s)           __size "${2}" ;;
  (-N) __prefix; __file "${2}" ;;
  (-M) __prefix; __md5  "${2}" ;;
  (-S) __prefix; __size "${2}" ;;
  (*)  __usage ;;
esac

If the find command is not working for you, you may have to change it. For example

OLD :   find "${1}" -type f | xargs -n 1 basename 
NEW :   find "${1}" -type f -printf "%f\n"
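
A sketch of typical invocations, assuming the script is saved as dupfinder.sh (a hypothetical name) and made executable:

# report files under /www that share a name, ignoring case
./dupfinder.sh -n /www

# same check, but prefix each hit with a removal command you can review
./dupfinder.sh -N /www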
长不大的小祸害 2024-08-25 20:08:38

You can use:

find -type f -exec readlink -m {} \; | gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}' | sort | uniq -c

Where:

  • find -type f
    recursively prints the path of every regular file.

  • -exec readlink -m {} \;
    canonicalizes each path to an absolute path.

  • gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}'
    lowercases just the filename component (the last /-separated field).

  • sort | uniq -c
    groups identical lines and prefixes each with its count; any count greater than 1 marks a case-insensitive duplicate.
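
For the question's example, run from /www, this should produce something like:

$ find -type f -exec readlink -m {} \; | gawk 'BEGIN{FS="/";OFS="/"}{$NF=tolower($NF);print}' | sort | uniq -c
      2 /www/images/taxi.jpg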

深巷少女 2024-08-25 20:08:38

Little bit late to this one, but here's the version I went with:

find . -type f | awk -F/ '{print $NF}' | sort -f | uniq -i -d

Here we are using:

  1. find - find all files under the current dir
  2. awk - strip the directory part, leaving just the filename (the last /-separated field)
  3. sort - sort case insensitively
  4. uniq - find the dupes from what makes it through the pipe

(Inspired by @mpez0 answer, and @SimonDowdles comment on @paxdiablo answer.)
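
Since this prints bare filenames, the locations the question asked for are lost. A sketch that maps each duplicated name back to its paths (assuming names contain no newlines or glob metacharacters, since -iname treats its argument as a pattern):

find . -type f | awk -F/ '{print $NF}' | sort -f | uniq -i -d |
while IFS= read -r name; do
    find . -type f -iname "$name"
done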

南街九尾狐 2024-08-25 20:08:38

You can check duplicates in a given directory with GNU awk:

gawk 'BEGINFILE {if ((seen[tolower(FILENAME)]++)) print FILENAME; nextfile}' *

This uses BEGINFILE to perform some action before going on and reading a file. In this case, it keeps track of the names that have appeared in an array seen[] whose indexes are the names of the files in lowercase.

If a name has already appeared, no matter its case, it prints it. Otherwise, it just jumps to the next file.


See an example:

$ tree
.
├── bye.txt
├── hello.txt
├── helLo.txt
├── yeah.txt
└── YEAH.txt

0 directories, 5 files
$ gawk 'BEGINFILE {if ((seen[tolower(FILENAME)]++)) print FILENAME; nextfile}' *
helLo.txt
YEAH.txt
陌上青苔 2024-08-25 20:08:38

I just used fdupes on CentOS to clean up a whole bunch of duplicate files...

yum install fdupes
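
Note that fdupes compares file contents rather than names, so it addresses a slightly different problem than the question asks. A typical recursive invocation:

fdupes -r /www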