How can I find all of the distinct file extensions in a folder hierarchy?

Posted on 2024-08-13 00:01:40

On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.

What would be the best way to achieve this from a shell?

Comments (19)

心房敞 2024-08-20 00:01:40

Try this (not sure if it's the best way, but it works):

find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

It works as follows:

  • Find all files under the current folder
  • Print each file's extension, if it has one
  • Produce a sorted list of unique values
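
For readers who would rather not pipe through Perl, the three steps above can be sketched in Python. This is only an illustration, not part of the original answer: the function name `distinct_extensions` and the throwaway directory tree are made up for the example, and unlike `find -type f`, `os.walk` also lists symlinks to files and other non-directory entries.

```python
import os
import re
import tempfile

# Same pattern as the Perl one-liner: the text after the last dot,
# containing neither a dot nor a slash.
EXT_RE = re.compile(r"\.([^./]+)$")

def distinct_extensions(root):
    """Return a sorted list of the distinct extensions found under root."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = EXT_RE.search(name)
            if match:  # names without a dot are skipped
                found.add(match.group(1))
    return sorted(found)

# Build a small temporary tree so the example is reproducible.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub.dir"))
    for rel in ["a.txt", "b.TXT", "sub.dir/c.tar.gz", "sub.dir/README"]:
        open(os.path.join(root, rel), "w").close()
    print(distinct_extensions(root))
```

The dotted directory name `sub.dir` and the extensionless `README` contribute nothing, and `txt` and `TXT` are reported separately because no case folding is applied.
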
黑寡妇 2024-08-20 00:01:40

No need for the pipe to sort, awk can do it all:

find . -type f | awk -F. '!a[$NF]++{print $NF}'
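
The awk idiom `!a[$NF]++` prints a value only the first time it is seen, so the result is unique but in first-seen order, not sorted. A rough Python rendering of that behaviour, assuming the paths arrive as a list of strings (the name `first_seen_suffixes` is hypothetical):

```python
def first_seen_suffixes(paths):
    """Mimic awk -F. '!a[$NF]++{print $NF}' on a list of path strings."""
    seen = {}
    for path in paths:
        suffix = path.split(".")[-1]   # awk's $NF when the separator is "."
        seen.setdefault(suffix, None)  # dicts keep insertion order (Python 3.7+)
    return list(seen)

print(first_seen_suffixes(["./a.txt", "./b.png", "./c.txt", "./README"]))
```

The last sample path has no extension, and its final field is `/README`; the awk one-liner splits on every dot in the path in the same way, so names without an extension show up as path fragments in its output too.
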
做个少女永远怀春 2024-08-20 00:01:40

My awk-less, sed-less, Perl-less, Python-less alternative (uniq -c is the POSIX spelling of --count; rev is widely available but not specified by POSIX):

find . -name '*.?*' -type f | rev | cut -d. -f1 | rev | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn

The trick is that it reverses each line, so the extension ends up at the beginning where cut can take it as the first field, and then reverses it back.
It also converts the extensions to lower case.

Example output:

   3689 jpg
   1036 png
    610 mp4
     90 webm
     90 mkv
     57 mov
     12 avi
     10 txt
      3 zip
      2 ogv
      1 xcf
      1 trashinfo
      1 sh
      1 m4v
      1 jpeg
      1 ini
      1 gqv
      1 gcs
      1 dv
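
The reverse, cut, reverse trick can be hard to picture. Here is a minimal Python sketch of the same string manipulation on a few made-up paths; the helper name `extension_via_reverse` and the sample data are assumptions for illustration only.

```python
from collections import Counter

def extension_via_reverse(path):
    """Reverse the path, keep the text before the first dot, reverse it back."""
    reversed_path = path[::-1]                    # rev
    first_field = reversed_path.split(".", 1)[0]  # cut -d. -f1
    return first_field[::-1].lower()              # rev, then tr to lower case

paths = ["./a/photo.JPG", "./b/clip.mp4", "./c/scan.jpg"]
counts = Counter(extension_via_reverse(p) for p in paths)
for ext, n in counts.most_common():               # sort | uniq -c | sort -rn
    print(n, ext)
```

Reversing first means the extension is always field 1, so there is no need to know how many dots the path contains.
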
虚拟世界 2024-08-20 00:01:40

Recursive version:

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u

If you want totals (how many times each extension was seen):

find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn

Non-recursive (single folder):

for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u

I based this on this forum post; credit should go there.

糖粟与秋泊 2024-08-20 00:01:40

PowerShell:

dir -recurse | select-object extension -unique

Thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html

甜`诱少女 2024-08-20 00:01:40

Adding my own variation to the mix. I think it's the simplest of the lot and can be useful when efficiency is not a big concern.

find . -type f | grep -oE '\.(\w+)$' | sort -u
禾厶谷欠 2024-08-20 00:01:40

Find everything with a dot in its name and show only the suffix.

find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u

If you know that all suffixes have 3 characters, then:

find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u

Or, with sed, show all suffixes of one to four characters. Change {1,4} to the range of lengths you expect the suffix to have.

find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p'| sort -u
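
To see what the `{1,4}` bound does in practice, here is a rough Python rendering of that sed expression. The helper `short_suffix` and the sample paths are assumptions for illustration, not part of the answer.

```python
import re

def short_suffix(path, max_len=4):
    """Return the 1..max_len characters after a dot at the end of path, else None."""
    # Greedy .* followed by a literal dot, as in sed 's/.*\.\(.\{1,4\}\)$/\1/p'
    match = re.match(r".*\.(.{1,%d})$" % max_len, path)
    return match.group(1) if match else None

for sample in ["./x/file.jpeg", "./x/archive.tar.gz", "./x/data.sqlite", "./ab"]:
    print(sample, "->", short_suffix(sample))
```

Two things stand out when running this: `data.sqlite` is dropped because its suffix is longer than four characters, and `./ab` yields `/ab` because `.` in the pattern also matches a slash, so the leading `./` counts as a dot followed by a short suffix.
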
月光色 2024-08-20 00:01:40

I tried a bunch of the answers here, even the "best" answer. They all came up short of what I was specifically after. So, after spending the past 12 hours working through regex code for multiple programs and reading and testing these answers, this is what I came up with, and it works exactly the way I want.

find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort -u

  • Finds all files that may have an extension.
  • Greps out only the extension.
  • Greps for file extensions between 2 and 16 characters long (adjust the numbers if they don't fit your needs). This helps avoid cache files and system files (the system-file part is for searching a jail).
  • Uses awk to print the extensions in lower case.
  • Sorts and keeps only unique values. Originally I tried the awk answer, but it printed items twice when they differed only in case.

If you need a count of each file extension, use the code below:

find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort | uniq -c | sort -rn

While these methods will take some time to complete and probably aren't the best ways to go about the problem, they work.

Update:
Per @alpha_989, long file extensions cause a problem. That was due to the original regex "[[:alpha:]]{3,6}". I have updated the answer to use the regex "[[:alpha:]]{2,16}". However, anyone using this code should be aware that those numbers are the minimum and maximum extension lengths allowed in the final output. Anything longer than the maximum is split across multiple lines in the output, and anything shorter than the minimum is dropped.
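
The effect of the length bounds is easier to see on concrete inputs. The sketch below imitates the second grep with Python's `re.findall`; it assumes ASCII letters as a stand-in for `[[:alpha:]]` (which is locale-dependent in grep), and the helper name `alpha_runs` is made up for the example.

```python
import re

# ASCII stand-in for grep -o -E "[[:alpha:]]{2,16}" (an assumption, see above)
ALPHA_RUN = re.compile(r"[A-Za-z]{2,16}")

def alpha_runs(extension):
    """Return what grep -o would print, one list item per output line."""
    return ALPHA_RUN.findall(extension.lower())

for ext in [".jpeg", ".x", ".mp4", "." + "a" * 20]:
    print(repr(ext), "->", alpha_runs(ext))
```

A 20-letter extension comes back as two items (16 letters, then 4), and a one-letter extension disappears. Note also that `.mp4` comes out as `mp`, because the digit is not alphabetic; that is a side effect of filtering on alphabetic runs and is not something the original answer discusses.
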

Note: The original post read "- Greps for file extensions between 3 and 6 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail)."

Idea: this could also be used to find file extensions of at least a specific length:

find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{4,}" | awk '{print tolower($0)}' | sort -u

Here 4 is the minimum extension length to include; any longer extensions are found as well.

鼻尖触碰 2024-08-20 00:01:40

In Python (3), using generators so that very large directories are handled lazily, including blank extensions, and counting how many times each extension shows up:

import collections
import itertools
import json
import os

root = '/home/andres'
# Lazily flatten the per-directory file lists yielded by os.walk
files = itertools.chain.from_iterable(
    files for _, _, files in os.walk(root)
)
# splitext(...)[1] is the extension including the dot, or '' if there is none
counter = collections.Counter(
    os.path.splitext(file_)[1] for file_ in files
)
print(json.dumps(counter, indent=2))
郁金香雨 2024-08-20 00:01:40

Since there's already another solution which uses Perl:

If you have Python installed you could also do (from the shell):

python3 -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn]for _,_,fn in os.walk('/home')];print('\n'.join(e))"
只有影子陪我不离不弃 2024-08-20 00:01:40

Another way:

find . -type f -name "*.*" -printf "%f\n" | while IFS= read -r; do echo "${REPLY##*.}"; done | sort -u

You can drop the -name "*.*" but this ensures we are dealing only with files that do have an extension of some sort.

The -printf here is find's own action, not bash's printf builtin. -printf "%f\n" prints only the filename, stripping the path, and adds a newline.

Then we use parameter expansion, ${REPLY##*.}, to remove everything up to and including the last dot.

Note that $REPLY is simply read's built-in default variable. We could just as well use our own variable, in the form while IFS= read -r file, in which case $file would be the variable.
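
For comparison, a rough Python analogue of the `${REPLY##*.}` expansion, which may help when checking edge cases by hand. The helper name `strip_through_last_dot` is hypothetical.

```python
def strip_through_last_dot(name):
    """Rough analogue of the shell expansion ${name##*.}."""
    # rpartition splits on the LAST dot. If there is no dot at all, the
    # third element is the whole string, which matches the shell, where
    # the expansion leaves the value unchanged.
    return name.rpartition(".")[2]

for sample in ["report.pdf", "archive.tar.gz", ".bashrc", "README"]:
    print(sample, "->", strip_through_last_dot(sample))
```

The `README` case shows why the answer keeps `-name "*.*"` in the find command: without it, names that have no dot would pass through unchanged and be listed as if they were extensions.
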

伴随着你 2024-08-20 00:01:40

None of the replies so far deal with filenames with newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but works, and is reasonably fast.

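import os
import sys

def names(roots):
    # Yield every file basename found under each of the given root directories
    for root in roots:
        for _dirpath, _dirnames, basenames in os.walk(root):
            for basename in basenames:
                yield basename

# Collect the distinct suffixes (including the leading dot) across all roots
sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:  # skip files that have no extension
        print(suf)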
灰色世界里的红玫瑰 2024-08-20 00:01:40

I think the simplest and most straightforward way is:

for f in *.*; do echo "${f##*.}"; done | sort -u

It's a modification of ChristopheD's third approach.

离线来电— 2024-08-20 00:01:40

I don't think this one was mentioned yet:

find . -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c
锦爱 2024-08-20 00:01:40

The accepted answer uses a regex, and I could not get it to work as an alias directly, so I put it into a shell script instead. I'm using Amazon Linux 2 and did the following:

  1. I put the accepted answer's code into a file using:

    sudo vim find.sh

add this code:

find ./ -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

save the file by typing: :wq!

  2. sudo vim ~/.bash_profile

  3. alias getext=". /path/to/your/find.sh"

  4. :wq!

  5. . ~/.bash_profile

孤单情人 2024-08-20 00:01:40

You could also do this:

find . -type f -name "*.php" -exec PATHTOAPP {} +
遇到 2024-08-20 00:01:40

I've found this simple and fast:

find . -type f -exec basename {} \; | awk -F"." '{print $NF}' > /tmp/outfile.txt
sort /tmp/outfile.txt | uniq -c | sort -n > /tmp/outfile_sorted.txt
私藏温柔 2024-08-20 00:01:40

If you are looking for an answer that respects .gitignore, try the following:

git ls-tree -r HEAD --name-only | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
末骤雨初歇 2024-08-20 00:01:40

Another version of Ondra Žižka's answer:

find . -name '*.?*' -type f | rev | cut -d. -f1 | rev | sort | uniq

On case-sensitive file systems, extensions that differ only in case should, IMHO, not be treated as the same extension. Also, I don't think counting files is necessary to answer the OP's question.
