How do I check the size of an HDFS directory?

Published on 2024-11-17 12:09:23

I know du -sh on common Linux filesystems, but how do I do that with HDFS?

Comments (12)

百思不得你姐 2024-11-24 12:09:23

Prior to 0.20.203, and officially deprecated in 2.6.0:

hadoop fs -dus [directory]

Since 0.20.203 and 1.0.4, and still compatible through 2.6.0:

hdfs dfs -du [-s] [-h] URI [URI …]

You can also run hadoop fs -help for more info and specifics.
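For example, on a recent release the old and new forms map onto each other like this (the path here is just a hypothetical placeholder):

hadoop fs -dus /user/hadoop/logs        # deprecated form
hdfs dfs -du -s -h /user/hadoop/logs    # current equivalent, human-readable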

深居我梦 2024-11-24 12:09:23

hadoop fs -du -s -h /path/to/dir displays a directory's size in human-readable form.
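On recent releases the summarized output has three columns (logical size, size consumed with all replicas, path); the numbers below are purely illustrative:

hadoop fs -du -s -h /path/to/dir
1.2 G  3.6 G  /path/to/dir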

一身软味 2024-11-24 12:09:23

Extending Matt D's and the other answers: the command below applies up to Apache Hadoop 3.0.0.

hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]

It displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.

Options:

  • The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going 1-level deep from the given path.
  • The -h option will format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).
  • The -v option will display the names of columns as a header line.
  • The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.

du returns three columns with the following format:

 +-------------------------------------------------------------------+ 
 | size  |  disk_space_consumed_with_all_replicas  |  full_path_name | 
 +-------------------------------------------------------------------+ 

Example command:

hadoop fs -du /user/hadoop/dir1 \
    /user/hadoop/file1 \
    hdfs://nn.example.com/user/hadoop/dir1 

Exit Code: Returns 0 on success and -1 on error.

Source: Apache documentation
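As a quick sketch of -s, -h and -v combined (the header names and values below are approximate and purely illustrative, and the path is hypothetical):

hadoop fs -du -s -h -v /user/hadoop/dir1
SIZE   DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS   FULL_PATH_NAME
1.2 G  3.6 G                                   /user/hadoop/dir1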

还如梦归 2024-11-24 12:09:23

With this you will get the size in GB:

hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
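Note that on newer releases -du prints three columns (size, space consumed with all replicas, path), so the path lands in $3 rather than $2; using $NF covers both layouts (a sketch, assuming GNU awk for the ** operator):

hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $NF }'
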
信仰 2024-11-24 12:09:23

When trying to calculate the total of a particular group of files within a directory, the -s option does not work (in Hadoop 2.7.1). For example:

Directory structure:

some_dir
├abc.txt    
├count1.txt 
├count2.txt 
└def.txt    

Assume each file is 1 KB in size. You can summarize the entire directory with:

hdfs dfs -du -s some_dir
4096 some_dir

However, if I want the sum of all files containing "count" the command falls short.

hdfs dfs -du -s some_dir/count*
1024 some_dir/count1.txt
1024 some_dir/count2.txt

To get around this I usually pass the output through awk.

hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
2048 
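A hypothetical variant that also converts the total into a more readable unit (assuming the first column is in bytes):

hdfs dfs -du some_dir/count* | awk '{ total += $1 } END { printf "%.1f KB\n", total/1024 }'
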
心头的小情儿 2024-11-24 12:09:23

The easiest way to get the folder size in a human-readable format is:

hdfs dfs -du -h /folderpath

where -s can be added to get the total sum
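Combined, the summary form looks like this:

hdfs dfs -du -s -h /folderpath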

今天小雨转甜 2024-11-24 12:09:23

To get the size of a directory, hdfs dfs -du -s -h /$yourDirectoryName can be used.
hdfs dfsadmin -report can be used to see a quick cluster level storage report.
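If you only want the headline numbers from the report, a sketch like the following works (it assumes the standard label names in the report output):

hdfs dfsadmin -report | grep -E 'Configured Capacity|DFS Used|DFS Remaining'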

狠疯拽 2024-11-24 12:09:23

hadoop version 2.3.33:

hadoop fs -dus  /path/to/dir  |   awk '{print $2/1024**3 " G"}' 
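
On releases where -dus is deprecated, a roughly equivalent pipeline (hypothetical path, and GNU awk for the ** operator) would be:

hadoop fs -du -s /path/to/dir | awk '{print $1/1024**3 " G"}'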


萌︼了一个春 2024-11-24 12:09:23

% of used space on Hadoop cluster
sudo -u hdfs hadoop fs -df

Capacity under specific folder:
sudo -u hdfs hadoop fs -du -h /user
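Both commands also accept -h to print human-readable sizes, for example:

sudo -u hdfs hadoop fs -df -h
sudo -u hdfs hadoop fs -du -h /user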

最偏执的依靠 2024-11-24 12:09:23

hdfs dfs -count <dir>

info from man page:

-count [-q] [-h] [-v] [-t [<storage type>]] [-u] <path> ... :
  Count the number of directories, files and bytes under the paths
  that match the specified file pattern.  The output columns are:
  DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  or, with the -q option:
  QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA
        DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
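
A sketch of a typical invocation (the path is hypothetical); -q adds the quota columns and -h makes the byte counts human-readable:

hdfs dfs -count -q -h /user/hadoop/dir1
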
╭ゆ眷念 2024-11-24 12:09:23

In case someone needs this the Pythonic way :)

  • Install hdfs python package

    pip install hdfs

  • code

    from hdfs import InsecureClient

    client = InsecureClient('http://hdfs_ip_or_nameservice:50070', user='hdfs')
    folder_info = client.content("/tmp/my/hdfs/path")

    # prints folder/directory size in bytes
    print(folder_info['length'])
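
The same figure is also exposed over WebHDFS without any client library; a minimal sketch with curl (the host, port and path are placeholders, and on Hadoop 3.x the default HTTP port is 9870 rather than 50070):

curl -s 'http://hdfs_ip_or_nameservice:50070/webhdfs/v1/tmp/my/hdfs/path?op=GETCONTENTSUMMARY'
# returns a ContentSummary JSON object whose "length" field is the total size in bytes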
    
    
一萌ing 2024-11-24 12:09:23

The command should be hadoop fs -du -s -h /dirPath

  • -du [-s] [-h] ... : Show the amount of space, in bytes, used by the files that match the specified file pattern.

  • -s : Rather than showing the size of each individual file that matches the
    pattern, shows the total (summary) size.

  • -h : Formats the sizes of files in a human-readable fashion rather than a number of bytes (e.g. MB/GB/TB, etc.).

    Note that, even without the -s option, this only shows size summaries one level
    deep into a directory.

    The output is in the form
    size name(full path)
