Fastest file access using Hadoop
I need the fastest possible access to a single file, several copies of which are stored across many systems using Hadoop. I also need to find the ping time for each file in a sorted manner.
How should I approach learning Hadoop to accomplish this task?
Please help fast, I have very little time.
If you need faster access to a file, just increase its replication factor using the setrep command. This might not increase the file throughput proportionally because of your current hardware limitations.
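For example, the replication factor could be raised like this (the path and the count below are just placeholders):

    # -w waits until the new replication factor is actually reached
    hdfs dfs -setrep -w 10 /path/to/the/file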
The ls command does not give the access time for directories and files; it shows only the modification time. Use the Offline Image Viewer to dump the contents of the HDFS fsimage file into a human-readable format. Below is a command using the Indented option.
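The exact invocation depends on the Hadoop version (on newer releases the Indented processor lives under oiv_legacy rather than oiv); a typical run looks roughly like this, with the fsimage path as a placeholder:

    # dump the fsimage into an indented, human-readable text file
    hdfs oiv -i /path/to/fsimage -p Indented -o fsimage.txt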
Here is a sample output from fsimage.txt; look for the ACCESS_TIME column.
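The original sample is not reproduced here; the Indented output for each inode is roughly shaped like the sketch below (the path, timestamps, and values are made up, and the exact fields vary by Hadoop version):

    INODE
      INODE_PATH = /user/hadoop/data/part-00000
      REPLICATION = 3
      MODIFICATION_TIME = 2014-01-15 10:22
      ACCESS_TIME = 2014-01-16 08:05
      BLOCK_SIZE = 134217728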
To get the ping time in a sorted manner, you need to write a shell script or some other program to extract the INODE_PATH and ACCESS_TIME from each INODE section and then sort them by ACCESS_TIME. You can also use Pig as shown here.
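A minimal shell sketch, assuming the Indented layout above where each INODE record has an INODE_PATH line followed later by an ACCESS_TIME line; adjust the parsing to match your actual fsimage.txt:

    # print "<access time>  <path>" pairs, sorted by access time
    awk -F' = ' '/INODE_PATH/ {path=$2} /ACCESS_TIME/ {print $2 "\t" path}' fsimage.txt | sort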
Learning Hadoop in a day or two is not possible. Here are some videos and articles to start with.