Fastest file access using Hadoop

Published 2024-12-07 18:29:11


I need the fastest possible access to a single file, several copies of which are stored across many systems using Hadoop. I also need to find the ping time for each file, in sorted order.
How should I approach learning Hadoop to accomplish this task?
Please help quickly; I have very little time.


Comments (1)

指尖上的星空 2024-12-14 18:29:11


If you need faster access to a file, just increase that file's replication factor using the setrep command. This might not increase file throughput proportionally, because of your current hardware limitations.
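For example, a setrep invocation might look like the following (the path is a placeholder; substitute your own file):

```shell
# Raise the replication factor of a single file to 5.
# -w waits until the target replication is actually reached.
hdfs dfs -setrep -w 5 /user/praveensripati/input/sample.txt
```

With more replicas spread across DataNodes, reads can be served from more machines, but total throughput is still bounded by disk and network speed on each node.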

The ls command does not give the access time for directories and files; it shows only the modification time. Use the Offline Image Viewer to dump the contents of hdfs fsimage files to a human-readable format. Below is the command using the Indented option.

bin/hdfs oiv -i fsimagedemo -p Indented -o fsimage.txt

Sample output from fsimage.txt; look for the ACCESS_TIME field.

INODE
  INODE_PATH = /user/praveensripati/input/sample.txt
  REPLICATION = 1
  MODIFICATION_TIME = 2011-10-03 12:53
  ACCESS_TIME = 2011-10-03 16:26
  BLOCK_SIZE = 67108864
  BLOCKS [NUM_BLOCKS = 1]
    BLOCK
      BLOCK_ID = -5226219854944388285
      NUM_BYTES = 529
      GENERATION_STAMP = 1005
  NS_QUOTA = -1
  DS_QUOTA = -1
  PERMISSIONS
    USER_NAME = praveensripati
    GROUP_NAME = supergroup
    PERMISSION_STRING = rw-r--r--

To get the ping time in sorted order, you need to write a shell script or some other program to extract the INODE_PATH and ACCESS_TIME from each INODE section and then sort them by ACCESS_TIME. You can also use Pig, as shown here.
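The extraction-and-sort step can be sketched in Python. This is a minimal sketch that assumes the Indented oiv output format shown above (the `sorted_access_times` helper name and the second sample path are my own, for illustration):

```python
# Parse INODE_PATH / ACCESS_TIME pairs out of an Indented fsimage dump
# and return them sorted, most recently accessed first.
from datetime import datetime

def sorted_access_times(dump_text):
    """Return a list of (access_time, path) tuples, newest first."""
    entries, path = [], None
    for line in dump_text.splitlines():
        # Each field line looks like "KEY = VALUE" after stripping indent.
        key, _, value = line.strip().partition(" = ")
        if key == "INODE_PATH":
            path = value
        elif key == "ACCESS_TIME" and path:
            entries.append((datetime.strptime(value, "%Y-%m-%d %H:%M"), path))
            path = None  # reset until the next INODE section
    return sorted(entries, reverse=True)

# Trimmed sample in the same format as the dump above
# (the second file is a made-up example).
sample = """\
INODE
  INODE_PATH = /user/praveensripati/input/sample.txt
  ACCESS_TIME = 2011-10-03 16:26
INODE
  INODE_PATH = /user/praveensripati/input/other.txt
  ACCESS_TIME = 2011-10-02 09:00
"""

for ts, p in sorted_access_times(sample):
    print(ts, p)
```

The same pair-extraction idea carries over directly to a Pig or awk version if you prefer those tools.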

How should I approach learning Hadoop to accomplish this task? Please help quickly; I have very little time.

It's not possible to learn Hadoop in a day or two. Here are some videos and articles to start with.
