Fastest file access using Hadoop
I need the fastest possible access to a single file, several copies of which are stored across many systems using Hadoop. I also need to find the ping time for each file in a sorted manner.
How should I approach learning Hadoop to accomplish this task?
Please help fast, I have very little time.
If you need faster access to a file, just increase its replication factor using the setrep command. This might not increase the file throughput proportionally because of your current hardware limitations.
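For example, the replication factor could be raised like this (the path and the count below are just placeholders):

    # -w waits until the new replication factor is actually reached
    hdfs dfs -setrep -w 10 /path/to/the/file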
The ls command does not give the access time for directories and files; it shows only the modification time. Use the Offline Image Viewer to dump the contents of the HDFS fsimage file into a human-readable format. Below is a command using the Indented option.
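The exact invocation depends on the Hadoop version (on newer releases the Indented processor lives under oiv_legacy rather than oiv); a typical run looks roughly like this, with the fsimage path as a placeholder:

    # dump the fsimage into an indented, human-readable text file
    hdfs oiv -i /path/to/fsimage -p Indented -o fsimage.txt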
Here is a sample output from fsimage.txt; look for the ACCESS_TIME column.
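The original sample is not reproduced here; the Indented output for each inode is roughly shaped like the sketch below (the path, timestamps, and values are made up, and the exact fields vary by Hadoop version):

    INODE
      INODE_PATH = /user/hadoop/data/part-00000
      REPLICATION = 3
      MODIFICATION_TIME = 2014-01-15 10:22
      ACCESS_TIME = 2014-01-16 08:05
      BLOCK_SIZE = 134217728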
To get the ping time in a sorted manner, you need to write a shell script or some other program to extract the INODE_PATH and ACCESS_TIME from each INODE section and then sort them by ACCESS_TIME. You can also use Pig as shown here.
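A minimal shell sketch, assuming the Indented layout above where each INODE record has an INODE_PATH line followed later by an ACCESS_TIME line; adjust the parsing to match your actual fsimage.txt:

    # print "<access time>  <path>" pairs, sorted by access time
    awk -F' = ' '/INODE_PATH/ {path=$2} /ACCESS_TIME/ {print $2 "\t" path}' fsimage.txt | sort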
Learning Hadoop in a day or two is not possible. Here are some videos and articles to start with.