HDFS says the file is still open, but the process writing to it was killed
I'm new to Hadoop and I've spent the past couple of hours trying to google this issue, but I couldn't find anything that helped. My problem is that HDFS says the file is still open, even though the process that was writing to it is long dead. This makes it impossible to read from the file.
I ran fsck on the directory and it reports that everything is healthy. However, when I run "hadoop fsck -fs hdfs://hadoop /logs/raw/directory_containing_file -openforwrite" I get:
Status: CORRUPT
Total size: 222506775716 B
Total dirs: 0
Total files: 630
Total blocks (validated): 3642 (avg. block size 61094666 B)
********************************
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 30366208 B
********************************
Minimally replicated blocks: 3641 (99.97254 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.9991763
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 23
Number of racks: 1
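For reference, adding -files to the same command makes fsck print a line for each file it checks, so the specific file being held open can be picked out by filtering for the OPENFORWRITE marker. A minimal sketch, reusing the path from above:

hadoop fsck -fs hdfs://hadoop /logs/raw/directory_containing_file -files -openforwrite | grep OPENFORWRITE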
Running the fsck command again, this time on the file that is openforwrite, I get:
.Status: HEALTHY
Total size: 793208051 B
Total dirs: 0
Total files: 1
Total blocks (validated): 12 (avg. block size 66100670 B)
Minimally replicated blocks: 12 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 23
Number of racks: 1
Does anyone have any ideas what is going on and how I can fix it?
1 Answer
I figured out that the blocks seem to be missing because the namenode server was temporarily unavailable, which corrupted the filesystem for that file. It appeared that the part of the file without the missing blocks could still be read/copied. Some more information on dealing with corruption in HDFS is available at https://twiki.grid.iu.edu/bin/view/Storage/HadoopRecovery (mirror: http://www.webcitation.org/5xMTitU0r).
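Roughly, salvaging what is still readable and then clearing the corrupt entry comes down to something like the following (the file name and local destination are placeholders, not the actual paths involved):

# copy out whatever can still be read from the affected file
hadoop fs -copyToLocal /logs/raw/directory_containing_file/<affected_file> /tmp/salvaged_copy

# then either move the corrupt file into /lost+found on HDFS ...
hadoop fsck /logs/raw/directory_containing_file/<affected_file> -move

# ... or, once the salvaged copy has been checked, delete it outright
hadoop fsck /logs/raw/directory_containing_file/<affected_file> -delete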
Edit: It seems this problem was caused by Scribe (or, more specifically, the DFSClient used by Scribe) hanging when trying to write to HDFS. We manually patched the source of our Hadoop cluster with HADOOP-6099 and HDFS-278, rebuilt the binaries, and restarted the cluster with the new version. There have been no issues in the two months we have been running the new version.
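Note that later Hadoop releases ship a built-in way to force-release a stale lease on a file that is stuck open for write. Assuming a version that includes the hdfs debug tooling (roughly 2.7 and later), it looks something like this, again with a placeholder path:

# ask the namenode to recover the lease so the file can be read again
hdfs debug recoverLease -path /logs/raw/directory_containing_file/<affected_file> -retries 5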