Apache Flink: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length when writing strings to HDFS
Exploring how to write to HDFS from Apache Flink, I tried the following:
import java.time.Duration

import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy

val sink: StreamingFileSink[String] = StreamingFileSink
  .forRowFormat(new Path("hdfs://localhost:50070/mydata"), new SimpleStringEncoder[String]("UTF-8"))
  .withRollingPolicy(
    DefaultRollingPolicy.builder()
      .withRolloverInterval(Duration.ofMinutes(15))
      .withInactivityInterval(Duration.ofMinutes(5))
      .withMaxPartSize(1024 * 1024 * 1024)
      .build())
  .build()
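For context, a minimal sketch of how a sink like this could be attached to a stream, assuming a simple bounded source; the sample elements, checkpoint interval, and job name below are placeholders, not anything from my actual job:

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// StreamingFileSink only finalizes part files on successful checkpoints,
// so checkpointing must be enabled for output to become visible.
env.enableCheckpointing(10000L)

val stream: DataStream[String] = env.fromElements("one", "two", "three")
stream.addSink(sink)

env.execute("flink-hdfs-sink-example")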
I run a Hive cluster locally, quite similar to this docker-compose.yaml.
But writing fails completely with org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length.
I verified that the port is the correct one and added the following to hadoop-hive.env (see here):
HIVE_SITE_CONF_ipc_maximum_data_length=134217728
Is this the correct way to increase the data length through an environment variable? Or could the problem be somewhere else?
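For reference, the variable name appears to target Hadoop's ipc.maximum.data.length property. As a rough illustration only (assuming the goal is simply to raise that value; this is not taken from the linked images, and it says nothing about where the setting has to land), setting the same property directly on a client-side Hadoop Configuration would look like this:

import org.apache.hadoop.conf.Configuration

val hadoopConf = new Configuration()
// 134217728 bytes = 128 MB, the same value used in hadoop-hive.env
hadoopConf.setInt("ipc.maximum.data.length", 134217728)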
Look at the env file used by the namenode.
If you have this line, then that's the namenode address you should be using.
If you run the code from the host, you might run into similar network issues, because the client needs to be able to connect directly to the datanodes, which it won't easily be able to do. The only reasonable workaround is to run your code in a container on the same Docker network.
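To make that concrete, here is a hedged sketch of the same sink pointed at the namenode's RPC address instead of port 50070; hdfs://namenode:8020 is a placeholder for whatever address the namenode's env file actually defines (typically its fs.defaultFS value), and it assumes the job runs on the same Docker network so that the namenode and datanode hostnames resolve:

import java.time.Duration

import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy

// "namenode:8020" is a placeholder for the address from the namenode's env file.
// Port 50070 in the question is the namenode's HTTP/web UI port, not the RPC port
// that HDFS clients write through, which is consistent with the RpcException seen.
val sink: StreamingFileSink[String] = StreamingFileSink
  .forRowFormat(new Path("hdfs://namenode:8020/mydata"), new SimpleStringEncoder[String]("UTF-8"))
  .withRollingPolicy(
    DefaultRollingPolicy.builder()
      .withRolloverInterval(Duration.ofMinutes(15))
      .withInactivityInterval(Duration.ofMinutes(5))
      .withMaxPartSize(1024 * 1024 * 1024)
      .build())
  .build()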