Apache Flink: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length when writing strings to HDFS
Exploring how to write to HDFS from Apache Flink, I tried the following:
import java.time.Duration

import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy

val sink: StreamingFileSink[String] = StreamingFileSink
  .forRowFormat(new Path("hdfs://localhost:50070/mydata"), new SimpleStringEncoder[String]("UTF-8"))
  .withRollingPolicy(
    DefaultRollingPolicy.builder()
      .withRolloverInterval(Duration.ofMinutes(15))
      .withInactivityInterval(Duration.ofMinutes(5))
      .withMaxPartSize(1024 * 1024 * 1024)
      .build())
  .build()
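For context, a minimal sketch of how a sink like this could be attached to a stream, assuming a simple bounded source; the sample elements, checkpoint interval, and job name below are placeholders, not anything from my actual job:

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// StreamingFileSink only finalizes part files on successful checkpoints,
// so checkpointing must be enabled for output to become visible.
env.enableCheckpointing(10000L)

val stream: DataStream[String] = env.fromElements("one", "two", "three")
stream.addSink(sink)

env.execute("flink-hdfs-sink-example")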
I run a Hive cluster locally, quite similar to this docker-compose.yaml.
But writing fails completely with org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length.
I verified that the port is the correct one and added the following to hadoop-hive.env (see here):
HIVE_SITE_CONF_ipc_maximum_data_length=134217728
Is this the correct way to increase the data length through an environment variable? Or could the problem be somewhere else?
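For reference, the variable name appears to target Hadoop's ipc.maximum.data.length property. As a rough illustration only (assuming the goal is simply to raise that value; this is not taken from the linked images, and it says nothing about where the setting has to land), setting the same property directly on a client-side Hadoop Configuration would look like this:

import org.apache.hadoop.conf.Configuration

val hadoopConf = new Configuration()
// 134217728 bytes = 128 MB, the same value used in hadoop-hive.env
hadoopConf.setInt("ipc.maximum.data.length", 134217728)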
Look at the env file used by the namenode.
If you have this line, then that's the namenode address you should be using.
If you run the code from the host, you might run into similar network issues, because the client needs to be able to connect directly to the datanodes, which it won't easily be able to do. The only reasonable workaround is to run your code in a container on the same Docker network.
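To make that concrete, here is a hedged sketch of the same sink pointed at the namenode's RPC address instead of port 50070; hdfs://namenode:8020 is a placeholder for whatever address the namenode's env file actually defines (typically its fs.defaultFS value), and it assumes the job runs on the same Docker network so that the namenode and datanode hostnames resolve:

import java.time.Duration

import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy

// "namenode:8020" is a placeholder for the address from the namenode's env file.
// Port 50070 in the question is the namenode's HTTP/web UI port, not the RPC port
// that HDFS clients write through, which is consistent with the RpcException seen.
val sink: StreamingFileSink[String] = StreamingFileSink
  .forRowFormat(new Path("hdfs://namenode:8020/mydata"), new SimpleStringEncoder[String]("UTF-8"))
  .withRollingPolicy(
    DefaultRollingPolicy.builder()
      .withRolloverInterval(Duration.ofMinutes(15))
      .withInactivityInterval(Duration.ofMinutes(5))
      .withMaxPartSize(1024 * 1024 * 1024)
      .build())
  .build()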