Read an S3 file based on a path received from Kafka - Apache Flink
I have a pipeline that listens to a Kafka topic that receives the S3 file name & path. The pipeline has to read the file from S3 and do some transformation & aggregation.
I see that Flink has support for reading S3 files directly as a source connector, but in this use case the file has to be read as part of the transformation stage.
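For context, the Kafka-listening part of such a pipeline might look roughly like the following minimal sketch. The topic name, bootstrap servers, and the assumption that each message is a plain-string S3 path are all illustrative, since the question only states that the topic carries the file name & path:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class S3PathPipeline {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Kafka source that delivers one S3 path per record,
            // e.g. "s3://my-bucket/data/file-001.csv" (placeholder values throughout)
            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("kafka:9092")
                    .setTopics("s3-file-paths")
                    .setGroupId("s3-path-consumer")
                    .setStartingOffsets(OffsetsInitializer.latest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            DataStream<String> paths =
                    env.fromSource(source, WatermarkStrategy.noWatermarks(), "s3-paths");

            // The open question: how to turn each path into the records of the
            // referenced file inside the same streaming job's transformation stage.
            paths.print();

            env.execute("s3-path-pipeline");
        }
    }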
Answers (1)
I don't believe this is currently possible.
An alternative might be to keep a Flink session cluster running, and dynamically create and submit a new Flink SQL job running in batch mode to handle the ingestion of each file.
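A rough sketch of what each dynamically submitted batch job could look like, using the Table API. The table schema, the print sink, and the S3 path are placeholder assumptions; in practice the path would be filled in from the Kafka message before the job is submitted:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class IngestOneFile {
        public static void main(String[] args) {
            // args[0] carries the S3 path taken from the Kafka message,
            // e.g. "s3://my-bucket/data/file-001.csv" (placeholder)
            String s3Path = args[0];

            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inBatchMode().build());

            // Bounded filesystem source pointing at the single file (schema is a placeholder)
            tEnv.executeSql(
                    "CREATE TEMPORARY TABLE input (" +
                    "  id BIGINT," +
                    "  amount DOUBLE" +
                    ") WITH (" +
                    "  'connector' = 'filesystem'," +
                    "  'path' = '" + s3Path + "'," +
                    "  'format' = 'csv'" +
                    ")");

            // Placeholder sink; in practice this might be another filesystem table, JDBC, etc.
            tEnv.executeSql(
                    "CREATE TEMPORARY TABLE results (" +
                    "  id BIGINT," +
                    "  total DOUBLE" +
                    ") WITH (" +
                    "  'connector' = 'print'" +
                    ")");

            // The transformation/aggregation runs as a bounded (batch) job over just this file
            tEnv.executeSql(
                    "INSERT INTO results SELECT id, SUM(amount) AS total FROM input GROUP BY id");
        }
    }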
Another approach you might be tempted by would be to implement a RichFlatMapFunction that accepts the path as input, reads the file, and emits its records one by one. But this is unlikely to work well unless the files are rather small, because Flink doesn't cope well with user functions that run for long periods of time: checkpointing, for example, cannot make progress while a single flatMap call is still busy reading a file.
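For completeness, a minimal sketch of that second approach, assuming the files are small, line-oriented text and the S3 filesystem plugin is configured on the cluster (the class name and example path are illustrative):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.util.Collector;

    // Takes an S3 path as input and emits the lines of that file one by one.
    // Note the caveat above: while flatMap() is busy reading a large file,
    // the operator cannot checkpoint, so this only suits small files.
    public class ReadS3FileFunction extends RichFlatMapFunction<String, String> {

        @Override
        public void flatMap(String s3Path, Collector<String> out) throws Exception {
            Path path = new Path(s3Path);          // e.g. "s3://my-bucket/data/file-001.csv"
            FileSystem fs = path.getFileSystem();  // resolved via the configured S3 filesystem plugin

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    out.collect(line);
                }
            }
        }
    }

It would be wired in as something like paths.flatMap(new ReadS3FileFunction()), with the transformation and aggregation applied downstream; the long-running-call caveat above still applies.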