AggregateFunction with SessionWindow - understanding how merge works

Posted on 2025-01-10 05:53:13

While implementing an AggregateFunction in Flink with EventTimeSessionWindows, I cannot work out when merge is called for a session window with a dynamic gap.

Code snippet:

SingleOutputStreamOperator<Tuple1<String>> aggregateData = parsedData
        .assignTimestampsAndWatermarks(
                WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofMinutes(20)))
        .keyBy(new ZeusRawKeyByFunction())
        .window(EventTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor<ZeusEvent>() {
            @Override
            public long extract(ZeusEvent event) {
                // The returned session gap is in milliseconds.
                if (event.getEventTypeName().equals("PlaybackSessionClosed")) {
                    return 100;
                } else {
                    return Time.minutes(30).toMilliseconds();
                }
            }
        }))
        .allowedLateness(Time.minutes(10))
        // Emit the current aggregate every minute of event time.
        .trigger(ContinuousEventTimeTrigger.of(Time.minutes(1)))
        .sideOutputLateData(lateEvents)
        .aggregate(new ZeusAggregateFunction())
        .setParallelism(parameterTool.getInt("zeus-aggregator-parallelism"))
        .name("Zeus Aggregator");

I have defined four functions in the aggregator:

createAccumulator: Creates a new accumulator.
add: Adds every new event that arrives within the 1-minute trigger interval to the accumulator.
getResult: Returns the final row to write to the sink for that trigger firing.
merge: When is this called? Does a merge happen on every trigger firing?
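
For reference, the four-method contract can be sketched without a Flink dependency. The `Agg` interface below is a hypothetical local stand-in that only mirrors the shape of Flink's `AggregateFunction<IN, ACC, OUT>`, and `CountAgg` is an illustrative counter, not the `ZeusAggregateFunction` from the snippet above.

```java
import java.util.List;

// Hypothetical stand-in mirroring the four methods of Flink's AggregateFunction<IN, ACC, OUT>.
interface Agg<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC accumulator);
    OUT getResult(ACC accumulator);
    ACC merge(ACC a, ACC b);
}

// Illustrative aggregate (an assumption for this sketch): counts events.
class CountAgg implements Agg<String, Long, Long> {
    @Override public Long createAccumulator() { return 0L; }
    @Override public Long add(String value, Long acc) { return acc + 1; }
    @Override public Long getResult(Long acc) { return acc; }
    // Combines two partial accumulators into one.
    @Override public Long merge(Long a, Long b) { return a + b; }
}

class AggregateContractSketch {
    // Folds a batch of events into a single accumulator using createAccumulator and add.
    static Long fold(CountAgg agg, List<String> events) {
        Long acc = agg.createAccumulator();
        for (String event : events) {
            acc = agg.add(event, acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        CountAgg agg = new CountAgg();
        Long left = fold(agg, List.of("a", "b"));
        Long right = fold(agg, List.of("c"));
        // merge takes two accumulators that were built separately.
        System.out.println(agg.getResult(agg.merge(left, right))); // prints 3
    }
}
```

The point of the sketch is that merge operates on two accumulators that already exist, whereas add operates on one accumulator and one event.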

I am trying to understand whether merge runs every minute along with the trigger, that is, whether each firing creates a new accumulator that is then merged with the previous one.
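
One way to reason about this is a simplified model of merging session windows. As far as I understand Flink's session window assigner, each event first gets its own window of [timestamp, timestamp + gap), and windows that overlap are merged. The sketch below is not Flink code: `SessionMergeSketch`, `Session`, and the `mergeCalls` counter are hypothetical names, and the count accumulator is an assumption. It contains no trigger at all, which is the point: in this model the accumulator merge is driven by sessions being combined, not by periodic firings.

```java
import java.util.ArrayList;
import java.util.List;

class SessionMergeSketch {
    // One session window with a simple count accumulator.
    static final class Session {
        long start;
        long end;
        long count;
        Session(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    final List<Session> sessions = new ArrayList<>();
    int mergeCalls = 0; // how often two existing accumulators had to be combined

    void onEvent(long timestamp, long gap) {
        long start = timestamp;
        long end = timestamp + gap;
        // Collect every existing session that overlaps or touches the event's own window.
        List<Session> overlapping = new ArrayList<>();
        for (Session s : sessions) {
            if (s.start <= end && s.end >= start) {
                overlapping.add(s);
            }
        }
        Session target;
        if (overlapping.isEmpty()) {
            target = new Session(start, end); // models createAccumulator
            sessions.add(target);
        } else {
            target = overlapping.get(0);
            for (int i = 1; i < overlapping.size(); i++) {
                Session other = overlapping.get(i);
                target.start = Math.min(target.start, other.start);
                target.end = Math.max(target.end, other.end);
                target.count += other.count; // models merge(a, b)
                mergeCalls++;
                sessions.remove(other);
            }
            target.start = Math.min(target.start, start);
            target.end = Math.max(target.end, end);
        }
        target.count += 1; // models add(event, accumulator)
    }

    public static void main(String[] args) {
        SessionMergeSketch sim = new SessionMergeSketch();
        sim.onEvent(0, 10);  // new session [0, 10)
        sim.onEvent(5, 10);  // extends it to [0, 15), no accumulator merge
        sim.onEvent(40, 10); // separate session [40, 50)
        sim.onEvent(12, 30); // out-of-order event bridging both sessions
        System.out.println(sim.sessions.size() + " session(s), mergeCalls=" + sim.mergeCalls);
        // prints "1 session(s), mergeCalls=1"
    }
}
```

In this model only the fourth event, which arrives out of order and bridges two sessions that each already hold an accumulator, causes a merge. Whether Flink calls merge in exactly the same cases depends on the state backend and version, so treat the counter as an illustration rather than a guarantee.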
