Computing multiple aggregations over a sliding window in Spark Structured Streaming
I have a streaming source that sends events, where every record consists of 3 fields (CreatedTime, FP, Detected).
Here, 'FP' stands for false positive. The 'FP' and 'Detected' fields can each have the value 1 or 0.
I want to calculate the following values over a sliding window:
FPR1 = Count(FP) / Count(Detected) and FPR2 = Count(FP) / Count(total records in the window)
I am able to aggregate Count(FP) using the following query. I also want to compute the other 2 aggregates, i.e. DetectedCount and TotalCount, then calculate FPR1 and FPR2 and write the result to a file sink. How do I do this? Thanks in advance.
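import org.apache.spark.sql.functions.{col, sum, window}

// Current attempt: sums FP per (FP value, 5-minute window sliding every minute)
val aggDF = finaldata
  .withWatermark("CreatedTime", "2 minute")
  .groupBy(col("FP"),
    window(col("CreatedTime"), "5 minute", "1 minute"))
  .agg(sum("FP").alias("FPCount"))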
Finally figured it out. I was using groupBy incorrectly. Here is the final query.
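The final query itself is not included in this copy of the post. As a stand-in, here is a minimal sketch of what a query along these lines could look like. It is not the poster's actual code. It assumes that `finaldata` has a timestamp column `CreatedTime` and 0/1 integer columns `FP` and `Detected`, and that Count(FP) and Count(Detected) mean the number of records where the field is 1. The names `FprQuery`, `fprAggregates`, `writeToFiles`, `WindowStart`, `WindowEnd` and both path parameters are made up for illustration. The main change from the question is grouping by the window alone, since also grouping by `FP` would split each window into separate rows per FP value.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, lit, sum, when, window}
import org.apache.spark.sql.streaming.StreamingQuery

object FprQuery {
  // Hypothetical helper. `events` is assumed to have the columns
  // CreatedTime (timestamp), FP (0 or 1) and Detected (0 or 1).
  def fprAggregates(events: DataFrame): DataFrame =
    events
      .withWatermark("CreatedTime", "2 minutes")
      // Group by the sliding window only, not by FP.
      .groupBy(window(col("CreatedTime"), "5 minutes", "1 minute"))
      .agg(
        sum("FP").alias("FPCount"),             // records with FP = 1
        sum("Detected").alias("DetectedCount"), // records with Detected = 1
        count(lit(1)).alias("TotalCount"))      // all records in the window
      // FPR1 is left null when nothing was detected in the window.
      .withColumn("FPR1",
        when(col("DetectedCount") > 0, col("FPCount") / col("DetectedCount")))
      .withColumn("FPR2", col("FPCount") / col("TotalCount"))
      // Flatten the window struct into two plain timestamp columns.
      .select(
        col("window.start").alias("WindowStart"),
        col("window.end").alias("WindowEnd"),
        col("FPCount"), col("DetectedCount"), col("TotalCount"),
        col("FPR1"), col("FPR2"))

  // Hypothetical helper. The paths are placeholders chosen by the caller.
  // The file sink only supports append mode, so each window is written
  // once, after the watermark has passed the end of that window.
  def writeToFiles(result: DataFrame, outputPath: String, checkpointPath: String): StreamingQuery =
    result.writeStream
      .format("parquet")
      .outputMode("append")
      .option("path", outputPath)
      .option("checkpointLocation", checkpointPath)
      .start()
}
```

Under these assumptions it could be used as `FprQuery.writeToFiles(FprQuery.fprAggregates(finaldata), outputPath, checkpointPath)`. Because of append mode and the 2 minute watermark, rows for a window only show up in the output files some time after that window has closed.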