Spark numOutputRows metric shows -1
I am using Structured Streaming with a Kafka output sink.
I log the number of records written to Kafka using the SinkProgress.numOutputRows metric:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/streaming/SinkProgress.html
Most of the time numOutputRows reports the correct number. However, sometimes I see a bunch of -1 values, and because of this the metric is effectively useless.
The docs state:
numOutputRows Number of rows written to the sink or -1 for Continuous Mode (temporarily) or Sink V1 (until decommissioned).
What does this mean? I use micro-batches, not Continuous Mode.
What do they mean by 'temporarily'? What is Sink V1? What do they mean by 'until decommissioned'?
Totally unclear.
Edit:
- Simply ignoring the -1s is not a solution, because even then the numbers don't add up.
- If I start the app and no -1 shows up, then the final result is correct 100% of the time.
- It seems these -1s swallow the correct numbers.
It looks like this is expected behaviour. Reference: https://issues.apache.org/jira/browse/SPARK-33359?focusedCommentId=17227029&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17227029
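If -1 is treated as "count not available" rather than as a number, one way to log the metric is to keep the reported counts and the unknown batches separate instead of summing or silently dropping them. Below is a minimal sketch of that idea, not an official Spark recipe. It assumes the progress events are dict-like, shaped like the entries PySpark exposes through `StreamingQuery.recentProgress` (`batchId`, `numInputRows`, `sink.numOutputRows`). The helper `tally_sink_rows`, the `SinkRowTally` type, and the sample events are all hypothetical and made up for illustration.

```python
from typing import Iterable, Mapping, NamedTuple


class SinkRowTally(NamedTuple):
    known_rows: int             # sum of numOutputRows over batches that reported a count
    unknown_batches: int        # batches whose sink reported -1 (count not available)
    input_rows_in_unknown: int  # numInputRows of those batches, for context only


def tally_sink_rows(progress_events: Iterable[Mapping]) -> SinkRowTally:
    """Hypothetical helper: separate reported sink counts from -1 batches."""
    known = 0
    unknown = 0
    input_in_unknown = 0
    for event in progress_events:
        sink = event.get("sink") or {}
        rows = sink.get("numOutputRows", -1)
        if rows is None or rows < 0:
            # -1 means the sink did not report a count for this batch.
            unknown += 1
            # numInputRows is NOT the number of rows written (filters or
            # aggregations can change it); it is kept only as a rough hint.
            input_in_unknown += event.get("numInputRows", 0) or 0
        else:
            known += rows
    return SinkRowTally(known, unknown, input_in_unknown)


# Made-up sample events, only to show the shape of the data.
sample = [
    {"batchId": 0, "numInputRows": 10, "sink": {"numOutputRows": 10}},
    {"batchId": 1, "numInputRows": 7, "sink": {"numOutputRows": -1}},
    {"batchId": 2, "numInputRows": 5, "sink": {"numOutputRows": 5}},
]
result = tally_sink_rows(sample)
print(result)
```

With a running query this could be called as `tally_sink_rows(query.recentProgress)`, where `query` is the `StreamingQuery` returned by `writeStream.start()`. Note that `recentProgress` only holds a bounded number of recent updates, so a long-running job would need to collect events continuously (for example from a streaming query listener) rather than read them once at the end. A non-zero `unknown_batches` then signals that the total is a lower bound rather than an exact count.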