KDA metrics in CloudWatch differ from Flink metrics
I have a Flink application deployed on AWS Kinesis Data Analytics.
My current settings are:
Parallelism=128
Parallelism per KPU=4
The issue I have is that there is a big difference between the counts shown in the Flink web UI and the counts shown in CloudWatch, even for metrics that come out of the box.
Example:
Counts from the Flink UI:
Records Sent: close to 1 billion
Counts from CloudWatch:
NumRecordsIn / NumRecordsOut: both are close to 10.8 million
The metrics for KDA are configured at the Task level.
I am wondering why I am seeing this huge discrepancy. Does the parallelism have some effect on the counts?
FWIW, I added a custom metric that tracks numRecordsIn. It also reports values similar to the out-of-the-box NumRecordsIn metric.
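For context, a custom counter like this is registered once per parallel subtask, which is one reason it behaves so similarly to the built-in numRecordsIn. Below is a minimal sketch of how such a counter is typically wired into a Flink job, assuming a Java DataStream application; the class name CountingMapper and the metric name myNumRecordsIn are illustrative, not taken from the actual setup.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    // Illustrative mapper that mirrors numRecordsIn with a custom counter.
    public class CountingMapper extends RichMapFunction<String, String> {

        private transient Counter myNumRecordsIn; // hypothetical custom metric

        @Override
        public void open(Configuration parameters) {
            // Each parallel subtask registers and increments its own counter instance.
            myNumRecordsIn = getRuntimeContext()
                    .getMetricGroup()
                    .counter("myNumRecordsIn");
        }

        @Override
        public String map(String value) {
            myNumRecordsIn.inc(); // one increment per record seen by this subtask
            return value;
        }
    }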
Comments (2)
It appears that you are comparing the total numRecordsIn/Out across the lifetime of the job (roughly 1 billion) to the maximum ever seen in one minute (around 10 million).
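To make that concrete with the numbers from the question (purely as an illustration, since the job's actual runtime is not stated): at a peak rate of roughly 10.8 million records per minute, it only takes about 1,000,000,000 / 10,800,000 ≈ 93 minutes of sustained throughput to accumulate the ~1 billion total shown in the Flink UI. A per-minute maximum around 10 million and a lifetime total around 1 billion are therefore not contradictory; they are different statistics over different time windows.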
What David mentioned above is correct. I was finally able to figure out how to get the values from the Flink UI to line up with CloudWatch. The problem with CloudWatch is that it does not account for the parallelism. So, to get the count of records emitted by a specific task, take the Average(numRecordsOut) of that task and multiply it by the parallelism.
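As a rough illustration with the numbers from the question, assuming the ~10.8 million CloudWatch value is the per-subtask Average for that task: 10,800,000 × 128 ≈ 1.4 billion, which is the same order of magnitude as the ~1 billion Records Sent reported by the Flink UI. If you want CloudWatch to do this scaling for you, a metric math expression along the lines of m1 * 128 (where m1 is the task's numRecordsOut graphed with the Average statistic) is one way to approximate it; the expression and the hard-coded 128 are illustrative and should be adjusted to your own task and parallelism.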