What does the op column mean in PyFlink when printing table results?
When I run a join query with PyFlink SQL and print the result, I see some duplicate rows, and an op column is displayed, as in the attached screenshot. Any idea what that column is, and how can I produce a non-duplicate result? Thanks in advance.
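You've apparently done a streaming join (rather than a batch join), where the result is an updating (or changelog) stream. The op column is the change flag of each row: +I is an insert, -U is a retraction of a previously emitted row (update-before), and +U is the updated row that replaces it (update-after). -D would be a deletion.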
With a streaming join, Flink SQL will continuously update the result as new inputs are processed. What you are seeing is how this is represented by the PrintSink. If you run this same query in batch execution mode, then only the final result will be printed out. That's one way to get what you want. Another option would be to use a sink that can handle streaming upserts, such as the JDBC sink.
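To make the change flags concrete, here is a minimal pure-Python sketch (not Flink API) of how a consumer could fold such a changelog into the final, duplicate-free result. The function name `materialize` and the sample rows are hypothetical and chosen only for illustration; the sample mimics a left join in which a row is first emitted without a match and then corrected.

```python
from typing import Iterable, List, Tuple

Row = Tuple[object, ...]
Change = Tuple[str, Row]  # (op flag, row values)


def materialize(changelog: Iterable[Change]) -> List[Row]:
    """Apply changelog entries in order and return the rows that remain."""
    result: List[Row] = []
    for op, row in changelog:
        if op in ("+I", "+U"):
            # Insert, or the new version of an updated row.
            result.append(row)
        elif op in ("-U", "-D"):
            # Retraction of the old version, or a deletion:
            # remove one occurrence of the row.
            result.remove(row)
        else:
            raise ValueError(f"unknown op flag: {op}")
    return result


# Hypothetical printed output of a streaming left join.
changelog = [
    ("+I", (1, "alice", None)),    # first emitted without a match
    ("-U", (1, "alice", None)),    # retracted once a match arrives
    ("+U", (1, "alice", "book")),  # replaced by the joined row
    ("+I", (2, "bob", "pen")),
]

final_rows = materialize(changelog)
print(final_rows)
```

The rows that look like duplicates in the printed output are intermediate versions; only the rows that survive the fold belong to the final result, which is what batch execution mode or an upsert-capable sink would give you.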