Upsert and delete in a Delta table with Spark Structured Streaming
I am a bit new to Structured Streaming, so any help would be great. Thanks in advance.

I have a batch file (say, a CSV) that we convert to one event per record and send to Azure Event Hubs (analogous to Kafka topics). We read the stream, run some data-quality checks, and store the results in a Delta table. Before writing to the Delta table, however, we need to upsert and delete based on a column that gives each record's state: updated, created, or deleted. Based on that column, we need to merge each record into the Delta table by key (that is, upsert or delete it). Can you please tell me the best way to do this while streaming?
Comments (1)
I have a similar situation: updating a silver table with the new data present in the bronze table. I opened a discussion on the Databricks forum:
https://community.databricks.com/s/feed/0D58Y000096U4yASAS
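For this kind of problem, the pattern usually suggested is to write the stream with `foreachBatch`, and inside the batch function run a Delta Lake `MERGE` keyed on the record key, deleting when the state column says `deleted` and upserting otherwise. Below is a minimal sketch under assumptions not in the original posts: the target path `/delta/target`, the column names `key` and `state`, and a source stream called `events`; adapt these to your schema.

```python
def merge_action(state):
    """Pure helper: map the state column to the MERGE clause that should fire."""
    if state == "deleted":
        return "delete"
    if state in ("created", "updated"):
        return "upsert"
    raise ValueError(f"unknown state: {state}")

def upsert_to_delta(micro_batch_df, batch_id):
    """Run once per micro-batch; merges the batch into the target Delta table."""
    # Imported here so the sketch stays importable without a live Spark session.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(micro_batch_df.sparkSession, "/delta/target")
    (target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.key = s.key")
        .whenMatchedDelete(condition="s.state = 'deleted'")       # drop deleted keys
        .whenMatchedUpdateAll(condition="s.state = 'updated'")    # overwrite updated keys
        .whenNotMatchedInsertAll(condition="s.state != 'deleted'")  # insert new keys
        .execute())

# Wiring it into the streaming query (events = the cleaned Event Hubs stream):
# (events.writeStream
#     .foreachBatch(upsert_to_delta)
#     .outputMode("update")
#     .option("checkpointLocation", "/delta/_checkpoints/target")
#     .start())
```

Note that `foreachBatch` is needed because a plain streaming Delta sink can only append or complete; arbitrary `MERGE` logic must run per micro-batch. Also make sure only one writer merges into the target table at a time, and set a checkpoint location so the stream is restartable.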