How reliable is a Spark stream join with a static Databricks Delta table?
In Databricks there is a cool feature that lets you join a streaming DataFrame with a Delta table. The cool part is that changes to the Delta table are still reflected in subsequent join results. It works just fine, but I'm curious how this works and what the limitations are. For example, what's the expected update delay? How does it change as the Delta table grows? Is it safe to rely on it in production?

Yes, you can rely on this feature (it's really a Spark feature); many customers use it in production. Your other questions involve several aspects, depending on factors such as how often the table is updated:
But to answer this completely, you would need to provide more information specific to your code, use case, etc.
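To reason about the update delay, it can help to model the join in plain Python. This is a toy sketch, not Spark code. It assumes, as the Databricks documentation describes for stream-static joins, that each micro-batch of the stream is joined against the latest version of the static Delta table at the time that batch runs. In PySpark the real query would look roughly like `spark.readStream.table("events").join(spark.read.table("dim"), "key")`, where the table names and the `key` column are hypothetical. `VersionedTable`, `run_micro_batch`, and `worst_case_staleness` are likewise hypothetical helpers that only illustrate the semantics.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy stand-in for a Delta table: every write creates a new version (hypothetical model)."""
    versions: list = field(default_factory=lambda: [{}])

    def write(self, updates: dict) -> None:
        # Copy the previous version and apply the updates, like a new table commit.
        new_version = dict(self.versions[-1])
        new_version.update(updates)
        self.versions.append(new_version)

    def latest(self) -> dict:
        return self.versions[-1]


def run_micro_batch(batch: list, table: VersionedTable) -> list:
    """Inner-join one micro-batch of (key, value) events against the table."""
    # Assumption: the snapshot is resolved once per micro-batch, so writes
    # committed after this point are only seen by the *next* batch.
    snapshot = table.latest()
    return [(k, v, snapshot[k]) for k, v in batch if k in snapshot]


def worst_case_staleness(trigger_interval_s: float, batch_duration_s: float) -> float:
    """Upper bound, in this model, on how long a committed table update can take
    to appear in join output: wait for the next batch to start, then for it to finish."""
    # If a batch takes longer than the trigger interval, the next batch starts
    # as soon as the previous one ends, so the wait is the batch duration instead.
    wait_for_next_batch = max(trigger_interval_s, batch_duration_s)
    return wait_for_next_batch + batch_duration_s
```

Usage: after `dim.write({"a": "v1"})`, joining the batch `[("a", 1), ("b", 2)]` yields only `("a", 1, "v1")`; after a second write that updates `"a"` and adds `"b"`, the next batch sees both changes. With a 10 s trigger interval and 2 s batches, `worst_case_staleness(10.0, 2.0)` returns `12.0`. In this model, a growing Delta table affects the delay only indirectly, by making each batch slower to read and join, which increases `batch_duration_s`.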