Difference between using a list and a PCollection
I'm building a pipeline in Apache Beam and got curious: what is the difference between applying a PTransform to a list and applying it to a PCollection? Is performance affected, or is it just that a PCollection is immutable? And is applying transforms to a list a bad way to approach a pipeline in Apache Beam?
Comments (1)
A PCollection is an immutable, distributed collection of elements, and it can be either bounded or unbounded.
The main difference from a list is that a PCollection is allowed to be unbounded and is processed in parallel by the runner rather than held in memory in one process. That is especially powerful when you are reading a large file or streaming from an unbounded source such as Pub/Sub.