python SDK代码示例Apache Beam中可分布的DOFNS
我正在用Python创建一个数据流管线,因为我想访问并跟踪处理的文件名。
一切正常,直到我的文件很小。在大型文件(GB的数据)上运行时,数据流作业不是执行的或可扩展的。
我有一个使用可分解的帕多族(DOFN)的解决方案,该解决方案我先前使用Java实施,但是现在首选的语言是Python。
问题在于我找不到任何像样的代码段(示例),这些代码段将解释如何在Python SDK中实现可分布的Pardo(DOFNS)。
我在Beam文档中找到了一个代码示例 https://beam.apache.org.org.org.org.org.org /blog/abletable-do-fn/,但就我尝试的而言,这并不正确。
class CountFn(DoFn):
def process(element, tracker=DoFn.RestrictionTrackerParam)
for i in xrange(*tracker.current_restriction()):
if not tracker.try_claim(i):
return
yield element[0], i
def get_initial_restriction(element):
return (0, element[1])
在此示例中,我们将Tracker = dofn.grestrictionaltrackerparam在过程方法中,但是正如我所看到的,DOFN类没有任何参数限制性TrackerParam。
就我测试而言,此示例还不完整。
我可以在python SDK中获得一个可分布的DOFN的体面示例来获得一些帮助。
I am creating a dataflow pipeline in python in which i need to use FileIO because i want to access and keep track of the filenames processed.
Everything is working fine ,till my files are of small sizes. When running on a large files(GBS of data) , the dataflow job is not performant or scalable.
I have a solution of using splittable Pardos(dofns) which I have earlier implemented using java, but now the preferred language is python.
The problem is that I am unable to find any decent code snippets (example) which will explain how to implement a splittable pardo (dofns) in python sdk.
I have found a code example in the beam documentation https://beam.apache.org/blog/splittable-do-fn/ , but it's not correct as far as I have tried it.
class CountFn(DoFn):
def process(element, tracker=DoFn.RestrictionTrackerParam)
for i in xrange(*tracker.current_restriction()):
if not tracker.try_claim(i):
return
yield element[0], i
def get_initial_restriction(element):
return (0, element[1])
In this example , we are passing tracker=DoFn.RestrictionTrackerParam in the process method, but as I can see DoFn class does not have any parameter RestrictionTrackerParam.
As far as I have tested this example is not complete.
Can i get some help on getting a decent example of splittable dofns used in python sdk.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论