Waiting for transformations in a job
I am working with Pentaho Data Integration (aka Kettle) and I have several Transformations, let's call them A, B, C, D, E.
B depends on A, D depends on C and E depends on B and D. In a job I'd like to run A, B and C, D in parallel:
      -> A -> B _
Start<            \
      -> C -> D----> E
where A and C run in parallel. Is there any way to execute E only if B AND D were both successful? Right now, looking at the Job metrics, E gets executed as soon as either B OR D has finished.
I just found http://forums.pentaho.org/showthread.php?t=75425 and it seems like it's not easily possible to achieve what I want.
You can do something like this:
The JS step with:
The Write to log step is optional. I use it with a Log Message so that no red-lined error gets registered in the log:
That way I can see in the log which step ended and when.
I believe this can be done, but I don't have jobs big enough to really test this well, and it's awkward. Basically, you'll need 4 separate jobs in addition to your A, B, C, D, and E transformations. Let's call them Control Job, Job A_B, Job C_D, and Parallel Jobs.
You set them up like this:
The key is that A -> B and C -> D need to be in their own job step to retain the dependency. Then Parallel Jobs makes sure both parallel paths have completed before allowing control to proceed to E.
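The nesting described above can be modelled outside PDI to check the ordering it guarantees. The following is only a plain JavaScript sketch, not PDI code: `runTransformation` and its delays are hypothetical stand-ins for real transformations, and the function names simply mirror the job names used in this answer.

```javascript
// Hypothetical stand-in for a transformation: it records its name in `log`
// after `ms` milliseconds and then resolves.
function runTransformation(name, ms, log) {
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push(name);
      resolve();
    }, ms)
  );
}

// Job A_B: B starts only after A has finished.
async function jobAB(log) {
  await runTransformation("A", 20, log);
  await runTransformation("B", 5, log);
}

// Job C_D: D starts only after C has finished.
async function jobCD(log) {
  await runTransformation("C", 5, log);
  await runTransformation("D", 5, log);
}

// Parallel Jobs: starts both sub-jobs together and completes only when
// both have completed (it rejects if either one fails).
function parallelJobs(log) {
  return Promise.all([jobAB(log), jobCD(log)]);
}

// Control Job: E runs only after Parallel Jobs has completed.
async function controlJob() {
  const log = [];
  await parallelJobs(log);
  await runTransformation("E", 0, log);
  return log;
}

controlJob().then((log) => console.log(log.join(" -> ")));
```

In this model E always comes last, A precedes B and C precedes D, while the interleaving of the two branches depends on their timing.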
I started with Ricardo's answer, but found that if two of the transformations finish at exactly the same time, the job continues as two independent streams.
I got around this by counting the number of times the JavaScript step had been reached instead:
tables_complete doesn't need to be defined beforehand.
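The original script is not reproduced here, so the following is only a minimal sketch of the counting idea, written so that it also runs outside PDI. It assumes the JavaScript job entry exposes a `parent_job` object with `getVariable` and `setVariable`, and that the entry succeeds when its script evaluates to true. `makeFakeParentJob` and `gate` are hypothetical names introduced for this illustration.

```javascript
// Stand-in for the `parent_job` object of a JavaScript job entry (assumption).
// Only getVariable/setVariable are modelled; values are stored as strings
// and an unset variable reads as null.
function makeFakeParentJob() {
  const vars = new Map();
  return {
    getVariable: (name) => (vars.has(name) ? vars.get(name) : null),
    setVariable: (name, value) => {
      vars.set(name, String(value));
    },
  };
}

// Gate logic: count this arrival and return true only on the
// `expected`-th arrival, so that exactly one stream continues to E.
function gate(parent_job, expected) {
  const raw = parent_job.getVariable("tables_complete");
  // An unset variable means nobody has arrived yet.
  const count = (raw === null ? 0 : parseInt(raw, 10)) + 1;
  parent_job.setVariable("tables_complete", count);
  return count === expected;
}

// Two branches (B and D) reach the gate one after the other.
const job = makeFakeParentJob();
const first = gate(job, 2);
const second = gate(job, 2);
console.log(first, second); // prints: false true
```

Inside PDI, the body of `gate` would be the script itself, with the comparison as its last expression and the false result wired to a hop that simply ends that stream. Note that the read-and-increment is not atomic, so this sketch narrows the timing problem rather than ruling it out.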
You need to use Jobception, that is, put jobs inside a job, to make this work. In your example, I would create one job that executes only A and B, and another job that executes only C and D.
I would then put a job that calls both of those jobs.
Whole thing would look like this:
That is how you would solve this problem.