Waiting for Transformations in a Job

Posted on 2024-08-29 22:16:32

I am working with Pentaho Data Integration (aka Kettle) and I have several transformations; let's call them A, B, C, D and E.
B depends on A, D depends on C, and E depends on both B and D. In a job I'd like to run the chains A -> B and C -> D in parallel:

           -> A -> B _
    Start<            \
           -> C -> D----> E

where A and C run in parallel. Is there any way to execute E only if both B AND D were successful? Right now, looking at the job metrics, E gets executed as soon as either B OR D has finished.

Comments (5)

枯寂 2024-09-05 22:16:32

I just found http://forums.pentaho.org/showthread.php?t=75425 and it seems like it's not easily possible to achieve what I want.

桃酥萝莉 2024-09-05 22:16:32

You can do something like this:

        /--=--[job]----[set var J1=1]---\ 
[start]----=--[Job]----[set var J2=1]----+--[jscriptstep]--(ok)-->[next steps]
        \--=--[Job]----[set var J3=1]---/        \
                                                 (x)
                                                   \
                                                  [Write to log]

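The JS step contains:

// Read the flags set by the three parallel branches.
J1 = parent_job.getVariable("J1");
J2 = parent_job.getVariable("J2");
J3 = parent_job.getVariable("J3");
// The last expression is the step's result: true only when all three flags are 1.
(J1 * J2 * J3) == 1;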

The Write to log step is optional. I use it so that the failure branch does not register a red-lined error in the log, with a log message like this:

" Waiting :${J1}-${J2}-${J3}-${J4}-${J5} "

That way the log shows which branch has finished and when.
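The check in the JS step can be tried outside Kettle with a stand-in for `parent_job`. This is a minimal sketch: `makeParentJob` and `allBranchesDone` are hypothetical helpers, not Kettle API, and it assumes an unset variable reads back as null while a set one reads back as a string.

```javascript
// Hypothetical stand-in for Kettle's parent_job, for experimenting outside Kettle.
// Assumption: an unset variable reads back as null, a set one as a string.
function makeParentJob() {
  const vars = new Map();
  return {
    getVariable: (name) => (vars.has(name) ? vars.get(name) : null),
    setVariable: (name, value) => vars.set(name, String(value)),
  };
}

// Same test as the JS step: true only once J1, J2 and J3 are all "1".
function allBranchesDone(parentJob) {
  const j1 = parentJob.getVariable("J1");
  const j2 = parentJob.getVariable("J2");
  const j3 = parentJob.getVariable("J3");
  return j1 * j2 * j3 === 1; // null coerces to 0, "1" coerces to 1
}

const job = makeParentJob();
job.setVariable("J1", 1);
const afterFirst = allBranchesDone(job); // only one branch has finished
job.setVariable("J2", 1);
job.setVariable("J3", 1);
const afterAll = allBranchesDone(job); // all three branches have finished
console.log(afterFirst, afterAll);
```

Because a missing flag multiplies the product down to 0, the check fails until every branch has set its variable.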

余厌 2024-09-05 22:16:32

I believe this can be done, but it's awkward, and I don't have jobs big enough to really test it well. Basically, you'll need 4 separate jobs in addition to your A, B, C, D and E transformations. Let's call them Control Job, Job A_B, Job C_D, and Parallel Jobs.

You set them up like this:

Control Job: start -> Parallel Jobs -> E
Parallel Jobs:       -> Job A_B
               start<           (Set Start step to run next jobs in parallel)
                     -> Job C_D
Job A_B: start -> A -> B
Job C_D: start -> C -> D

The key is that A -> B and C -> D each need to be in their own job so that the dependency within each chain is retained. Parallel Jobs then makes sure both parallel paths have completed before control proceeds to E.
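As an analogy only, the nesting above behaves like a fork-join. The sketch below is plain JavaScript, not Kettle: each "transformation" is a hypothetical async function that just records its name, and `Promise.all` plays the role of Parallel Jobs.

```javascript
// Analogy only: each "transformation" is a hypothetical async function
// that records its name in a log when it runs.
const log = [];
const step = (name) => async () => { log.push(name); };
const [A, B, C, D, E] = ["A", "B", "C", "D", "E"].map(step);

// Job A_B and Job C_D each keep their internal order.
const jobAB = async () => { await A(); await B(); };
const jobCD = async () => { await C(); await D(); };

// Parallel Jobs: completes only when both sub-jobs have completed.
const parallelJobs = () => Promise.all([jobAB(), jobCD()]);

// Control Job: E runs after Parallel Jobs as a whole.
const controlJob = async () => { await parallelJobs(); await E(); };

const done = controlJob().then(() => console.log(log.join(" ")));
```

E is sequenced after the wrapper rather than after B or D individually, which is why it cannot start when only one of the two chains has finished.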

你是暖光i 2024-09-05 22:16:32

I started with ricardo's answer, but found that if two of the transformations finish at exactly the same time, the job continues as two independent streams.

I got around this by instead counting the number of times the JavaScript step has been reached:

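// Read the counter; an undefined variable yields null, which ++ treats as 0.
cnt = parent_job.getVariable("tables_complete");
cnt++;
parent_job.setVariable("tables_complete", cnt);
// The step succeeds only on the third arrival.
3 == cnt;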

tables_complete doesn't need to be defined beforehand.
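The counting logic can be exercised outside Kettle in the same way. This is a sketch under assumptions: `makeParentJob` is a hypothetical stub rather than Kettle API, `branchArrived` is an illustrative wrapper around the step's script, and unset variables are assumed to read back as null.

```javascript
// Hypothetical stand-in for Kettle's parent_job (not the real API).
// Assumption: an unset variable reads back as null, a set one as a string.
function makeParentJob() {
  const vars = new Map();
  return {
    getVariable: (name) => (vars.has(name) ? vars.get(name) : null),
    setVariable: (name, value) => vars.set(name, String(value)),
  };
}

// The counter logic from the JS step, wrapped in a function.
// `expected` is the number of parallel branches feeding the step.
function branchArrived(parentJob, expected) {
  let cnt = Number(parentJob.getVariable("tables_complete")); // null becomes 0
  cnt++;
  parentJob.setVariable("tables_complete", cnt);
  return cnt === expected;
}

const job = makeParentJob();
// Simulate three branches reaching the step one after another.
const results = [1, 2, 3].map(() => branchArrived(job, 3));
console.log(results); // [ false, false, true ]
```

Only the last arrival returns true, so exactly one stream continues. The simulation runs the arrivals one after another, so it says nothing about how the read-increment-write behaves when branches arrive truly concurrently.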

恋竹姑娘 2024-09-05 22:16:32

You need to use "Jobception", that is, put jobs inside a job, to make this work. In your example, I would create one job that executes A and B (only) and another job that executes C and D (only).

I would then put a job that calls both of those jobs.

Whole thing would look like this:

Job 1 - Calls Job A and Job B ONLY
Job 2 - Calls Job C and Job D ONLY
Job 3 - Calls Job 1 and Job 2 and points to Job E.

That is how you would solve this problem.
