Pentaho PDI: execute a transformation for each row of a CSV?

Posted 2025-01-18 13:08:00


Here's a distilled version of what we're trying to do. The transformation step is a "Table Input":

SELECT DISTINCT ${SRCFIELD} FROM ${SRCTABLE}

We want to run that SQL with variables/parameters set from each line in our CSV:

SRCFIELD,SRCTABLE
carols_key,carols_table
mikes_ix,mikes_rec
their_field,their_table

In this case we'd want it to run the transformation three times, one for each data line in the CSV, to pull unique values from those fields in those tables. I'm hoping there's a simple way to do this.
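To make the goal concrete, here is a minimal Python sketch of the behavior we want PDI to produce: loop over the parameter CSV and run `SELECT DISTINCT` once per row. The tables and data are hypothetical stand-ins (an in-memory SQLite database), not our real schema. Note that table and column names can't be bound as SQL parameters, so they're substituted into the query text, which is exactly the kind of variable substitution `${SRCFIELD}`/`${SRCTABLE}` do in the Table input step.

```python
import csv
import io
import sqlite3

# Hypothetical in-memory tables standing in for the real source database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE carols_table (carols_key TEXT);
INSERT INTO carols_table VALUES ('A'), ('B'), ('A');
CREATE TABLE mikes_rec (mikes_ix TEXT);
INSERT INTO mikes_rec VALUES ('X'), ('X');
""")

# The "parameter" CSV from the question (shortened to two rows).
params_csv = """SRCFIELD,SRCTABLE
carols_key,carols_table
mikes_ix,mikes_rec
"""

results = {}
for row in csv.DictReader(io.StringIO(params_csv)):
    # Identifiers can't be bound placeholders, so they are substituted
    # into the SQL text -- the same substitution PDI performs on
    # ${SRCFIELD} and ${SRCTABLE}.
    sql = f"SELECT DISTINCT {row['SRCFIELD']} FROM {row['SRCTABLE']}"
    results[row['SRCTABLE']] = sorted(v for (v,) in con.execute(sql))

print(results)
```

The question, then, is which PDI steps/entries reproduce that loop.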

I think the only difficulty is that we haven't stumbled across the right step/entry and the right settings.

Poking around in a "parent" transformation, the highest hopes we had were:

  1. We tried chaining CSV file input to Set Variables (hoping to feed it to Transformation Executor one line at a time) but that gripes when we have more than one line from the CSV.
  2. We tried piping CSV file input directly to Transformation Executor but that only sends TE's "static input value" to the sub-transformation.

We also explored using a job with a Transformation entry; we were very hopeful to stumble into what "Execute every input row" applied to, but haven't figured out how to pipe data to it one row at a time.

Suggestions?


Comments (1)

末骤雨初歇 2025-01-25 13:08:00


Aha!

To do this, we must create a JOB with TWO TRANSFORMATIONS. The first reads "parameters" from the CSV and the second does its duty once for each row of CSV data from the first.

[Image: PDI Job with two Transformations]

In the JOB, the first transformation is set up like this:

  1. Options/Logging/Arguments/Parameters tabs are all left as default

  2. In the transformation itself (right click, open referenced object->transformation):

    • Step1: CSV file input
    • Step2: Copy rows to result <== that's the magic part

Back in the JOB, the second transformation is set up like so:

  1. Options: "Execute every input row" is checked

  2. Logging/Arguments tabs are left as default

  3. Parameters:

    • "Copy results to parameters" is checked
    • "Pass parameter values to sub transformation" is checked
    • Parameter: SRCFIELD; Parameter to use: SRCFIELD
    • Parameter: SRCTABLE; Parameter to use: SRCTABLE
  4. In the transformation itself (right click, open referenced object->transformation):

    • Table input "SELECT DISTINCT ${SRCFIELD} code FROM ${SRCTABLE}"
    • Note: "Replace variables in script" must be checked

So the first transformation gathers the "config" data from the CSV and, one-record-at-a-time, passes those values to the second transformation (since "Execute every input row" is checked).
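The driver pattern described above can be sketched in a few lines of Python. The function names are hypothetical, purely to mirror the two transformations: the first returns its rows (the effect of "Copy rows to result"), and the job calls the second once per row, mapping row fields onto named parameters (the effect of "Execute every input row" plus "Copy results to parameters").

```python
import csv
import io

def first_transformation(csv_text):
    """CSV file input -> Copy rows to result: hand the rows back to the job."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def second_transformation(SRCFIELD, SRCTABLE):
    """Stand-in for the parameterised sub-transformation's Table input."""
    return f"SELECT DISTINCT {SRCFIELD} FROM {SRCTABLE}"

rows = first_transformation("SRCTABLE,SRCFIELD\nperson_rec,country\nperson_rec,sex\n")

# "Execute every input row": the job invokes the second transformation
# once per result row, binding each row's fields to its parameters.
queries = [second_transformation(r["SRCFIELD"], r["SRCTABLE"]) for r in rows]
```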

So now with a CSV like this:

SRCTABLE,SRCFIELD
person_rec,country
person_rec,sex
application_rec,major1
application_rec,conc1
status_rec,cur_stat

We can pull distinct values for all those specific fields, and lots more. And it's easy to maintain which tables and which fields are examined.

Expanding this idea to a data flow where the second transformation updates code fields in a datamart isn't much of a stretch:

SRCTABLE,SRCFIELD,TARGETTABLE,TARGETFIELD
person_rec,country,dim_country,country_code
person_rec,sex,dim_sex,sex_code
application_rec,major1,dim_major,major_code
application_rec,conc1,dim_concentration,concentration_code
status_rec,cur_stat,dim_current_status,cur_stat_code

We'd need to pull unique ${TARGETTABLE}.${TARGETFIELD} values as well, use a Merge rows (diff) step, use a Filter rows step to keep only the 'new' ones, and then an Execute SQL script step to update the targets.
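A rough sketch of what that diff-and-load would compute, again against a hypothetical in-memory SQLite database (the real work would be PDI's Merge rows (diff), Filter rows, and Execute SQL script steps): compare distinct source values against the dimension table and insert only the missing codes.

```python
import sqlite3

# Hypothetical source table and dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person_rec (country TEXT);
INSERT INTO person_rec VALUES ('US'), ('CA'), ('US'), ('MX');
CREATE TABLE dim_country (country_code TEXT);
INSERT INTO dim_country VALUES ('US'), ('CA');
""")

# Distinct values on each side -- the two inputs to Merge rows (diff).
src = {v for (v,) in con.execute("SELECT DISTINCT country FROM person_rec")}
tgt = {v for (v,) in con.execute("SELECT DISTINCT country_code FROM dim_country")}

# What Filter rows would keep: values flagged 'new' by the diff.
new = sorted(src - tgt)

# What Execute SQL script would apply to the target.
con.executemany("INSERT INTO dim_country (country_code) VALUES (?)",
                [(v,) for v in new])
```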

Exciting!
