Pentaho PDI: execute a transformation for each line of a CSV?
Here's a distilled version of what we're trying to do. The transformation step is a "Table Input":
SELECT DISTINCT ${SRCFIELD} FROM ${SRCTABLE}
We want to run that SQL with variables/parameters set from each line in our CSV:
SRCFIELD,SRCTABLE
carols_key,carols_table
mikes_ix,mikes_rec
their_field,their_table
In this case we'd want it to run the transformation three times, once for each data line in the CSV, to pull unique values from those fields in those tables. I'm hoping there's a simple way to do this.
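Spelling out the substitution, those three data lines should produce three runs executing:
SELECT DISTINCT carols_key FROM carols_table
SELECT DISTINCT mikes_ix FROM mikes_rec
SELECT DISTINCT their_field FROM their_table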
I think the only difficulty is that we haven't stumbled across the right step/entry and the right settings.
Poking around in a "parent" transformation, the highest hopes we had were:
- We tried chaining CSV file input to Set Variables (hoping to feed it to Transformation Executor one line at a time), but that gripes when we have more than one line from the CSV.
- We tried piping CSV file input directly to Transformation Executor, but that only sends TE's "static input value" to the sub-transformation.
We also explored using a job with a Transformation entry; we were very hopeful to stumble into whatever "Execute every input row" applies to, but haven't figured out how to pipe data to it one row at a time.
Suggestions?
Aha!
To do this, we must create a JOB with TWO TRANSFORMATIONS. The first reads "parameters" from the CSV and the second does its duty once for each row of CSV data from the first.
In the JOB, the first transformation is set up like this:
Options/Logging/Arguments/Parameters tabs are all left as default
In the transformation itself (right click, open referenced object->transformation):
Step 1: CSV file input
Step 2: Copy rows to result <== this step is what hands each row back to the job
Back in the JOB, the second transformation is set up like so:
Options: "Execute every input row" is checked
Logging/Arguments tabs are left as default
Parameters: SRCFIELD and SRCTABLE, each taken from the stream column of the same name (the fields arriving from the first transformation)
In the transformation itself (right click, open referenced object->transformation):
SELECT DISTINCT ${SRCFIELD} code FROM ${SRCTABLE}
"So the first transformation gathers the "config" data from the CSV and, one-record-at-a-time, passes those values to the second transformation (since "Execute every input row" is checked).
So now with a CSV like this:
SRCFIELD,SRCTABLE
carols_key,carols_table
mikes_ix,mikes_rec
their_field,their_table
We can pull distinct values for all those specific fields, and lots more. And it's easy to maintain which tables and which fields are examined.
Expanding this idea to a data flow where the second transformation updates code fields in a datamart isn't much of a stretch:
We'd need to pull unique ${TARGETTABLE}.${TARGETFIELD} values as well, use a Merge rows (diff) step, use a Filter rows step to find only the 'new' ones, and then an Execute SQL script step to update the targets. Exciting!
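A rough sketch of what that Execute SQL script step might run, assuming it's set to execute for each incoming 'new' row, with variable substitution enabled and the new value bound as a ? parameter (the INSERT shape here is just an illustration, not the actual datamart logic):
INSERT INTO ${TARGETTABLE} (${TARGETFIELD})
VALUES (?)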