如何使用 Pentaho Data Integration 在表之间复制列
我认为这将是一项简单的任务,但由于我是 PDI 新手,我无法 找出到目前为止选择哪种转换来完成以下任务:
我正在使用 Pentaho Data Integration(以前的 Kettle)社区版,将一个数据库“A”的一个表(“tasksA”)中的值映射/复制到另一个表 另一个数据库 B 中的“tasksB”。taskA 有一个列“description”,我想要 将这些值复制到“tasksB”中的“taskName”列。 此外,我必须多次复制“描述”的每个值,因为 在“tasksB”中,“taskName”中的每个值都有多行。
也许这可以通过直接 SQL 实现,但我想尝试是否 我可以使用 PDI 来定义它,使其更具可读性,特别是因为在下一步中我必须将其扩展到涉及的其他表。
所以我必须说出哪个值 “description”必须映射到“taskName”的值以及 “taskName”列中包含此值的每个元组(嗯,听起来像 WHERE 子句...)都应该被替换。
我对“表输入”和“表输出”步骤的第一次实验 当我简单地在它们之间画一个跃点并修改“数据库”时,它不起作用 “表输出”步骤的“字段”选项卡,生成“删除列”语句 在生成的 SQL 中这不是我想要的。我不想修改架构,只需复制值。
如果有人能指出我所需的正确步骤/转换,那就太好了, 我研究了 Pentaho Wiki 中的第一个示例,并获得了 Casters 等人的《Pentaho Kettle Solutions》一书。但可以找出如何 来解决这个问题。非常感谢您的帮助。
I thought this would be an easy task, but since I am new to PDI, I could not
find out so far which transform to choose to accomplish the following:
I am using Pentaho Data Integration (former Kettle), Community Edition, to map/copy values from one table ('tasksA') of one database 'A' to another table
'tasksB' in another database B. tasksA has a column 'description' and I want
to copy these values to the column 'taskName' in 'tasksB'.
Furthermore, I have to copy each value of 'description' several times, since
in 'tasksB', there are multiple lines for each value in 'taskName'.
Maybe this would be possible by direct SQL, but I wanted to try whether
I can define this more readable with PDI, especially because in the next step I will have to extend it to other tables involved.
So I have to tell which value of
'description' has to be mapped onto which value of 'taskName' and that in
every tuple containing this value (well, sounds like a WHERE clause...) in the column 'taskName' it should be replaced.
My first experiments with the 'Table input' and 'Table output' steps
did not work when I simply drew a hop between them and modifying the 'database
fields' tab of the 'Table output' step, which generated 'drop column' statements
in the resulting SQL which is not what I want. I don't want to modify the schema, just copy the values.
Would be great if someone could point me to the right steps/transforms needed,
I worked through the first examples from the Pentaho Wiki and have got the 'Pentaho Kettle Solutions' book of Casters et al. but could find out how
to do solve this. Many thanks in advance for any help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果我做对了,您应该使用连接到“插入/更新”步骤的表输入。
在插入/更新步骤中,您需要告知任务 A 中的键应在任务 B 上查找。然后定义taskB上的哪些字段应该更新:描述(作为流字段)->任务名称(作为表字段)。
请记住,如果未找到此键,则会在taskB 上插入一行。如果这不是您的计划,您将需要构建类似的内容:Table Input ->数据库查找->筛选行 ->插入/更新
If I got this right, you should use the Table Input connected to a "Insert/Update" step.
On the Insert/Update step you need to inform the keys from tasksA where should be looked up on tasksB. Then define which fields on tasksB should be updated: description (as stream field) -> taskName (as the table field).
Keep in mind that if this key is not found, a row will be inserted on tasksB. If it is not what you plan, you'll need to build something like: Table Input -> Database Lookup -> Filter Rows -> Insert/Update
@RFVoltolini 有一个很好的答案。或者,您可以转到
表输入 ->更新
并将错误输出连接到其他内容,例如文本文件输出。
@RFVoltolini has a good answer. Alternatively you could go
Table Input -> Update
And connect the error output to something else like a Text file output.