如何在 Pentaho Spoon 中拆分列的值?
我想创建一个 Spoon
转换,它将适用于列的多个值。 我的转换的输入是一个 CSV 文件。在该 CSV 文件中,有一个名为“Technology”的列,其中包含 0 个或多个由分号分隔的值,如下所示。
+------------------------------------------------------+
row_id | Technology
+------------------------------------------------------+
1 | Cobol ; Db2 ; Jcl ; Vsam ; Cics ; Changeman ;
2 | Oracle ; Sql ; Db2 ; Oracle 9i ;
3 | Windows 2000 ; SQL ;
+------------------------------------------------------+
我在数据库中有一个名为“Technologies”的表,其架构如下:
+----------------------+
Technologies
+----------------------+
id | technology_name
+----------------------+
其中 id
列设置为自动增量。
仅当 Technologies
表中不存在该值时,我才想插入 technology
列的值。
谁能告诉我
1) 使用哪种类型的步骤来拆分 technology
列的值? 2)如何只插入一次值?例如,在 row 1
和 row 2
中,Db2
重复,但我只想插入 Db2
一次。
提前致谢 !
I want to create a Spoon
transformation which will work on multiple values of an column.
Input to my transformation is an CSV file. In that CSV file there's one column named 'Technology' which contains 0 or more values seperated by semi colon as follows.
+------------------------------------------------------+
row_id | Technology
+------------------------------------------------------+
1 | Cobol ; Db2 ; Jcl ; Vsam ; Cics ; Changeman ;
2 | Oracle ; Sql ; Db2 ; Oracle 9i ;
3 | Windows 2000 ; SQL ;
+------------------------------------------------------+
I have one table in database named 'Technologies' and its schema is as follows :
+----------------------+
Technologies
+----------------------+
id | technology_name
+----------------------+
where id
column is set to auto increment.
I want to insert values of technology
column only if that value is not present in Technologies
table.
Can anyone please tell me
1) Which type of step to be used to split values of technology
column?
2) How to insert value only once? For example in row 1
and row 2
, Db2
is repeated but I want to insert Db2
only once.
Thanks in advance !
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
使用“拆分字段”(在“转换”下)拆分内容。
CSV文件输入-->分割字段 -->其余转换
将“要拆分的字段”设置为“技术”,并将“分隔符”设置为分号。
关于非重复字段,我的建议是您将名称本身作为表的键。将其转换为小写并用数据库安全的等效字符替换任何空格/特殊字符,然后将其设为主键。您最终应该得到一个仅包含唯一实例的表。
哈
Use the "Split Fields" (Under "Transform") to split the contents.
CSV file input --> Split Fields --> rest of transformation
Set the "Field to Split" to "Technology" and set the "Delimiter" to a semi-colon.
Regarding the non-repeating field, my suggestion would be you make the name itself the key to the table. Shift it to lower-case and replace any spaces / special characters with database safe equiv's and then make that the primary key. You should end up with a table full of only unique instances.
hth