Informatica PowerCenter I/O problem in Oracle
I have two tables in Oracle and I have to synchronize values (Field
column) between the tables. I'm using Informatica PowerCenter for this synchronization operation. The source qualifier query causes high I/O usage and I need to solve it.
Table1
Table1 has about 20M rows. Field in Table1 is the actual (authoritative) value. The Timestamp field holds the create and update date, and the table has daily partitions on it.
Id | Field | Timestamp |
---|---|---|
1 | A | 2017-05-12 03:13:40 |
2 | B | 2002-11-01 07:30:46 |
3 | C | 2008-03-03 03:26:29 |
Table2
Table2 has about 500M rows. Field in Table2 should be kept as closely in sync with Field in Table1 as possible. The Timestamp field holds the create and update date, and the table has daily partitions on it. Table2 is also the target in the mapping (a rough DDL sketch follows the sample rows below).
Id | Table1_Id | Field | Timestamp | Action |
---|---|---|---|---|
100 | 1 | A | 2005-09-30 03:20:41 | Nothing |
101 | 1 | B | 2015-06-29 09:41:44 | Update Field as A |
102 | 1 | C | 2016-01-10 23:35:49 | Update Field as A |
103 | 2 | A | 2019-05-08 07:42:46 | Update Field as B |
104 | 2 | B | 2003-06-02 11:23:57 | Nothing |
105 | 2 | C | 2021-09-21 12:04:24 | Update Field as B |
106 | 3 | A | 2022-01-23 01:17:18 | Update Field as C |
107 | 3 | B | 2008-04-24 15:17:25 | Update Field as C |
108 | 3 | C | 2010-01-15 07:20:13 | Nothing |
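For context, the daily partitioning described above could look roughly like the following sketch (column types, the interval-partitioning scheme, and the initial partition bound are assumptions; Table1 would be analogous, and the real DDL may differ; the Action column in the sample rows is only an annotation of the expected outcome, not a real column):
CREATE TABLE Table2 (
  Id         NUMBER       NOT NULL,
  Table1_Id  NUMBER       NOT NULL,
  Field      VARCHAR2(1),
  Timestamp  DATE         NOT NULL
)
PARTITION BY RANGE (Timestamp)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(PARTITION p_initial VALUES LESS THAN (DATE '2002-01-01'));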
Mapping Queries
Source Qualifier Query
SELECT *
FROM Table1 t1, Table2 t2
WHERE t1.Id = t2.Table1_Id AND t1.Field <> t2.Field
Update Transformation Query
UPDATE Table2
SET
Field = :tu.Field,
Timestamp = SYSDATE
WHERE Id = :tu.Id
1 Answer
You can use the approach below.
SQ - your SQL is correct and you can use it if you see it is working, but add a <> clause on the partition-date key column. You can use this SQL to speed it up as well.
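The answer doesn't spell out the exact predicate; as a minimal sketch, assuming Timestamp is the partition key of Table2 and that only a recent change window needs to be compared (the 30-day range filter below is an illustrative assumption, not part of the answer), the source qualifier query could become:
SELECT t1.Id,
       t1.Field,
       t2.Id    AS Table2_Id,
       t2.Field AS Table2_Field
FROM   Table1 t1, Table2 t2
WHERE  t1.Id = t2.Table1_Id
AND    t1.Field <> t2.Field
-- assumed extra predicate on the partition key so Oracle can prune Table2 partitions;
-- adjust the window to whatever change horizon actually applies
AND    t2.Timestamp >= TRUNC(SYSDATE) - 30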
Then, in your Infa target T2 definition, make sure you mark partition_date as part of the key along with Id.
Then use an Update Strategy set to DD_UPDATE. You can set the session to update as well. And remove that target override: it applies the update query on the whole table and can sometimes be inefficient and I/O intensive.
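With the override removed, DD_UPDATE in place, and the partition-date column flagged as a key, the effective per-row update would be roughly equivalent to the following sketch (written in the question's :TU notation and assuming Timestamp is the partition key column; the statement Informatica actually generates may differ):
UPDATE Table2
SET    Field = :TU.Field
-- assumed: having the partition key in the WHERE clause lets Oracle prune to a single daily partition
WHERE  Id = :TU.Id
AND    Timestamp = :TU.Timestamp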
Informatica is good at updating data in batches through the Update Strategy. You can increase the commit interval as per your performance needs.
You shouldn't try to update a 500M-row table in a single go using SQL. Yes, you can use PL/SQL to update it in batches.
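For the PL/SQL route, a minimal sketch of a batched update, assuming the same diff query and a commit every 50,000 rows (the cursor, batch size, and commit-per-batch choice are illustrative, not part of the original answer):
DECLARE
  CURSOR c_diff IS
    SELECT t2.Id, t1.Field
    FROM   Table1 t1, Table2 t2
    WHERE  t1.Id = t2.Table1_Id
    AND    t1.Field <> t2.Field;

  TYPE t_id_tab    IS TABLE OF Table2.Id%TYPE;
  TYPE t_field_tab IS TABLE OF Table2.Field%TYPE;

  l_ids    t_id_tab;
  l_fields t_field_tab;
BEGIN
  OPEN c_diff;
  LOOP
    -- work in chunks so each transaction (and its undo/redo) stays small
    FETCH c_diff BULK COLLECT INTO l_ids, l_fields LIMIT 50000;
    EXIT WHEN l_ids.COUNT = 0;

    FORALL i IN 1 .. l_ids.COUNT
      UPDATE Table2
      SET    Field = l_fields(i),
             Timestamp = SYSDATE
      WHERE  Id = l_ids(i);

    COMMIT;  -- commit per batch, the PL/SQL analogue of the commit interval mentioned above
  END LOOP;
  CLOSE c_diff;
END;
/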