数据仓库 Postgres
我们正在考虑使用 SSIS 来维护 PostgreSql 数据仓库。我以前在 SQL Server 之间使用过它,没有任何问题,但要让它与 Postgres 很好地配合却遇到了很多困难。我正在使用 OLEDB PGNP 数据提供程序的评估版本 (http://www.postgresql. org/about/news.1004)。
我想从一些简单的事情开始,比如事实表上的 UPSERT(每天更新/插入 10k-15k 行),但这被证明非常困难(更不用说我将来想使用代理键)。
我已经尝试过(链接)并且(http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_ -has-it-changed.aspx),它们实际上是相同的(除了当我尝试更新插入时,我最后并没有真正理解联合)但是当我遇到参数时,我遇到了同样的问题使用 OLEDb 命令进行更新 - 我尝试使用 (http:// /technet.microsoft.com/en-us/library/ms141773.aspx)但这似乎不起作用,我收到验证错误 -
complent.... 的外部列不同步对于数据源列...需要从外部列中删除外部列“Param_2”。 (前两个参数也会重复出现此错误 - 使用 sql 连接时从未遇到过此错误,因为它支持命名参数)
有人遇到过此情况吗?
AND:
事实上,这个简单的任务在 SSIS 中显然很难完成,这表明我使用了错误的工具来完成这项工作 - 有没有更好(并且仍然灵活)的方法来做到这一点?或者另一个 ETL 包是否更适合在两个 Postgres 数据库之间使用? -其他选项包括 (http://en.wikipedia. org/wiki/Extract,_transform,_load#Open-source_ETL_frameworks)。我可以直接编写大量 SQL 来为我完成此任务,但我想要一个简洁且易于维护的解决方案。
We're considering using SSIS to maintain a PostgreSql data warehouse. I've used it before between SQL Servers with no problems, but am having a lot of difficulty getting it to play nicely with Postgres. I’m using the evaluation version of the OLEDB PGNP data provider (http://www.postgresql.org/about/news.1004).
I wanted to start with something simple like UPSERT on the fact table (10k-15k rows are updated/inserted daily), but this is proving very difficult (not to mention I’ll want to use surrogate keys in the future).
I’ve attempted (Link) and (http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx) which are effectively the same (except I don’t really understand the union all at the end when I’m trying to upsert) But I run into the same problem with parameters when doing the update using a OLEDb command – which I tried to overcome using (http://technet.microsoft.com/en-us/library/ms141773.aspx) but that just doesn’t seem to work, I get a validation error –
The external columns for complent.... are out of sync with the datasource columns... external column “Param_2” needs to be removed from the external columns.
(this error is repeated for the first two parameters as well – never came across this using the sql connection as it supports named parameters)
Has anyone come across this?
AND:
The fact that this simple task is apparently so difficult to do in SSIS suggests I’m using the wrong tool for the job - is there a better (and still flexible) way of doing this? Or would another ETL package be better for use between two Postgres database? -Other options include any listed on (http://en.wikipedia.org/wiki/Extract,_transform,_load#Open-source_ETL_frameworks). I could just go and write a load of SQL to do this for me, but I wanted a neat and easily maintainable solution.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我已使用缓慢变化维度向导来实现此目的,并取得了良好的成功。它可能会为您提供所需的内容,特别是使用向导
http://msdn .microsoft.com/en-us/library/ms141715.aspx
外部列不同步:SSIS 区分大小写 - 我多次遇到此问题,它让我抓狂。
不管怎样,这个简单的任务都需要一些工作。 SSIS 还不是企业级 ETL 产品,但它确实为您提供了一些快速且简单的功能,并且足以满足大多数 ETL 工作。我想这也与您对此的舒适程度有关。
I have used the Slowly Changing Dimension wizard for this with good success. It may give you what you are looking for especially with the Wizard
http://msdn.microsoft.com/en-us/library/ms141715.aspx
The External Columns Out Of Sync: SSIS is Case Sensitive - I encountered this issue multiple times and it makes me want to pull my hair out.
This simple task is going to take some work either way. SSIS is by no means an enterprise class ETL product yet, but it does give you some quick and easy functionality, and is sufficient for most ETL work. I guess it is also about your level of comfort with it as well.
SCD 对于我想要的来说太慢了。我需要使用基于集合的sql。
事实证明,我的很多问题都与提供程序中的错误有关。
我打开了一个论坛主题 (http://www.pgoledb。 com/forum/viewtopic.php?f=4&t=49)并与版主/支持/开发人员进行了有益的讨论。
另外,Postgres 不允许您进行跨数据库查询,因此我通过以下方式解决了问题: 将
请注意,临时表是不是临时表,而是临时存储数据的存档表模式的副本。
花了一段时间,但我最终到达了那里。
您建议使用哪种企业 ETL 解决方案?
SCD is way too slow for what I want. I need to use set based sql.
It turned out that a lot of my problems were with bugs in the provider.
I opened a forum topic (http://www.pgoledb.com/forum/viewtopic.php?f=4&t=49) and had a useful discussion with the moderator/support/developer person.
Also Postgres doesn't let you do cross db querys, so I solved the problem this way:
Note that the temp table is not atchally a temp table, but a copy of the archive table schema to temporarily stored data in.
Took a while, but I got there in the end.
What enterprise ETL solution would you suggest?