ETL性能问题

发布于 2024-09-13 14:31:46 字数 328 浏览 4 评论 0原文

我在生产环境中运行 ETL 流程时遇到一个重要问题。当我的 ETL 运行时,OLAP 服务器转动得非常慢,我认为这是因为 ETL 正在更新事实表中的几个现有行并添加新行。我试图避免这个问题,在 DB1 中进行整个数据库复制和 ETL 写入,并从 DB2 读取 OLAP 服务器(复制的)。它根本不起作用。

您能否给我一些建议,为我指出正确的解决方案来避免这个问题。

我正在使用 SQL Server 2005。8GB RAM Mondrian OLAP 服务器转变为 Jboss 服务器。 8GB 内存。 ETL 每 3 小时运行一次,运行时间为 2 小时。

我将不胜感激任何帮助。

I have an important problem running ETL Process in production environment. While my ETL is running, the OLAP Server turns extremely slowly, I think this is because the ETL is updating several existing rows in the fact table and adding new ones. I tried to avoid this problem having a whole data base replication and ETL writes in DB1 and OLAP Server read from DB2(the replicated ones). It doesn't work at all.

Can you give me some advices which point me in the right solutions to avoid this problem.

I'm using SQL Server 2005. 8GB RAM
Mondrian OLAP Server into a Jboss Server. 8GB RAM.
ETL is running every 3 hours and is taking 2 hours running.

I'd appreciate any help.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

又怨 2024-09-20 14:31:46

让 ETL 将其流程运行到新表中,然后使用 ALTER TABLE ... SWITCH PARTITION 将新数据切换到 Facts 表中。请参阅 利用分区切换高效传输数据。我还会重新审视 ETL 本身,因为它可以 导入约 2TB/小时 通过良好调整的 ETL 流程,您可能可以自己压缩更多的性能,我怀疑您每 3 小时导入 4TB...

至于使用复制来缓解负载问题的想法,我只能说:复制总是会增加负载,因为复制过程本身相当昂贵。发布者上的每次插入、更新、删除也是订阅者上的插入、更新或删除,并且存在检测发布者上的更改、分发开销以及在订阅者上应用更改的额外开销……这些都会加起来,不减去任何东西。

Have the ETL run it's process into a new table, then use ALTER TABLE ... SWITCH PARTITION to switch in the new data into the Facts table. See Transferring Data Efficiently by Using Partition Switching. I would also revisit the ETL itself, given that the one can import about 2TB/hour with a well tuned ETL process, you can probably squeeze a bit more of performance yourself, I doubt you import 4TB every 3 hours...

As for the idea of using Replication to alleviate load problems, all I can say is this: Replication will always add to the load, as the replication process itself is quite expensive. Every insert, update, delete on the publisher is also an insert, update or delete on the subscriber, and there is the additional overhead of detecting changes on publisher, distribution overhead and applying changes on subscriber... is just going to add up, not subtract anything.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文