How to keep tables in sync with queries from SQL Server - ETL?

Published 2024-08-06 00:14:13


I wasn't sure how to word this question, so I'll try to explain. I have a third-party database on SQL Server 2005. I have another SQL Server 2008, to which I want to "publish" some of the data in the third-party database. I shall then use this database as the back-end for a portal and Reporting Services - it shall be the data warehouse.

On the destination server I want to store the data in table structures different from those in the third-party DB. Some tables I want to denormalize, and there are lots of columns that aren't necessary. I'll also need to add extra fields to some of the tables, which I'll need to update based on data stored in the same rows. For example, there are varchar fields containing info with which I'll want to populate other columns. All of this should cleanse the data and make it easier to report on.
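As an illustrative sketch of that last step (the table and column names here are hypothetical, not from the question), deriving new columns from a packed varchar field might look like:

```sql
-- Hypothetical example: the source packs "REF-1234/LONDON" into a single
-- varchar column; split it into two reporting-friendly columns.
-- CHARINDEX/SUBSTRING keep this compatible back to SQL Server 2005.
UPDATE dbo.DestOrders
SET    RefNumber = LEFT(SourceInfo, CHARINDEX('/', SourceInfo) - 1),
       City      = SUBSTRING(SourceInfo, CHARINDEX('/', SourceInfo) + 1, 8000)
WHERE  SourceInfo LIKE '%/%';
```

Running this as part of each load keeps the derived columns in step with the raw field.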

I can write the queries to get all the info I want into a particular destination table. However, I want to keep it up to date with the source on the other server. It doesn't have to be updated immediately (although that would be good), but I'd like it to be updated perhaps every 10 minutes. There are hundreds of thousands of rows of data, but the volume of changes and new rows isn't huge.

I've had a look around, but I'm still not sure of the best way to achieve this. As far as I can tell, replication won't do what I need. I could manually write the T-SQL to do the updates, perhaps using the MERGE statement, and then schedule it as a job with SQL Server Agent. I've also been looking at SSIS, which looks to be geared at this ETL kind of thing.
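A hedged sketch of that MERGE-based approach, assuming a linked server named SourceServer and hypothetical table names. Scheduled as a SQL Server Agent job, it would have to run on the 2008 destination side, since MERGE is not available on SQL Server 2005:

```sql
-- Hypothetical incremental refresh: upsert rows from the 2005 source
-- (reached via a linked server) into a denormalized destination table.
MERGE dbo.DestCustomers AS dest
USING (
    SELECT c.CustomerID,
           c.Name,
           a.City                       -- denormalized from a child table
    FROM   SourceServer.ThirdPartyDb.dbo.Customers  AS c
    JOIN   SourceServer.ThirdPartyDb.dbo.Addresses AS a
           ON a.CustomerID = c.CustomerID
) AS src
ON dest.CustomerID = src.CustomerID
WHEN MATCHED THEN
    UPDATE SET dest.Name = src.Name,
               dest.City = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, Name, City)
    VALUES (src.CustomerID, src.Name, src.City)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

Pulling the full source set across the linked server every 10 minutes is the simple option; at hundreds of thousands of rows it may be acceptable, but filtering the USING query on a modified-date column (if the source has one) would cut the transfer considerably.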

I'm just not sure what to use to achieve this, and I was hoping to get some advice on how one should go about doing this kind of thing. Any suggestions would be greatly appreciated.


Comments (3)

幽梦紫曦~ 2024-08-13 00:14:13


For those tables whose schemas/relations are not changing, I would still strongly recommend replication.

For the tables whose data and/or relations are changing significantly, I would recommend that you develop a Service Broker implementation to handle that. The high-level approach with Service Broker (SB) is:

Table-->Trigger-->SB.Service >====> SB.Queue-->StoredProc(activated)-->Table(s)

I would not recommend SSIS for this, unless you wanted to go to something like daily exports/imports. It's fine for that kind of thing, but IMHO far too kludgey and cumbersome for either continuous or short-period incremental data distribution.
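The pipeline above can be sketched in T-SQL roughly as follows. This is a minimal illustration with hypothetical object names, not a complete implementation (error handling, conversation reuse, and the activated procedure body are omitted):

```sql
-- Table --> Trigger --> SB.Service >====> SB.Queue --> activated proc
CREATE MESSAGE TYPE RowChangedMsg VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT RowChangedContract (RowChangedMsg SENT BY INITIATOR);
CREATE QUEUE dbo.RowChangeQueue;
CREATE SERVICE RowChangeService
    ON QUEUE dbo.RowChangeQueue (RowChangedContract);
GO
-- Trigger on the watched source table sends the changed keys as XML.
CREATE TRIGGER trg_Orders_Changed ON dbo.Orders
AFTER INSERT, UPDATE
AS
BEGIN
    DECLARE @h UNIQUEIDENTIFIER;
    DECLARE @body XML;
    SET @body = (SELECT OrderID FROM inserted FOR XML PATH('row'));

    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE RowChangeService
        TO SERVICE 'RowChangeService'
        ON CONTRACT RowChangedContract
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @h MESSAGE TYPE RowChangedMsg (@body);
END;
GO
-- A stored procedure set as the queue's ACTIVATION procedure would then
-- RECEIVE each message and apply the change to the destination table(s).
```

Because the queue decouples the trigger from the apply step, the source transaction isn't slowed by the destination work, and messages survive restarts.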

清风疏影 2024-08-13 00:14:13


Nick, I have gone the SSIS route myself. I have SSIS-based jobs that run every 15 minutes and do exactly what you are trying to do. We have a huge relational database on which we wanted to do complicated reporting using a product called Tableau. We quickly discovered that our relational model wasn't really so hot for that, so I built a cube over it with SSAS, and that cube is updated and processed every 15 minutes.
Yes, SSIS does give the impression of being mainly for straight ETL jobs, but I have found that it can be used for simple, quick jobs like this as well.

清醇 2024-08-13 00:14:13


I think staging and partitioning would be too much for your case. I am implementing the same thing in SSIS now, but with a frequency of 1 hour, as I need to allow some time for support activities. I am sure that using SSIS is a good way of doing it.

During the design, I thought of another way to achieve custom replication: building on the Change Data Capture (CDC) process. This way you can get near-real-time replication, but it is a tricky thing.
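For reference, a hedged sketch of the built-in CDC feature mentioned above (SQL Server 2008+; the table and capture-instance names are hypothetical). A periodic job can pull only the rows that changed since its last run, using the log sequence number (LSN) range as a watermark:

```sql
-- Enable CDC for the database and for one source table.
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',
     @role_name     = NULL;
GO
-- Read every change recorded for the dbo_Orders capture instance.
-- A real job would persist @to after each run and use it as the next @from.
DECLARE @from BINARY(10);
DECLARE @to   BINARY(10);
SET @from = sys.fn_cdc_get_min_lsn('dbo_Orders');
SET @to   = sys.fn_cdc_get_max_lsn();

SELECT *
FROM   cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, N'all');
```

Note that CDC reads the transaction log asynchronously, so unlike a trigger it adds no overhead to the source transactions; the trade-off is a small capture latency.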
