How do I create and synchronize a combined reports-only database from two live databases?
I need to quickly implement a read-only database containing data pulled from two identically structured live databases.
The live dbs are actually company dbs from a Dynamics accounting system, so I'm happy to get any Dynamics-specific advice, but this is mostly a SQL question. It's a fairly old version of Dynamics from before Great Plains was acquired by Microsoft. This is on SQL Server 2000.
We have reports and applications which access the Dynamics data. These apps are designed to look at one company db. Now we need to add another. It's appropriate that most of these reports and apps see combined data. They don't really care which company an order or invoice exists in. They only look at a small number of the tables.
It seems to me that the simplest solution is to create a reports-only db with combined data. Ideally, we need an efficient way to update this db with changes several times a day.
I'm a developer, not a db expert, but here's my plan:
1. Create the combined reporting db with the required tables, initially with the same table structure as the live dbs.
2. All Dynamics tables seem to have an int identity column called DEX_ROW_ID. I'm not sure what it's used for (it's not indexed), but it seems like the obvious generic way to uniquely identify rows. On the reporting db I will change it to a normal int (not an identity). I will create a unique index on DEX_ROW_ID in all dbs.
3. Dynamics does not have timestamps, so I will add a timestamp column to the tables in the live dbs and a corresponding binary(8) column in the reporting db. I'm assuming and hoping that Dynamics won't be upset by the additional index and column.
4. Add an int CompanyId column to the reporting db tables and add it to the end of any unique indexes. Most data will be naturally unique even without that, i.e., order and invoice numbers etc. will be different for the two live dbs. We may need to make some minor changes to the applications, but I'm not expecting to do much other than point them to the new reporting db.
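Putting CompanyId into the reporting db's unique indexes matters because Live1 and Live2 number their DEX_ROW_ID values independently. The same value can turn up in both, so a unique index on DEX_ROW_ID alone cannot hold in the combined db. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the real DDL would be T-SQL). The DocNumber column and the make_report_table and try_insert helpers are hypothetical.

```python
import sqlite3


def make_report_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical reporting copy of MyTable with a composite unique key."""
    conn.execute(
        "CREATE TABLE MyTable ("
        " CompanyId INTEGER NOT NULL,"
        " DEX_ROW_ID INTEGER NOT NULL,"  # plain int here, not an identity
        " DocNumber TEXT,"               # hypothetical business column
        " TS BLOB)"                      # stands in for binary(8) copy of the live timestamp
    )
    # Unique per (row id, company): each live db numbers its rows independently.
    conn.execute(
        "CREATE UNIQUE INDEX UX_MyTable_Row ON MyTable (DEX_ROW_ID, CompanyId)"
    )


def try_insert(conn: sqlite3.Connection, company_id: int, row_id: int, doc: str) -> bool:
    """Return True if the row was accepted, False if the unique index rejected it."""
    try:
        conn.execute(
            "INSERT INTO MyTable (CompanyId, DEX_ROW_ID, DocNumber) VALUES (?, ?, ?)",
            (company_id, row_id, doc),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_report_table(conn)
    print(try_insert(conn, 1, 42, "INV-A"))  # row 42 from Live1: True
    print(try_insert(conn, 2, 42, "INV-B"))  # row 42 from Live2: True
    print(try_insert(conn, 1, 42, "INV-C"))  # second row 42 for Live1: False
```

With CompanyId as the trailing index column, lookups that filter on DEX_ROW_ID can still use the index.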
Assuming my reporting db is called Reports, the live dbs are Live1 and Live2, the timestamp column is called TS, and all dbs are on the same server, here's my first attempt at an update script for copying the changes in one table, MyTable, from Live1 to the reporting db.
USE Reports

CREATE TABLE #Changes
(
ReportId int,
LiveId int
)

/* Collect in a temp table the ids of rows which have been deleted or changed
in the live db. L.DEX_ROW_ID will be NULL if the row has been deleted.
The parentheses are needed: AND binds tighter than OR, so without them
the TS comparison would also pick up the other company's rows. */
INSERT INTO #Changes (ReportId, LiveId)
SELECT R.DEX_ROW_ID, L.DEX_ROW_ID
FROM MyTable R LEFT OUTER JOIN Live1.dbo.MyTable L ON L.DEX_ROW_ID = R.DEX_ROW_ID
WHERE R.CompanyId = 1 AND (L.DEX_ROW_ID IS NULL OR L.TS <> R.TS)

/* Delete rows that have been deleted or changed on the live db.
I wonder if using join syntax would run better than the subquery. */
DELETE FROM MyTable
WHERE CompanyId = 1 AND DEX_ROW_ID IN (SELECT ReportId FROM #Changes)

/* Recopy rows that have changed in the live db.
"1 AS CompanyId, *" only lines up if CompanyId is the first column of
Reports.dbo.MyTable and the other columns are in live-table order;
an explicit column list is safer. */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable L
WHERE L.DEX_ROW_ID IN (SELECT ReportId FROM #Changes WHERE LiveId IS NOT NULL)

/* Copy the rows that are new in the live db. ISNULL handles the first run,
when there are no rows yet for this company and MAX returns NULL. */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable
WHERE DEX_ROW_ID > (SELECT ISNULL(MAX(DEX_ROW_ID), 0) FROM MyTable WHERE CompanyId = 1)

DROP TABLE #Changes
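This sketch runs the delete / recopy / append pass end to end without a SQL Server. It parenthesizes the AND/OR grouping and treats a missing MAX as 0. It assumes SQLite as a stand-in: ATTACH plays the role of the Live1/Live2 databases on the same server, and an INTEGER TS column that the demo bumps by hand stands in for the timestamp column. The setup and sync_company helpers are hypothetical names. The sketch also shows both companies sharing DEX_ROW_ID values in one reporting table.

```python
import sqlite3

LIVE_DBS = ("Live1", "Live2")


def setup() -> sqlite3.Connection:
    """Reporting db in 'main'; each live db attached under its own name."""
    conn = sqlite3.connect(":memory:")
    for live in LIVE_DBS:
        conn.execute(f"ATTACH DATABASE ':memory:' AS {live}")
        conn.execute(f"CREATE TABLE {live}.MyTable "
                     "(DEX_ROW_ID INTEGER PRIMARY KEY, Amount INTEGER, TS INTEGER)")
    conn.execute("CREATE TABLE main.MyTable (CompanyId INTEGER NOT NULL,"
                 " DEX_ROW_ID INTEGER NOT NULL, Amount INTEGER, TS INTEGER,"
                 " UNIQUE (DEX_ROW_ID, CompanyId))")
    return conn


def sync_company(conn: sqlite3.Connection, company_id: int, live: str) -> None:
    """One incremental pass for one table and one company.

    `live` is interpolated because a database name cannot be a bound
    parameter; it must come from trusted configuration.
    """
    conn.execute("DROP TABLE IF EXISTS temp.Changes")
    conn.execute("CREATE TEMP TABLE Changes (ReportId INTEGER, LiveId INTEGER)")
    # 1. Rows deleted (LiveId NULL) or changed (TS differs) in the live db.
    conn.execute("INSERT INTO temp.Changes (ReportId, LiveId) "
                 "SELECT R.DEX_ROW_ID, L.DEX_ROW_ID "
                 f"FROM main.MyTable R LEFT OUTER JOIN {live}.MyTable L "
                 "ON L.DEX_ROW_ID = R.DEX_ROW_ID "
                 "WHERE R.CompanyId = ? AND (L.DEX_ROW_ID IS NULL OR L.TS <> R.TS)",
                 (company_id,))
    # 2. Remove those rows from the reporting copy.
    conn.execute("DELETE FROM main.MyTable WHERE CompanyId = ? "
                 "AND DEX_ROW_ID IN (SELECT ReportId FROM temp.Changes)",
                 (company_id,))
    # 3. Recopy the ones that still exist in the live db.
    conn.execute("INSERT INTO main.MyTable (CompanyId, DEX_ROW_ID, Amount, TS) "
                 f"SELECT ?, DEX_ROW_ID, Amount, TS FROM {live}.MyTable "
                 "WHERE DEX_ROW_ID IN "
                 "(SELECT LiveId FROM temp.Changes WHERE LiveId IS NOT NULL)",
                 (company_id,))
    # 4. Append rows newer than anything copied so far (IFNULL covers the first run).
    conn.execute("INSERT INTO main.MyTable (CompanyId, DEX_ROW_ID, Amount, TS) "
                 f"SELECT ?, DEX_ROW_ID, Amount, TS FROM {live}.MyTable "
                 "WHERE DEX_ROW_ID > (SELECT IFNULL(MAX(DEX_ROW_ID), 0) "
                 "FROM main.MyTable WHERE CompanyId = ?)",
                 (company_id, company_id))
    conn.execute("DROP TABLE temp.Changes")
    conn.commit()


if __name__ == "__main__":
    conn = setup()
    conn.executemany("INSERT INTO Live1.MyTable VALUES (?, ?, ?)",
                     [(1, 100, 1), (2, 200, 2), (3, 300, 3)])
    conn.execute("INSERT INTO Live2.MyTable VALUES (1, 999, 1)")  # same id, other company
    for company_id, live in enumerate(LIVE_DBS, start=1):
        sync_company(conn, company_id, live)
    # Change row 2, delete row 3 and add row 4 in Live1, then sync again.
    conn.execute("UPDATE Live1.MyTable SET Amount = 250, TS = 4 WHERE DEX_ROW_ID = 2")
    conn.execute("DELETE FROM Live1.MyTable WHERE DEX_ROW_ID = 3")
    conn.execute("INSERT INTO Live1.MyTable VALUES (4, 400, 5)")
    sync_company(conn, 1, "Live1")
    print(conn.execute("SELECT CompanyId, DEX_ROW_ID, Amount FROM main.MyTable "
                       "ORDER BY CompanyId, DEX_ROW_ID").fetchall())
    # [(1, 1, 100), (1, 2, 250), (1, 4, 400), (2, 1, 999)]
```

The append step relies on identity values only ever increasing. Rows inserted into the live table with an explicit, lower DEX_ROW_ID would be missed until the next full copy.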
Then do the same for the Live2 db. Repeat for every table in Reports. I know I should use a parameter @CompanyId instead of the literal, but I can't do that for the live db name, so I might generate these scripts dynamically with a C# program or something.
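Because the database name can't be parameterized, generating one script per table and company is a reasonable route. Here is a minimal sketch in Python rather than C# (the generator language is incidental). The names build_sync_script, quote_ident and SYNC_TEMPLATE, and the column names, are hypothetical. The sketch assumes the output is run in the Reports database and that CompanyId exists only in the reporting tables. It writes explicit column lists instead of SELECT *. Each script creates #Changes, and SQL Server rejects a batch that creates two temp tables with the same name, so the scripts are separated with GO, the batch separator recognized by Query Analyzer and osql.

```python
from typing import Sequence

# Hypothetical template for the single-table sync, with explicit column lists.
# {table}, {live_db} and {cols} arrive already bracket-quoted.
SYNC_TEMPLATE = """\
CREATE TABLE #Changes (ReportId int, LiveId int)

INSERT INTO #Changes (ReportId, LiveId)
SELECT R.DEX_ROW_ID, L.DEX_ROW_ID
FROM dbo.{table} R LEFT OUTER JOIN {live_db}.dbo.{table} L ON L.DEX_ROW_ID = R.DEX_ROW_ID
WHERE R.CompanyId = {company_id} AND (L.DEX_ROW_ID IS NULL OR L.TS <> R.TS)

DELETE FROM dbo.{table}
WHERE CompanyId = {company_id} AND DEX_ROW_ID IN (SELECT ReportId FROM #Changes)

INSERT INTO dbo.{table} (CompanyId, {cols})
SELECT {company_id}, {cols} FROM {live_db}.dbo.{table}
WHERE DEX_ROW_ID IN (SELECT LiveId FROM #Changes WHERE LiveId IS NOT NULL)

INSERT INTO dbo.{table} (CompanyId, {cols})
SELECT {company_id}, {cols} FROM {live_db}.dbo.{table}
WHERE DEX_ROW_ID > (SELECT ISNULL(MAX(DEX_ROW_ID), 0) FROM dbo.{table} WHERE CompanyId = {company_id})

DROP TABLE #Changes
"""


def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any ']' (as QUOTENAME does)."""
    return "[" + name.replace("]", "]]") + "]"


def build_sync_script(table: str, company_id: int, live_db: str,
                      columns: Sequence[str]) -> str:
    """Return a T-SQL batch that syncs one table from one live db into Reports."""
    if "DEX_ROW_ID" not in columns or "TS" not in columns:
        raise ValueError("columns must include DEX_ROW_ID and TS")
    return SYNC_TEMPLATE.format(
        table=quote_ident(table),
        live_db=quote_ident(live_db),
        company_id=int(company_id),
        cols=", ".join(quote_ident(c) for c in columns),
    )


if __name__ == "__main__":
    columns = ["DEX_ROW_ID", "DocNumber", "Amount", "TS"]  # hypothetical columns
    for company_id, live_db in [(1, "Live1"), (2, "Live2")]:
        print(build_sync_script("MyTable", company_id, live_db, columns))
        print("GO")  # new batch, so the next script can create #Changes again
```

The column list could be read from INFORMATION_SCHEMA.COLUMNS on the live db rather than hard-coded, which keeps the generated scripts in step with schema changes.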
I'm looking for any advice, suggestions or critique on what I'm doing here. I know it won't be atomic. Things could be happening on the live db while this script runs. I think we can live with that. We'll probably do a full copy either nightly or weekly when nothing is happening on the live dbs.
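For the nightly or weekly full copy, a straightforward form is to replace each company's rows wholesale inside one transaction. This also clears out any drift left by the non-atomic incremental runs. Below is a minimal sketch, assuming SQLite with an attached database standing in for Live1. The setup and full_refresh helpers are hypothetical names, and an INTEGER TS column stands in for the timestamp.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Reporting db in 'main', with Live1 attached alongside it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS Live1")
    conn.execute("CREATE TABLE Live1.MyTable "
                 "(DEX_ROW_ID INTEGER PRIMARY KEY, Amount INTEGER, TS INTEGER)")
    conn.execute("CREATE TABLE main.MyTable (CompanyId INTEGER NOT NULL,"
                 " DEX_ROW_ID INTEGER NOT NULL, Amount INTEGER, TS INTEGER,"
                 " UNIQUE (DEX_ROW_ID, CompanyId))")
    return conn


def full_refresh(conn: sqlite3.Connection, company_id: int, live: str) -> None:
    """Replace all reporting rows for one company with a fresh copy of `live`.

    Runs as one transaction: if the copy fails, the rollback leaves the
    previous rows in place. `live` must come from trusted configuration.
    """
    with conn:
        conn.execute("DELETE FROM main.MyTable WHERE CompanyId = ?", (company_id,))
        conn.execute("INSERT INTO main.MyTable (CompanyId, DEX_ROW_ID, Amount, TS) "
                     f"SELECT ?, DEX_ROW_ID, Amount, TS FROM {live}.MyTable",
                     (company_id,))


if __name__ == "__main__":
    conn = setup()
    conn.executemany("INSERT INTO Live1.MyTable VALUES (?, ?, ?)",
                     [(1, 100, 1), (2, 200, 2)])
    # Drift: a stale company-1 row, plus a company-2 row that must survive.
    conn.execute("INSERT INTO main.MyTable VALUES (1, 9, 999, 9)")
    conn.execute("INSERT INTO main.MyTable VALUES (2, 1, 555, 1)")
    conn.commit()
    full_refresh(conn, 1, "Live1")
    print(conn.execute("SELECT CompanyId, DEX_ROW_ID, Amount FROM main.MyTable "
                       "ORDER BY CompanyId, DEX_ROW_ID").fetchall())
    # [(1, 1, 100), (1, 2, 200), (2, 1, 555)]
```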
We need to favor performance over elegance or perfection. In some initial testing, the first query (the one with the TS comparisons) runs in about 30 seconds for the biggest table, so I'm optimistic that this is going to work, but I'd also like to know if I'm missing something obvious or not seeing the forest for the trees.
We don't really want to deal with log files on the reporting db. Can we just set that to simple recovery model and forget about logs?
Thanks
Answers (2)
I think there are a couple of open questions here.
Do you need these reports to be near-real-time? Or is this the sort of reporting that could live with daily updates? But let's assume you need up-to-the-minute data.
Have you considered querying the databases directly and merging the data per report on the fly? You'd have to do a lot of reporting to justify the effort that's going to go into designing, creating, and supporting a real-time merged replicated database.
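Merging on the fly can be as simple as a view that stacks the two company tables with UNION ALL and tags each row with its company. In SQL Server, such a view could reference Live1.dbo.MyTable and Live2.dbo.MyTable by three-part name. Below is a minimal sketch, assuming SQLite as a stand-in, where a view over attached databases has to be a TEMP view. The connect_combined helper, the CombinedMyTable view and the DocNumber and Amount columns are hypothetical.

```python
import sqlite3


def connect_combined() -> sqlite3.Connection:
    """Attach both company dbs and expose one merged view over MyTable."""
    conn = sqlite3.connect(":memory:")
    for live in ("Live1", "Live2"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {live}")
        conn.execute(f"CREATE TABLE {live}.MyTable "
                     "(DEX_ROW_ID INTEGER PRIMARY KEY, DocNumber TEXT, Amount INTEGER)")
    # UNION ALL keeps every row and skips the duplicate-removal work UNION does.
    conn.execute(
        "CREATE TEMP VIEW CombinedMyTable AS "
        "SELECT 1 AS CompanyId, DEX_ROW_ID, DocNumber, Amount FROM Live1.MyTable "
        "UNION ALL "
        "SELECT 2 AS CompanyId, DEX_ROW_ID, DocNumber, Amount FROM Live2.MyTable"
    )
    return conn


if __name__ == "__main__":
    conn = connect_combined()
    conn.execute("INSERT INTO Live1.MyTable VALUES (1, 'INV-100', 50)")
    conn.execute("INSERT INTO Live2.MyTable VALUES (1, 'INV-900', 70)")
    for row in conn.execute("SELECT * FROM CombinedMyTable ORDER BY CompanyId"):
        print(row)  # (1, 1, 'INV-100', 50) then (2, 1, 'INV-900', 70)
    print(conn.execute("SELECT SUM(Amount) FROM CombinedMyTable").fetchone()[0])  # 120
```

Reports then query the view instead of a copied table. Nothing has to be synchronized, at the cost of every report query touching both live databases.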
Thirty seconds is (IMHO) unacceptable for any single query against a production database. There could be any number of tuning-related reasons for taking this long, but it at least means you're going to need serious professional SQL Server optimization resources (i.e. people). And if this is a problem for the report queries, it doesn't bode well for the queries that maintain a separate reporting database.
Tuck into the back of your mind the consideration that, if you need to consolidate to a single database, it's worth considering whether you should make it an OLAP database rather than a mirror. The mirror will be quicker and easier, but the OLAP would be far more flexible and powerful in the long term, and it might be worth going the whole way from the beginning.
The last thing I'd want to do is write a custom update script. Try these bulletproof methods first:
This might seem like brute force. But since you're copying a 2000-era database, brute force is unlikely to be a problem on today's hardware. As an added advantage, these methods can be supported by a sysadmin instead of a developer.
Method 1 has the additional advantage of serving as backup verification. :)