Archiving data from a public production server to an internal archive server

Posted 2024-12-08 14:03:29

We have a web application that, among other things, keeps a detailed archive of log data from embedded devices. We want to keep this detailed data for internal analysis, but it is not needed on the publicly accessible production server for longer than three months. The web application only inserts new data; there are no updates or deletes on the detailed data.

So we want to build an archive system that regularly (e.g. daily) gets a copy of the new data from the production server (including the last three months). After that, all data older than three months is removed from the production server.

Unfortunately, some of our embedded devices are sometimes offline for a while (days to weeks). In that case, new data can be added to the system for a time range that has already been copied or archived.

On the public production server we are using SQL Server 2008 Standard Edition. On the internal archive system we want to use SQL Server 2008 R2 Enterprise Edition to benefit from partitioning and compression for the archive.

At the moment I am considering the following approaches:

  1. Daily replication of data from the production system to the archive system. When old data is deleted on production, the delete should not be replicated to the archive system. I found an option to ignore delete operations on the replication target.

  2. If there were no requirement to also have the current data on the archive system, I would regularly move all data older than a specific time to the archive system. But I have to both synchronize the current data and move all old data.

    Maybe I have to use a mixed solution: replication to synchronize the current data between the archive and production systems, plus moving old data to an archive table on the archive system (while replicating the deletes for the archived data from the current table to the production system).

What do you think? Do you have some recommendations or know some best practices regarding such problems? Is this topic covered elsewhere (e.g. books)?

Thank you very much in advance.

PS: I am not sure whether this question should be asked here or on Server Fault. Please move it if posting it here was the wrong decision. Thanks.


Comments (2)

锦上情书 2024-12-15 14:03:30

One thing you could do is have your web app insert into both DBs simultaneously. That way the archive does not depend on the prod DB at all.

You may also consider partitioning the prod DB so that removing data from it is easier. (But since your prod DB is Standard Edition, that's not an option.) In that case, you will have to delete rows from the prod DB. Don't do the entire delete in one transaction; break it down into batches instead.
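A minimal sketch of such a batched delete, assuming a table called dbo.ProductionTable with a datetime column DateColumn (both names are hypothetical, as is the batch size of 5000):

```sql
-- Hypothetical names: dbo.ProductionTable, DateColumn.
-- The batch size is an arbitrary starting point, not a recommendation.
DECLARE @cutoff datetime = DATEADD(month, -3, GETDATE());
DECLARE @batch  int = 5000;
DECLARE @rows   int = 1;

WHILE @rows > 0
BEGIN
    -- Each DELETE runs as its own autocommit transaction,
    -- so locks are held only for the duration of one batch.
    DELETE TOP (@batch)
    FROM dbo.ProductionTable
    WHERE DateColumn < @cutoff;

    -- Stop once a pass deletes nothing.
    SET @rows = @@ROWCOUNT;
END;
```

Because every batch commits on its own, the log space it used can be reused between batches (after a checkpoint under the simple recovery model, or after a log backup under full recovery) instead of piling up in one huge transaction.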

Finally, watch your transaction log (tlog) growth closely while doing the deletes. It can grow big pretty fast.
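One way to keep an eye on the log while the deletes run is sketched below. It only uses built-in commands, but how often to check and what counts as "too big" depend on your setup:

```sql
-- Log size and percentage of log space in use, for every database on the instance.
DBCC SQLPERF(LOGSPACE);

-- Why the log of the current database cannot be truncated right now
-- (e.g. LOG_BACKUP or ACTIVE_TRANSACTION).
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = DB_NAME();
```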

北渚 2024-12-15 14:03:30

Roughly how many tables need to be archived?

If it's only one or a few tables, you could just run some SQL regularly via SQL Server Agent.

Something like this (very simplified):

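-- use one cutoff so the copy and the delete apply the same date boundary
declare @cutoff datetime = dateadd(month, -3, getdate())

-- copy to archive database
-- (a linked server needs a four-part name; ArchiveDatabase is a placeholder)
insert into ArchiveServer.ArchiveDatabase.dbo.ArchiveTable (Column1, Column2, ...)
select Column1, Column2, ...
from ProductionTableOnThisServer
where DateColumn < @cutoff

-- delete in production database
delete from ProductionTableOnThisServer
where DateColumn < @cutoff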

For this example to work, it needs to be run on the production server, and the archive server must be configured as a linked server on the production server.

Of course this little example is far from perfect (or even production ready!).
This was just to express the basic idea.

In the real world, you probably also want to:

  • wrap both queries together in a transaction so the real data doesn't get deleted if the archiving didn't work for some reason
  • insert only the rows which are really new and update those which have changed
  • and so on...
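
The first two points could be sketched roughly as follows. This is only an illustration under several assumptions: the columns Id (a unique key), DateColumn and Payload are made up, ArchiveDatabase is a placeholder name, and a transaction that spans a linked server requires MSDTC to be configured on both machines. Since the question says rows are only ever inserted, the sketch skips the "update changed rows" part.

```sql
-- Hypothetical columns: Id (unique key), DateColumn, Payload.
-- ArchiveServer is the linked server; ArchiveDatabase is a placeholder.
SET XACT_ABORT ON;  -- any run-time error aborts and rolls back the whole transaction

DECLARE @cutoff datetime = DATEADD(month, -3, GETDATE());

BEGIN TRY
    BEGIN DISTRIBUTED TRANSACTION;

    -- Copy only rows that are not in the archive yet (covers late-arriving data).
    INSERT INTO ArchiveServer.ArchiveDatabase.dbo.ArchiveTable (Id, DateColumn, Payload)
    SELECT p.Id, p.DateColumn, p.Payload
    FROM dbo.ProductionTableOnThisServer AS p
    WHERE p.DateColumn < @cutoff
      AND NOT EXISTS (SELECT 1
                      FROM ArchiveServer.ArchiveDatabase.dbo.ArchiveTable AS a
                      WHERE a.Id = p.Id);

    -- Delete only rows that are confirmed to exist in the archive.
    DELETE p
    FROM dbo.ProductionTableOnThisServer AS p
    WHERE p.DateColumn < @cutoff
      AND EXISTS (SELECT 1
                  FROM ArchiveServer.ArchiveDatabase.dbo.ArchiveTable AS a
                  WHERE a.Id = p.Id);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
    -- SQL Server 2008 has no THROW, so re-raise with RAISERROR.
    DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1);
END CATCH;
```

Tying the delete to an EXISTS check against the archive means a row that arrives on production between the two statements is left alone until the next run instead of being deleted without a copy.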