Pattern for updating slave SQL Server 2008 databases from a master while minimising disruption

Posted on 2024-07-25 02:09:11

We have an ASP.NET web application hosted on a web farm of many instances, backed by SQL Server 2008. In it we aggregate and pre-process data from multiple sources into a format optimised for fast end-user query performance (producing 5-10 million rows in some tables). The aggregation and optimisation are done by a service on a back-end server, and we then want to distribute the results to multiple read-only front-end copies used by the web application instances, for maximum scalability.

My question is about the best way to get this data from the back-end database out to the read-only front-end copies in a way that does not kill their performance during the process. The front-end web application instances will be under constant high load and need to stay responsive at all times.

The backend database is constantly being updated so I suspect that transactional replication will not be the best approach, as the constant stream of updates to the copies will hurt their performance.

Staleness of data is not a huge issue so snapshot replication might be the way to go, but this will result in poor performance during the periods of replication.

Doing a drop and bulk insert will result in periods with no data for user queries.

I don't really want to get into writing a complex cluster approach where we drop copies out of the cluster during updating - is there something along these lines that we can do without too much effort, or is there a better alternative?

Comments (3)

梦明 2024-08-01 02:09:12

Option 1: Write an app to transfer the data using row level transactions. It might take longer but would result in no interruption of the site using the data because the rows are there before and after the read occurs, just with new data. This processing would happen on a separate server to minimize load.

In SQL Server 2008 you can set READ_COMMITTED_SNAPSHOT to ON so that readers are not blocked by rows that are being updated.

Basically, all this app does is read new data out of one database as it becomes available and write it into the other.
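
A minimal sketch of the Option 1 idea, written in Python with the standard-library sqlite3 module standing in for SQL Server so that it runs anywhere. The `summary` table, its columns, and the `sync_rows` helper are hypothetical names chosen for illustration. The point it shows is that each batch is committed in its own short transaction, so readers on the target always see either the old or the new version of a row.

```python
import sqlite3

def sync_rows(source, target, batch_size=500):
    """Copy rows from source to target in small transactions, upserting by key."""
    cur = source.execute("SELECT id, value FROM summary ORDER BY id")
    copied = 0
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        with target:  # one short transaction per batch, committed on exit
            target.executemany(
                "INSERT OR REPLACE INTO summary (id, value) VALUES (?, ?)",
                batch,
            )
        copied += len(batch)
    return copied

# Two in-memory databases play the back-end and one front-end copy.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE summary (id INTEGER PRIMARY KEY, value TEXT)")
source.executemany(
    "INSERT INTO summary VALUES (?, ?)",
    [(1, "new-a"), (2, "new-b"), (3, "new-c")],
)
target.executemany(
    "INSERT INTO summary VALUES (?, ?)", [(1, "old-a"), (2, "old-b")]
)
target.commit()

copied = sync_rows(source, target, batch_size=2)
rows = target.execute("SELECT id, value FROM summary ORDER BY id").fetchall()
```

This sketch does not handle rows deleted at the source; a real transfer app would need some way to detect and propagate those as well.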

Option 2: Move the data (tables or entire database) from the aggregation server to the front-end server. Automate this if possible. Then switch your web application to point to the new database or tables for future requests. This works but requires control over the web app, which you may not have.

Option 3: If you are talking about a single table (though this can work with many), you can do a view swap. You write your code against a SQL view that points to table A. You do your work on table B, and when it's ready, you update the view to point to table B. You can even write a function that determines the active table and automates the whole swap.

Option 4: You might be able to use something like byte-level replication of the server, which basically copies the server from point A to point B, exactly, down to the very bytes. That sounds scary though. It's mostly used in disaster recovery (DR) situations, and while this sounds like it could be a kinda/sorta DR situation, it isn't really one.
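
The view swap can be sketched as follows, again in Python with sqlite3 as a stand-in. The names `report`, `report_a`, `report_b`, and `active_copy` are hypothetical. SQLite has no ALTER VIEW, so this sketch drops and recreates the view inside one explicit transaction; on SQL Server the equivalent step would be an ALTER VIEW statement.

```python
import sqlite3

TABLES = ("report_a", "report_b")  # hypothetical names for the two physical copies

def point_view_at(conn, table):
    """Repoint the 'report' view at the given table in a single transaction."""
    if table not in TABLES:  # the name is interpolated into SQL, so whitelist it
        raise ValueError(table)
    conn.execute("BEGIN")
    try:
        conn.execute("DROP VIEW IF EXISTS report")
        conn.execute(f"CREATE VIEW report AS SELECT id, total FROM {table}")
        conn.execute("UPDATE active_copy SET name = ?", (table,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

def refresh(conn, rows):
    """Load fresh rows into the standby table, then swap the view over to it."""
    active = conn.execute("SELECT name FROM active_copy").fetchone()[0]
    standby = TABLES[1] if active == TABLES[0] else TABLES[0]
    conn.execute(f"DELETE FROM {standby}")
    conn.executemany(f"INSERT INTO {standby} (id, total) VALUES (?, ?)", rows)
    point_view_at(conn, standby)
    return standby

# isolation_level=None puts the connection in autocommit mode so that the
# explicit BEGIN/COMMIT above controls the transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
for t in TABLES:
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, total INTEGER)")
conn.execute("CREATE TABLE active_copy (name TEXT NOT NULL)")
conn.execute("INSERT INTO active_copy VALUES ('report_a')")
conn.execute("INSERT INTO report_a VALUES (1, 10)")
point_view_at(conn, "report_a")

before = conn.execute("SELECT id, total FROM report").fetchall()
now_active = refresh(conn, [(1, 11), (2, 20)])
after = conn.execute("SELECT id, total FROM report ORDER BY id").fetchall()
```

Queries only ever name `report`, so the slow bulk load happens on a table nobody is reading, and the switch itself is a quick metadata change.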

Option 5: Give up and learn how to sell insurance. :)

夏了南城 2024-08-01 02:09:11

There is actually a technology built into SQL Server 2005 (and 2008) that is designed to address this kind of issue: Service Broker (which I'll refer to as SSB from here on). The problem is that it has a very steep learning curve.

I know MySpace went public about how they use SSB to manage their fleet of SQL Servers: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. I know of several more (major) sites that use similar patterns, but unfortunately they have not gone public, so I cannot name them. I was personally involved with some projects around this technology (I am a former member of the SQL Server team).

Now bear in mind that SSB is not a dedicated data transfer technology like Replication. As such, you will not find anything similar to the publishing wizards and simple deployment options of Replication (check a table and it gets transferred). SSB is a reliable messaging technology, and its primitives stop at the level of message exchange: you would have to write the code that captures the data changes, packs them as messages, and unpacks the messages into relational tables at the destination.

The reason some companies still prefer SSB over Replication for a task like the one you describe is that SSB has a far better story when it comes to reliability and scalability. I know of projects that exchange data between 1500+ sites, far beyond the capabilities of Replication. SSB is also abstracted from the physical topology: you can move databases, rename machines, and rebuild servers, all without changing the application. Because data flow occurs over logical routes, the application can adapt on the fly to new topologies. SSB is also resilient to long periods of disconnect and downtime, being capable of resuming the data flow after hours, days, or even months of disconnect. The high throughput achieved by engine integration (SSB is part of the SQL engine itself, not a collection of satellite applications and processes like Replication) means that a backlog of changes can be processed in a reasonable time (I know of sites that are going through half a million transactions per minute). SSB applications typically rely on internal Activation to process the incoming data. SSB also has some unique features, like built-in load balancing (via routes) with sticky session semantics, support for deadlock-free, application-specific correlated processing, priority data delivery, specific support for database mirroring, certificate-based authentication for cross-domain operations, built-in persisted timers, and many more.

This is not a specific answer to 'how to move data from table T on server A to server B'. It is more a generic technology for how to 'exchange data between server A and server B'.
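
To make the pack/unpack responsibility concrete, here is a small Python sketch. It does not use Service Broker at all: an in-memory deque stands in for the reliable message queue, JSON is an assumed message format, and the `summary` table plus the `pack_change` and `apply_message` helpers are hypothetical names. It only illustrates the kind of code the application would have to supply on each side.

```python
import json
import sqlite3
from collections import deque

def pack_change(op, key, values=None):
    """Serialise one captured data change as a message body."""
    return json.dumps({"op": op, "key": key, "values": values})

def apply_message(conn, message):
    """Unpack one message and apply it to the relational table at the destination."""
    change = json.loads(message)
    if change["op"] == "upsert":
        conn.execute(
            "INSERT OR REPLACE INTO summary (id, value) VALUES (?, ?)",
            (change["key"], change["values"]["value"]),
        )
    elif change["op"] == "delete":
        conn.execute("DELETE FROM summary WHERE id = ?", (change["key"],))
    else:
        raise ValueError("unknown operation: " + change["op"])

# Sender side: changes are captured and queued in order.
queue = deque()  # stand-in for the message transport, not a real SSB queue
queue.append(pack_change("upsert", 1, {"value": "a"}))
queue.append(pack_change("upsert", 2, {"value": "b"}))
queue.append(pack_change("delete", 1))

# Receiver side: messages are drained in order and applied.
destination = sqlite3.connect(":memory:")
destination.execute("CREATE TABLE summary (id INTEGER PRIMARY KEY, value TEXT)")
while queue:
    with destination:  # one transaction per message
        apply_message(destination, queue.popleft())

final_rows = destination.execute("SELECT id, value FROM summary").fetchall()
```

In-order delivery matters here: applying the delete before the first upsert would leave a different final state, which is why ordered, reliable delivery is the part worth delegating to the messaging layer.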

时光暖心i 2024-08-01 02:09:11

I've never had to deal with this scenario before, but I did come up with a possible solution for it. Basically, it would require a change in your main database structure. Instead of storing the data, you would keep records of the modifications to this data. Thus, if a record is added, you store "Table X, inserted new record with these values: ...". For modifications, just store the table, field, and changed value. For deletions, just store which record was deleted. Every modification is stored with a timestamp.

Your client systems would keep their local copies of the database and would regularly ask for all database modifications made after a certain date/time. They then execute those modifications on the local database, and it is up to date again.

And the back-end? Well, it would just keep a list of modifications and perhaps a table with the base data. Keeping just the modifications also means you're keeping track of history, allowing you to ask the system what it looked like a year ago.
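
A rough Python sketch of this modification-log idea, using sqlite3. All names (`change_log`, `items`, `record`, `changes_since`, `pull`) are hypothetical, and an auto-incrementing sequence number is used in place of the timestamp described above, as an assumption made to avoid ties between changes recorded at the same instant.

```python
import sqlite3

# Back-end: stores only the list of modifications.
backend = sqlite3.connect(":memory:")
backend.execute(
    "CREATE TABLE change_log ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, row_id INTEGER, value TEXT)"
)

def record(op, row_id, value=None):
    """Append one modification to the back-end log."""
    with backend:
        backend.execute(
            "INSERT INTO change_log (op, row_id, value) VALUES (?, ?, ?)",
            (op, row_id, value),
        )

def changes_since(seq):
    """Return every modification logged after the given sequence number."""
    return backend.execute(
        "SELECT seq, op, row_id, value FROM change_log WHERE seq > ? ORDER BY seq",
        (seq,),
    ).fetchall()

def pull(replica, last_seq):
    """Apply all newer modifications to a local copy; return the new high-water mark."""
    with replica:  # apply the whole batch in one transaction
        for seq, op, row_id, value in changes_since(last_seq):
            if op == "delete":
                replica.execute("DELETE FROM items WHERE id = ?", (row_id,))
            else:  # insert and update are both handled as an upsert
                replica.execute(
                    "INSERT OR REPLACE INTO items (id, value) VALUES (?, ?)",
                    (row_id, value),
                )
            last_seq = seq
    return last_seq

replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")

record("insert", 1, "a")
record("insert", 2, "b")
first_mark = pull(replica, 0)

record("update", 1, "a2")
record("delete", 2)
second_mark = pull(replica, first_mark)

items = replica.execute("SELECT id, value FROM items ORDER BY id").fetchall()
```

Each replica only has to remember its own high-water mark, so copies can poll at different intervals without the back-end tracking them.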

How well this would perform depends on the number of modifications on the back-end database. But if you request the changes every 15 minutes, it shouldn't be that much data every time.

But again, I have never had the chance to work this out in a real application, so it's still a theoretical idea for me. It seems fast, but it would take a lot of work.
