Real-time one-way synchronization from SQL Server to another data repository

Posted on 2024-11-16 08:38:43


In my previous question on this portal, I asked for some insight into syncing data between SQL Server and key-value based data repositories.

For the same problem (one-way real-time synchronization from SQL Server to HBase or any other database), I need to take care of some performance and latency considerations and have not found a foolproof way of doing it.

  1. We have multiple SQL 2008 data shards where data is updated from various sources and processed by many processes at the same time (and the UI reads from the same shards).

  2. The goal is to capture all updates to selected tables at any point in time and transport them to the other data source in a near-real-time fashion.

  3. The amount of change in the SQL shards will stay in the 100-500 MB range (if we keep a frequency of 1 minute). We do not want to make major changes to the SQL servers, as we will discard them after migrating the complete system.

  4. Unfortunately, our data access layer is messy and spread across various layers; otherwise, it would have been the best place to fork the updates in two directions.

  5. Triggers will slow down the shards and leave them in an unresponsive state.

  6. Not sure if SQL Server 2008 has something similar to SQL Server 2005 Notification Services, and how effective that would be.

Any other innovative solution would be very helpful.

My problem here is not transforming the data from relational to key-value form (that is fairly easy), but how to get the SQL Server updates in real time (we can afford a latency of 1-2 minutes) without affecting the user experience.


空宴 2024-11-23 08:38:43


From bottom to top, there are several layers of data: storage, file system, database, and application.

The most efficient way of doing this is storage replication. It has almost no impact on performance, can be configured as synchronous or asynchronous, and is not free. You may google SRDF or MirrorView to get a feel for the concept.

Then you can have a look at file system replication. It's similar to storage replication but happens in the OS/file system layer, consuming resources (CPU, IO, memory) of the host system. You may google Symantec Storage Foundation for more information.

At the database level, you can use database replication / log shipping to replicate the data. SQL Server has such facilities.

The "cheapest" solution would be modify your app, like your 4), but I suggest you use a message queue to replicate the data to minimize the impact on performance.

雪若未夕 2024-11-23 08:38:43


One option you might want to look into is SQL Server's integrated Change Tracking (part of SQL Server 2008 or higher). This is an incredibly efficient way of finding the changes that have occurred in your SQL Server database (including deletes), has very little impact on your SQL database, does not require triggers, and provides a good way of allowing you to then move the data changes to Hadoop.
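
For reference, here is a minimal sketch of what a Change Tracking polling loop could look like (an illustration under assumptions, not the answerer's implementation): enable change tracking on the database and table once, then every minute query `CHANGETABLE(CHANGES ...)` for everything since the last synced version and hand it to whatever writes into HBase or a queue. The `ShardDb`/`dbo.Orders` names, the `OrderId`/`Status` columns, and the use of `pyodbc` are hypothetical.

```python
import time

import pyodbc  # assumed driver; any DB-API driver for SQL Server would do

# One-time setup (T-SQL, run once per shard; names are illustrative):
#   ALTER DATABASE ShardDb SET CHANGE_TRACKING = ON
#       (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
#   ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING
#       WITH (TRACK_COLUMNS_UPDATED = ON);

POLL_SECONDS = 60  # matches the 1-minute window mentioned in the question


def poll_changes(conn_str: str, sink) -> None:
    """Poll Change Tracking and hand every changed row to `sink`
    (e.g. a writer that batches puts into HBase or a message queue)."""
    conn = pyodbc.connect(conn_str)
    cur = conn.cursor()

    cur.execute("SELECT CHANGE_TRACKING_CURRENT_VERSION()")
    last_version = cur.fetchone()[0]  # persist this somewhere durable in practice

    while True:
        time.sleep(POLL_SECONDS)

        cur.execute("SELECT CHANGE_TRACKING_CURRENT_VERSION()")
        current_version = cur.fetchone()[0]

        # Everything that changed in dbo.Orders since the last synced version.
        # Deleted rows come back with NULLs from the LEFT JOIN.
        cur.execute(
            """
            SELECT ct.OrderId, ct.SYS_CHANGE_OPERATION, o.Status
            FROM CHANGETABLE(CHANGES dbo.Orders, ?) AS ct
            LEFT JOIN dbo.Orders AS o ON o.OrderId = ct.OrderId
            """,
            (last_version,),
        )
        for order_id, operation, status in cur.fetchall():
            sink(order_id, operation, status)  # operation is 'I', 'U' or 'D'

        # Delivery is effectively at-least-once around version boundaries,
        # which is fine as long as the key-value writes are idempotent.
        last_version = current_version
```

The `CHANGE_RETENTION` setting just needs to comfortably exceed the longest time the poller could be down, so missed versions can still be caught up.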

Full disclosure: I work on Cotega, and this kind of data sync is something we are putting a big focus on. I am happy to help more if this is a direction you are interested in taking.
