How to minimize the data transferred in SQL replication
I want to replicate data from a boat offshore to an onshore site. The connection is sometimes via a satellite link and can be slow and have a high latency.
Latency matters in our application: the people on-shore should have the data as soon as possible.
There is one table being replicated, consisting of an id, a datetime, and some binary data that varies in length, usually under 50 bytes.
An application off-shore constantly pushes data (hardware measurements) into the table, and we want that data on-shore as fast as possible.
Are there any tricks in MS SQL Server 2008 that can help decrease bandwidth usage and latency? Initial testing uses a bandwidth of 100 kB/s.
Our alternative is to roll our own data transfer; an initial prototype here uses a bandwidth of 10 kB/s (while transferring the same data in the same timespan). It has no reliability or integrity checks, so this number is artificially low.
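For illustration, a simplified sketch of the table (actual names and exact types differ):

    -- Simplified sketch of the replicated table (actual names differ).
    CREATE TABLE dbo.Measurements
    (
        Id         INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        RecordedAt DATETIME          NOT NULL,
        Payload    VARBINARY(255)    NOT NULL  -- usually < 50 bytes
    );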
4 Answers
You can try out different replication profiles or create your own. Different profiles are optimized for different network/bandwidth scenarios.
MSDN talks about replication profiles here.
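As a sketch of what that looks like in practice, something along these lines should work at the Distributor; the profile name and parameter values below are illustrative assumptions, not tuned recommendations:

    -- Run at the Distributor. List the predefined profiles for the
    -- Distribution Agent (agent_type 3).
    EXEC sp_help_agent_profile @agent_type = 3;

    -- Create a custom profile aimed at low latency over a slow link.
    DECLARE @profile_id INT;
    EXEC sp_add_agent_profile
        @profile_id   = @profile_id OUTPUT,
        @profile_name = N'Satellite low-latency',  -- made-up name
        @agent_type   = 3,
        @description  = N'Custom profile for a slow, high-latency link';

    -- Illustrative values: poll more often, commit smaller batches.
    -- Verify the exact parameter names with sp_help_agent_parameter.
    EXEC sp_add_agent_parameter @profile_id, N'-PollingInterval', N'1';
    EXEC sp_add_agent_parameter @profile_id, N'-CommitBatchSize', N'10';

You would then assign the new profile to the Distribution Agent (through Replication Monitor or sp_update_agent_profile) and restart the agent for it to take effect.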
Have you considered getting a WAN accelerator appliance? I'm too new here to post a link, but there are several available.
Essentially, the appliance on the sending end compresses the outgoing data, and the receiving end decompresses it, all on the fly and completely invisibly. This has the benefit of increasing the apparent speed of the traffic and not requiring you to change your server configurations. It should be entirely transparent.
I'd suggest on-the-fly compression/decompression outside of SQL Server. That is, SQL replicates the data normally, but something in the network stack compresses it so it's much smaller and more bandwidth-efficient.
I don't know of a specific product, but I'm sure these exist.
Don't mess around with the SQL files directly. That's madness, if not impossible.
Do you expect it to always be only one table that is replicated? Are there many updates, or just inserts? The replication is implemented by calling an insert/update sproc on the destination for each changed row. One cheap optimization is to force the sproc name to be small. By default it is composed from the table name, but IIRC you can force a different sproc name for the article. Given an insert of around 58 bytes for a row, saving 5 or 10 characters in the sproc name is significant.
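A sketch of that rename, with made-up publication, article, and procedure names (by default the procedures are named sp_MSins_/sp_MSupd_/sp_MSdel_ plus the table name):

    -- Run at the Publisher. Point the article's commands at shorter
    -- procedure names; replication generates the Subscriber-side
    -- procedures from these names when the snapshot is applied.
    EXEC sp_changearticle
        @publication = N'BoatPub',        -- assumption
        @article     = N'Measurements',   -- assumption
        @property    = N'ins_cmd',
        @value       = N'CALL [dbo].[p_i]';

    -- The same applies to 'upd_cmd' (SCALL ...) and 'del_cmd' (CALL ...),
    -- or set @ins_cmd/@upd_cmd/@del_cmd up front in sp_addarticle.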
I would guess that if you update the binary field it is typically a whole replacement? If that is incorrect and you might change only a small portion, you could roll your own diff-patching mechanism: perhaps a second table that contains a time series of byte changes to the originals. It sounds like a pain, but could yield huge bandwidth savings depending on your workload.
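If you went down that road, the side table might look something like this (an entirely hypothetical schema; each row records a small patch to one original row, and the on-shore side replays the patches in order to rebuild current values):

    -- Hypothetical patch log: replicate this instead of whole-value updates.
    CREATE TABLE dbo.MeasurementPatches
    (
        PatchId       INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        MeasurementId INT               NOT NULL,  -- id of the patched row
        PatchedAt     DATETIME          NOT NULL,
        ByteOffset    SMALLINT          NOT NULL,  -- where the change starts
        NewBytes      VARBINARY(50)     NOT NULL   -- replacement bytes
    );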
Are the inserts generally done in logical batches? If so, you could store a batch of inserts as one customized blob in a replicated table, and have a secondary process that unpacks them into the final table you want to work with. This would reduce the overhead of these small rows flowing through replication.
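A minimal sketch of that idea, reusing the Measurements table sketched in the question; all names are made up, and the packing format (8-byte datetime, 2-byte length prefix, then the payload, repeated) is purely an assumption:

    -- Replicated staging table: one row per batch instead of per measurement.
    CREATE TABLE dbo.MeasurementBatches
    (
        BatchId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        CreatedAt DATETIME          NOT NULL,
        Records   VARBINARY(MAX)    NOT NULL  -- packed measurements
    );
    GO

    -- On-shore: unpack one batch into the real table.
    CREATE PROCEDURE dbo.UnpackBatch @BatchId INT
    AS
    BEGIN
        DECLARE @blob VARBINARY(MAX), @pos INT = 1, @len INT;
        SELECT @blob = Records FROM dbo.MeasurementBatches
        WHERE BatchId = @BatchId;

        WHILE @pos <= DATALENGTH(@blob)
        BEGIN
            -- The 2-byte length prefix follows the 8-byte datetime.
            SET @len = CAST(SUBSTRING(@blob, @pos + 8, 2) AS SMALLINT);

            INSERT INTO dbo.Measurements (RecordedAt, Payload)
            VALUES (CAST(SUBSTRING(@blob, @pos, 8) AS DATETIME),
                    SUBSTRING(@blob, @pos + 10, @len));

            SET @pos = @pos + 10 + @len;
        END
    END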