Synchronizing a data structure between an unreliable client and a server when the data is too large for the client
Summary:
How do I synchronize a very large amount of data with a client that can't hold all of it in memory and keeps disconnecting?
Explanation:
I have a real-time (ajax/comet) app which will display some data on the web.
I like to think of this as the view being on the web and the model being on the server.
Say I have a large number of records on the server, which are being added, removed, and modified all the time. Here are the problems:
- This being the web, the client is likely to connect and disconnect often. While the client is disconnected, data may change, and the client will need to be updated when it reconnects. However, since the data is so large, the client can't be sent ALL of it on every reconnection.
- Since there is so much data, the client obviously can't be sent all of it. Think of a Gmail account with thousands of messages, or Google Maps with ... the whole world!
I realize that initially a complete snapshot of some relevant subset of the data will be sent to the client, and after that only incremental updates. This would likely be done with some sort of sequence number: the client says "the last update I received was #234", and the server sends all messages between #234 and #current.
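A minimal sketch of the sequence-number idea, written in TypeScript since Node.js is one of the options mentioned. The names SequencedLog, DataRow, RowChange, SyncReply and the maxLogLength retention limit are all hypothetical. Every change gets the next sequence number; a reconnecting client sends the last number it saw and gets back only the later changes, or a full snapshot if the server has already trimmed some of the entries it missed. For brevity the snapshot here is the whole table rather than the client's relevant subset.

```typescript
// Hypothetical record shape; the question does not define one.
interface DataRow {
  id: string;
  value: string;
}

// One entry in the server's change log; seq increases by 1 per change.
interface RowChange {
  seq: number;
  kind: "upsert" | "delete";
  row: DataRow;
}

// Either the missed changes, or a full snapshot when the log no longer
// reaches back to the client's last sequence number.
type SyncReply =
  | { type: "delta"; changes: RowChange[]; currentSeq: number }
  | { type: "snapshot"; rows: DataRow[]; currentSeq: number };

class SequencedLog {
  private rows: { [id: string]: DataRow } = {};
  private log: RowChange[] = [];
  private seq = 0;

  // maxLogLength is an assumed retention limit so the log cannot grow forever.
  constructor(private readonly maxLogLength: number) {}

  // Apply a change to the current state and append it to the log.
  apply(kind: "upsert" | "delete", row: DataRow): number {
    this.seq += 1;
    if (kind === "upsert") {
      this.rows[row.id] = row;
    } else {
      delete this.rows[row.id];
    }
    this.log.push({ seq: this.seq, kind, row });
    if (this.log.length > this.maxLogLength) {
      this.log.shift(); // drop the oldest entry
    }
    return this.seq;
  }

  // "The last update I received was #lastSeq": reply with everything after it.
  syncFrom(lastSeq: number): SyncReply {
    const oldestKept = this.log.length > 0 ? this.log[0].seq : this.seq + 1;
    if (lastSeq + 1 < oldestKept) {
      // Some changes the client missed were already trimmed: resend a snapshot.
      const rows = Object.keys(this.rows).map((id) => this.rows[id]);
      return { type: "snapshot", rows, currentSeq: this.seq };
    }
    const changes = this.log.filter((c) => c.seq > lastSeq);
    return { type: "delta", changes, currentSeq: this.seq };
  }
}
```

The retention limit is the main knob: a longer log costs more server memory but lets a client stay disconnected longer before it has to download a full snapshot again.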
I also realize that the client-view will notify the server that it is 'displaying' records 100-200 "so only send me those" (perhaps 0-300, whatever the strategy).
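The "only send me records 100-200" idea can be sketched as a per-client window registry on the server. WindowRegistry, ViewWindow, padWindow and its margin parameter are hypothetical names; padWindow shows one possible prefetch strategy that turns a visible range of 100-200 into 0-300 with a margin of 100.

```typescript
// Inclusive range of record positions a client has asked for.
interface ViewWindow {
  from: number;
  to: number;
}

// Assumed strategy: widen the visible range by `margin` on each side so that
// small scrolls are already covered (visible 100-200, margin 100 -> 0-300).
function padWindow(first: number, last: number, margin: number): ViewWindow {
  return { from: Math.max(0, first - margin), to: last + margin };
}

class WindowRegistry {
  private windows: { [clientId: string]: ViewWindow } = {};

  // Called whenever the client view reports what it is displaying.
  setWindow(clientId: string, w: ViewWindow): void {
    this.windows[clientId] = w;
  }

  removeClient(clientId: string): void {
    delete this.windows[clientId];
  }

  // Which clients need to hear about a change at this record position?
  clientsFor(position: number): string[] {
    return Object.keys(this.windows).filter((id) => {
      const w = this.windows[id];
      return position >= w.from && position <= w.to;
    });
  }
}
```

This assumes record positions are stable. If inserts and deletes shift positions, keying the window on a sort key (a date, say) rather than an index would likely be more robust.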
However, I hate the idea of coding all of this myself. This is a general and common enough problem that there must already be libraries (or at least step-by-step recipes) for it.
I am looking to do this in either Java or Node.js. If solutions are available in other languages, I'm willing to switch.
Comments (1):
Try a pub/sub solution: subscribe the client to your server events from a given start time.
The server logs every data change event along with the time it occurred.
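A small in-process sketch of the pub/sub part of this answer: each change is logged with a timestamp and pushed to whichever clients are subscribed at that moment, while clients that were offline catch up from the log. ChangePublisher, ChangeEvent and eventsSince are hypothetical names; in a real deployment the listeners would be comet or WebSocket connections rather than callbacks, and the log would live in a database.

```typescript
// A data change event, stamped with the time the server recorded it.
interface ChangeEvent {
  rowId: string;
  changedAt: number; // milliseconds since epoch
}

type ChangeListener = (e: ChangeEvent) => void;

class ChangePublisher {
  private listeners: { [clientId: string]: ChangeListener } = {};
  private readonly events: ChangeEvent[] = [];

  // `now` is injectable so the example is deterministic; defaults to Date.now.
  constructor(private readonly now: () => number = Date.now) {}

  subscribe(clientId: string, listener: ChangeListener): void {
    this.listeners[clientId] = listener;
  }

  unsubscribe(clientId: string): void {
    delete this.listeners[clientId];
  }

  // Log every change, then push it to whoever is currently connected.
  publish(rowId: string): ChangeEvent {
    const e: ChangeEvent = { rowId, changedAt: this.now() };
    this.events.push(e);
    Object.keys(this.listeners).forEach((id) => this.listeners[id](e));
    return e;
  }

  // Disconnected clients catch up from the log instead of the live feed.
  eventsSince(since: number): ChangeEvent[] {
    return this.events.filter((e) => e.changedAt > since);
  }
}
```

With a strict `>` comparison and a millisecond clock, two changes recorded in the same millisecond can be split across a sync boundary and one of them missed; a monotonically increasing sequence number, as suggested in the question, avoids that tie.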
After a given time, or when your client reconnects, the client asks for a list of all data rows changed since the last sync.
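On the client side, the catch-up step might look like the following sketch. SyncClient, SyncedRow, ChangedRowsReply and FetchChanges are hypothetical; fetchChanges stands in for whatever request goes to the server, and it is kept synchronous here only to keep the example short (in a browser it would return a Promise). The `deleted` flag is an assumed tombstone so that removals can be synced along with updates.

```typescript
// Hypothetical row shape; `deleted` marks a tombstone for a removed row.
interface SyncedRow {
  id: string;
  value: string;
  deleted: boolean;
}

// What the server returns for "all rows changed since `since`".
interface ChangedRowsReply {
  rows: SyncedRow[];
  serverTime: number; // becomes the client's next `since`
}

// Stand-in for whatever transport is used (XHR, WebSocket, ...).
type FetchChanges = (since: number) => ChangedRowsReply;

class SyncClient {
  private cache: { [id: string]: SyncedRow } = {};
  private lastSync = 0;

  constructor(private readonly fetchChanges: FetchChanges) {}

  // Called after a reconnect or on a timer; returns how many rows changed.
  resync(): number {
    const reply = this.fetchChanges(this.lastSync);
    reply.rows.forEach((row) => {
      if (row.deleted) {
        delete this.cache[row.id];
      } else {
        this.cache[row.id] = row;
      }
    });
    this.lastSync = reply.serverTime;
    return reply.rows.length;
  }

  get(id: string): SyncedRow | undefined {
    return this.cache[id];
  }
}
```

The client stores the server's timestamp from each reply as its next `since` value instead of reading its own clock, so clock differences between browser and server don't cause changes to be skipped.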
You can keep all the logic on the server and sync only the changes. This would result in a typical "select * from table where id in (select id from changed_rows where change_date > given_date)" statement on the server, which can be optimized.
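To make the semantics of that statement concrete, here is the same query evaluated in memory; the table and column names mirror the SQL, and selectChangedSince is a hypothetical helper. The inner query becomes a set of distinct changed ids, so a row changed several times is returned only once. In a database, an index on changed_rows.change_date would be a natural first step in optimizing it.

```typescript
interface TableRow {
  id: number;
  payload: string;
}

interface ChangedRowEntry {
  id: number;
  changeDate: number; // epoch milliseconds
}

// In-memory equivalent of:
//   select * from table where id in
//     (select id from changed_rows where change_date > given_date)
function selectChangedSince(
  table: TableRow[],
  changedRows: ChangedRowEntry[],
  givenDate: number
): TableRow[] {
  // Inner query: collect the distinct ids changed after givenDate.
  const changedIds: { [id: number]: true } = {};
  changedRows.forEach((c) => {
    if (c.changeDate > givenDate) {
      changedIds[c.id] = true;
    }
  });
  // Outer query: keep each table row whose id appeared in the inner result.
  return table.filter((row) => changedIds[row.id] === true);
}
```

A row that has been deleted from `table` is dropped by the outer query, so the client never learns it is gone unless deletions are also recorded (for example as tombstone rows).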