What is the best way to sync objects between two different systems?
I am working on syncing two business objects between an iPhone and a Web site using an XML-based payload and would love to solicit some ideas for an optimal routine.
The nature of this question is fairly generic, though, and I can see it applying to a variety of systems that need to sync business objects between a web entity and a client (desktop, mobile phone, etc.).
The business objects can be edited, deleted, and updated on both sides. Both sides can store the object locally but the sync is only initiated on the iPhone side for disconnected viewing. All objects have an updated_at and created_at timestamp and are backed by an RDBMS on both sides (SQLite on the iPhone side and MySQL on the web... again I don't think this matters much) and the phone does record the last time a sync was attempted. Otherwise, no other data is stored (at the moment).
What algorithm would you use to minimize network chatter between the systems for syncing? How would you handle deletes if "soft-deletes" are not an option? What data model changes would you add to facilitate this?
Comments (2)
The simplest approach: when syncing, transfer all records where updated_at >= @last_sync_at. Downside: this approach doesn't tolerate clock skew well at all.
It is probably safer to keep a version number column that is incremented each time a row is updated (so that clock skew doesn't foul your sync process) and a last-synced version number (so that potentially conflicting changes can be identified). To make this bandwidth-efficient, keep a cache in each database of the last version sent to each replication peer so that only modified rows need to be transmitted. If this is going to be a star topology, the leaves can use a simplified schema where the last synced version is stored in each table.
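As a rough illustration of the version-number idea, here is a minimal sketch using Python's built-in sqlite3 module. The table names (items, sync_state) and helper functions are hypothetical, and the sketch assumes a single database-wide counter so that one "last synced version" number per peer is enough to select the changed rows; a per-row counter would need different bookkeeping.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical 'items' table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, "
               "version INTEGER NOT NULL)")
    # Assumption: one monotonically increasing counter for the whole database.
    db.execute("CREATE TABLE sync_state (counter INTEGER NOT NULL)")
    db.execute("INSERT INTO sync_state VALUES (0)")
    return db

def next_version(db):
    """Bump the database-wide counter and return the new value."""
    db.execute("UPDATE sync_state SET counter = counter + 1")
    return db.execute("SELECT counter FROM sync_state").fetchone()[0]

def upsert(db, item_id, name):
    """Insert or overwrite a row, stamping it with a fresh version number."""
    version = next_version(db)
    db.execute("INSERT OR REPLACE INTO items (id, name, version) VALUES (?, ?, ?)",
               (item_id, name, version))

def changes_since(db, last_synced_version):
    """Return only the rows modified after the peer's last synced version."""
    return db.execute(
        "SELECT id, name, version FROM items WHERE version > ? ORDER BY version",
        (last_synced_version,)).fetchall()

server = make_db()
upsert(server, 1, "alpha")
upsert(server, 2, "beta")

# First sync: the phone has seen nothing yet, so it asks for everything after 0.
first_batch = changes_since(server, 0)
last_synced = max(version for _, _, version in first_batch)

# One row changes on the server; the next sync transfers only that row.
upsert(server, 1, "alpha v2")
second_batch = changes_since(server, last_synced)
```

Because the comparison uses a counter rather than a wall clock, a phone whose clock is wrong still receives exactly the rows it has not seen.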
Some form of soft delete is required to support syncing deletes; however, it can take the form of a "tombstone" record that contains only the key of the deleted row. Tombstones can be safely deleted only once you are sure that all replicas have processed them; otherwise a straggling replica can resurrect a record you thought was deleted.
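The tombstone idea could look something like the following sketch. The tombstones table, the deleted_version column, and the peer names are assumptions made for illustration; the purge rule simply takes the lowest version that every known peer has acknowledged.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
-- A tombstone holds only the key of the deleted row plus the version of the delete.
CREATE TABLE tombstones (id INTEGER PRIMARY KEY, deleted_version INTEGER NOT NULL);
""")
db.executemany("INSERT INTO items VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

def delete_item(db, item_id, version):
    """Hard-delete the row but leave a tombstone so peers can learn about it."""
    db.execute("DELETE FROM items WHERE id = ?", (item_id,))
    db.execute("INSERT OR REPLACE INTO tombstones VALUES (?, ?)", (item_id, version))

def deletes_since(db, last_synced_version):
    """Keys deleted after the given version, to be sent to a syncing peer."""
    rows = db.execute(
        "SELECT id FROM tombstones WHERE deleted_version > ? ORDER BY id",
        (last_synced_version,))
    return [row[0] for row in rows]

def purge_tombstones(db, peer_synced_versions):
    """Drop tombstones that every known peer has already processed."""
    if not peer_synced_versions:
        return  # no peers known: keep everything to be safe
    floor = min(peer_synced_versions.values())
    db.execute("DELETE FROM tombstones WHERE deleted_version <= ?", (floor,))

delete_item(db, 1, version=5)
delete_item(db, 2, version=9)
pending = deletes_since(db, 4)

# The tablet has only synced up to version 5, so the tombstone at 9 must stay.
purge_tombstones(db, {"phone": 9, "tablet": 5})
remaining = deletes_since(db, 0)
remaining_items = db.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Purging on the minimum acknowledged version is what prevents the straggler from resurrecting a row: a tombstone survives until the slowest peer has seen it.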
So I think in summary your questions relate to disconnected synchronization.
So here is what I think should happen:
Initial Sync: You retrieve the data and any information associated with it (row versions, file checksums, etc.). It is important that you store this information and leave it pristine until the next successful sync. Changes should be made on a COPY of this data.
Tracking Changes: If you are dealing with database rows, you basically have to track insert, update, and delete operations. If you are dealing with text files like XML, then it's slightly more complicated. If it is likely that multiple users will edit the file at the same time, you would need a diff tool so that conflicts can be detected at a more granular level (instead of for the whole file).
Checking for Conflicts: Again, if you are just dealing with database rows, conflicts are easy to detect. You can have another column that increments whenever the row is updated (I think MSSQL has this built in; not sure about MySQL). So if the copy you have has a different number than what's on the server, you have a conflict. For files or strings, a checksum will do the job. I suppose you could also use the modified date, but make sure you have a very precise and accurate measurement to prevent misses. For example, let's say I retrieve a file and you save it right after I retrieved it. Let's say the time difference is 1 millisecond. I then make changes to the file and try to save it. If the recorded last-modified time is accurate only to 10 milliseconds, there is a good chance that the file I retrieved will have the same modified date as the one you saved, so the program thinks there's no conflict and overwrites your changes. So I generally don't use this method, just to be on the safe side. On the other hand, the chance of a checksum/hash collision after a minor modification is close to none.
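A minimal sketch of both checks, assuming a hypothetical in-memory Row type in place of a real server table: the save is rejected when the version the client started from no longer matches the server's, and a SHA-256 checksum plays the same role for file or string content.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Row:
    """Hypothetical server-side row with an incrementing version column."""
    value: str
    version: int

class ConflictError(Exception):
    """Raised when the server copy changed since the client retrieved it."""

def save(server_row, base_version, new_value):
    """Apply the edit only if the client edited the current server version."""
    if server_row.version != base_version:
        raise ConflictError(
            f"edited version {base_version}, server is at {server_row.version}")
    server_row.value = new_value
    server_row.version += 1

def checksum(text):
    """Content fingerprint for files or strings (SHA-256 of the UTF-8 bytes)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

row = Row(value="draft", version=1)

# Two clients both retrieved version 1; the first save succeeds.
save(row, base_version=1, new_value="draft A")

# The second client still holds version 1, so its save is flagged as a conflict.
try:
    save(row, base_version=1, new_value="draft B")
    conflict_detected = False
except ConflictError:
    conflict_detected = True

# For text, compare the checksum stored at retrieval time with the current one.
file_changed = checksum("colour") != checksum("color")
```

Neither check depends on timestamp resolution, which sidesteps the 1 ms versus 10 ms problem described above.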
Resolving Conflicts: Now this is the tricky part. If this is an automated process, you have to assess the situation and decide whether to overwrite the server's changes, lose your own changes, or retrieve the data from the server again and attempt to redo the changes. Luckily for you, it seems that there will be human interaction, but it's still a lot of pain to code. If you are dealing with database rows, you can check each individual column, compare it against the data on the server, and present the differences to the user. The idea is to present conflicts to the user in a very granular way so as not to overwhelm them. Most conflicts consist of very small differences in many different places, so present them to the user one small difference at a time. For text files it's almost the same, but a hundred times more complicated. You basically have to create or use a diff tool (text comparison is a whole different subject and too broad to cover here) that tells you about the small changes in the file and where they are, in a similar fashion as in a database: where text was inserted, deleted, or edited. Then present that to the user in the same way. So for each small conflict, the user has to choose whether to discard their changes, overwrite the changes on the server, or perform a manual edit before sending to the server.
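For database rows, the column-by-column comparison might be sketched as below. It relies on the pristine copy kept from the initial sync as the common base. The function name and sample data are hypothetical, and automatically accepting columns that only one side changed is an assumption of this sketch, not something the answer prescribes.

```python
def column_conflicts(base, local, server):
    """Compare one row column by column against the pristine base copy.

    Returns (merged, conflicts): 'merged' holds columns that can be resolved
    without asking the user, 'conflicts' maps each contested column to its
    (local value, server value) pair.
    """
    merged = {}
    conflicts = {}
    for column in base:
        b, l, s = base[column], local[column], server[column]
        if l == s:
            merged[column] = l          # both sides agree (or nobody changed it)
        elif l == b:
            merged[column] = s          # only the server changed this column
        elif s == b:
            merged[column] = l          # only the local copy changed this column
        else:
            conflicts[column] = (l, s)  # both changed it differently: ask the user
    return merged, conflicts

base = {"title": "Recieve", "price": 10, "stock": 5}      # pristine copy from last sync
local = {"title": "Receive", "price": 10, "stock": 4}     # edited on the phone
server = {"title": "Received", "price": 12, "stock": 5}   # edited on the web

merged, conflicts = column_conflicts(base, local, server)
```

Here only the title needs a decision from the user; the price and stock changes do not overlap and never have to be shown.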
So if you have done things right, the user should be given a list of conflicts, if there are any. These conflicts should be granular enough for the user to decide quickly. For example, if the conflict is a spelling change, it is easier for the user to choose between two spellings of a word than to be given the whole paragraph, told that something changed, and left to hunt for the small misspelling before deciding what to do.
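To give a sense of what "granular" can mean for text, here is a small sketch that uses Python's standard difflib module to reduce two versions of a sentence to word-level changes. The word_changes helper is hypothetical, and splitting on whitespace is a simplification; a real tool would also need to preserve punctuation and layout.

```python
import difflib

def word_changes(old_text, new_text):
    """List word-level differences as (kind, old words, new words) tuples."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only inserted, deleted, or replaced spans
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes

changes = word_changes(
    "Please recieve the package today",
    "Please receive the package today",
)
```

The user is then shown just the pair "recieve" and "receive" to pick from, rather than the whole sentence.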
Other considerations:
Data Validation - keep in mind that you have to perform validation after resolving conflicts, since the data might have changed.
Text Comparison - like I said, this is a big subject, so google it!
Disconnected Synchronization - I think there are a few articles out there.
Source: https://softwareengineering.stackexchange.com/questions/94634/synchronization-web-service-methodologies-or-papers