Sometimes-offline application architecture question
I have an n-tier WinForms client-server application running against a SQL Server database. I want it to be able to run "offline" sometimes (not connected to the DB) and, on reconnect, reconcile the changes to the main DB. Now I have a tough architecture decision to make: should I use database replication, or manage it myself using queues/scripts etc.? My application is quite complicated - I use a database whose tables contain auto-increment keys and foreign key constraints between tables. Part of my data, like pictures and documents, is not embedded in the DB. I would very much like to hear your opinions and past experience! Thanks, Adi
(Disclaimer: I'm assuming that you've already considered using .NET DataSets and discounted them, given that they're designed to help with just the problem domain that you're describing.)
I used to work for a company that developed a point-of-sale system for its nationwide chain of shops. The master database was stored at head office, while each shop had its own cut-down version of this database stored locally at that site. Effectively, each shop was off-line all the time, so it's not quite the situation that you're describing, however we had to deal with some of the synchronisation/replication issues that I imagine you will need to deal with.
Our data communications happened each night: shops would connect to head office at a pre-determined time, upload a package of data changes, and download a similar package of data changes that were to be applied to that shop's local database. We then had what you might call 'data sync engines' at both sites (head office & shops) which would process these data packets, folding the changes (inserts/updates/deletions) back into the relevant database.
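The 'data sync engine' step described above can be sketched roughly as follows. This is a minimal illustration, not the original system's code: the table schema, packet format, and use of SQLite are all assumptions made for the example.

```python
import sqlite3

# Fold a packet of changes (inserts/updates/deletions) into a local
# database, as a nightly sync engine would. The 'items' table and the
# packet shape are hypothetical.
def apply_packet(conn, packet):
    cur = conn.cursor()
    for change in packet:
        op, row = change["op"], change["row"]
        if op == "insert":
            cur.execute("INSERT INTO items (id, name) VALUES (?, ?)",
                        (row["id"], row["name"]))
        elif op == "update":
            cur.execute("UPDATE items SET name = ? WHERE id = ?",
                        (row["name"], row["id"]))
        elif op == "delete":
            cur.execute("DELETE FROM items WHERE id = ?", (row["id"],))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
apply_packet(conn, [
    {"op": "insert", "row": {"id": "a1", "name": "widget"}},
    {"op": "insert", "row": {"id": "b2", "name": "gadget"}},
    {"op": "update", "row": {"id": "a1", "name": "widget v2"}},
    {"op": "delete", "row": {"id": "b2"}},
])
rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
print(rows)  # [('a1', 'widget v2')]
```

In the real system the same engine ran at both ends, so the download and upload packets were processed by identical code.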
When you perform basic data replication like this, there are a number of potential pitfalls as Sergio has mentioned. One is identity, namely how you derive a primary key that uniquely identifies a table row. Another is versioning, and how you handle conflicts between different versions of the same row.
In our case, we made things easy(-ier!) for ourselves by using GUIDs as primary keys rather than using auto-increment columns. Using GUIDs is not without its issues, but in our case it meant that we could assign a primary key to a new data row and not have to worry about anyone else using it.
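The GUID-as-primary-key idea is easy to demonstrate; the snippet below is an illustrative sketch (the row shape is invented), showing why an offline client can mint keys without coordinating with head office.

```python
import uuid

# An offline client assigns GUID primary keys itself; because GUIDs
# are globally unique, uploading these rows later cannot collide with
# keys generated at head office or at any other site.
def new_row(name):
    return {"id": str(uuid.uuid4()), "name": name}

offline_rows = [new_row("sale-1"), new_row("sale-2")]

# No central sequence was consulted, yet every key is distinct.
assert len({r["id"] for r in offline_rows}) == len(offline_rows)
```

The usual caveats apply: GUIDs are wider than int keys and, if used as a clustered index in SQL Server, cause page fragmentation unless you use sequential GUIDs.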
I'm a bit hazy on how we handled the versioning problem (it's been a few years!), but from memory I think we had two timestamps on each table row: one of these recorded the date/time when the row was updated at head office; the other, when it was updated at the shop. Each row also had two 'version numbers' that indicated the version of the row at head office and at the shop. Data reconciliation involved comparing these timestamps and version numbers against each other, with the most recent change 'winning' (assuming the other party hadn't changed the row of course).
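The 'most recent change wins' rule from the paragraph above can be reduced to a small comparison; field names here are assumptions, and a real implementation would also consult the version numbers to detect whether *both* sides changed the row.

```python
from datetime import datetime

# Last-writer-wins reconciliation: compare the timestamps on the two
# copies of a row and keep the more recently updated one.
def reconcile(ho_row, shop_row):
    if ho_row["updated_at"] >= shop_row["updated_at"]:
        return ho_row
    return shop_row

ho = {"id": 1, "name": "HQ price", "updated_at": datetime(2024, 1, 2)}
shop = {"id": 1, "name": "Shop price", "updated_at": datetime(2024, 1, 5)}
winner = reconcile(ho, shop)
print(winner["name"])  # Shop price
```

Note that this simple rule silently discards the losing change, which is exactly the conflict case discussed next.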
As Sergio points out, your biggest problem will be handling data reconciliation conflicts. In our case, this occurred when a shop and head office changed the same data item on the same day. We worked around this by always failing the change at the shop end, and writing a custom data reconciliation application at head office, which involved a user visually comparing and merging two conflicting versions of a data item. In theory I suppose you could automate the merging of different versions using some custom processing rules, but you would need to weigh up the cost of developing something like that versus the likelihood of conflicts arising. From memory, this never proved to be that big a problem for our system, despite there being a large number of shops (a few hundred) making changes to the same set of data. YMMV of course.
I've never done anything like that before, but it looks to me that if you go that way you might get into serious problems...
Technically I don't think that it's really that hard to implement. Basically you will have to set up a copy of the database on each client and synchronise with the server every time the client connects, but I guess you've already got that far.
I would add a bit column and a datestamp on each table at the client so I could check which records have been changed off-line. On the server side, a datestamp column recording the last update to the object will do the trick.
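A sketch of that dirty-bit plus datestamp scheme, using SQLite and an invented `customer` table purely for illustration:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT,
    dirty INTEGER DEFAULT 0,   -- set when the row is changed off-line
    changed_at TEXT            -- when the off-line change happened
)""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann'), (2, 'Bob')")

# Every off-line edit flags the row and stamps the change time.
def offline_update(conn, row_id, name):
    stamp = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE customer SET name = ?, dirty = 1, changed_at = ? WHERE id = ?",
        (name, stamp, row_id))

offline_update(conn, 1, "Anne")

# On reconnect, only the dirty rows need to be pushed to the server.
pending = conn.execute(
    "SELECT id, name FROM customer WHERE dirty = 1").fetchall()
print(pending)  # [(1, 'Anne')]
```

After a successful push you would clear the dirty flags in the same transaction that confirms the upload.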
As for the primary keys with auto-increment, I would lose them, because you will need to set them yourself to prevent creating two records with the same key (you might need to change them when synchronising).
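If you do keep server-assigned keys, the re-keying on synchronisation looks roughly like this. The convention of negative temporary keys, and the table shapes, are assumptions for the sketch; any foreign keys pointing at a remapped row must be rewritten in the same pass.

```python
# Clients assign temporary negative keys off-line; on sync, the server
# hands out real keys and the client rewrites all references to them.
def remap_keys(rows, children, server_next_id):
    mapping = {}
    for row in rows:
        if row["id"] < 0:                    # temporary off-line key
            mapping[row["id"]] = server_next_id
            row["id"] = server_next_id
            server_next_id += 1
    for child in children:                   # fix up foreign keys too
        if child["parent_id"] in mapping:
            child["parent_id"] = mapping[child["parent_id"]]
    return rows, children

orders = [{"id": -1}, {"id": 7}]
lines = [{"parent_id": -1, "qty": 3}]
orders, lines = remap_keys(orders, lines, server_next_id=100)
print(orders, lines)
# [{'id': 100}, {'id': 7}] [{'parent_id': 100, 'qty': 3}]
```

GUID keys (as in the answer above) avoid this remapping step entirely, which is the main argument for them in occasionally-offline designs.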
This is the easy part... Now is where things get messy... You need to take into account that this will bring you a lot of headaches... All sorts of undesired events will happen; some examples:
- Two users change the same record off-line.
- One user changes a record on-line and another off-line
- One user deletes a record on-line while another is working it off-line
The list of potential problems go on and on, before you start addressing them you must enumerate every single one and document with your clients how they expect the system to handle each case, otherwise when they loose data (and this will happen no matter what you do) it will be your fault instead of theirs.
I recommend that you build a versioning system for every table in your database that can be changed off-line. Users will mess their data and it will be nice for them to perform roll-backs.
I've done this several times now at different places (see Steve Rands' answer below) and I would strongly urge you NOT to use normal replication - especially if there are going to be several databases involved.
The reason I say this is that in my experience replication isn't smart enough to deal with the problems that can arise when you bring a remote site back online (or when you decide to add a new site to the overall network).
Replication is fine for this kind of thing if you only have 2 or 3 different databases but if you are talking about lots of different locations that can be online/offline at any time, and information can be added (or deleted or amended) at any of those locations, it won't take you long to get something into a confused state. It's not a very technically satisfying thing to say, but you will always be able to think of special cases where you wouldn't want the replication to do what it will, by design, want to do.
If you're only dealing with 2 databases then obviously the replication problems become much more straightforward and you will probably find that you can use merge replication for the job (though you have to watch your database design).
I've just bought a second-hand copy of the Apress SQL Server 2005 Replication Bible (not in the office so don't have the author to hand but it's a well-recommended, monster tome) - within the first couple of chapters I began to realise that replication is not a magic bullet solution if you're really changing data at two (or more) ends. :-)
This is usually called the briefcase model; you can use the Microsoft Synchronization Services for ADO.NET.
You should look at the Microsoft Sync Framework.
Building an occasionally offline solution yourself from scratch is a complex undertaking. In my career I have seen many good development teams mess it up. I'm not saying that you would have problems building it yourself, but why not use something that already exists? And if you find it doesn't meet your needs, you will probably have a better understanding of how to code your own solution.
The tradeoff is that you would have to learn the Sync Framework, but there are samples that you could probably leverage immediately.