Java ORM strategy for unreliable networks and low bandwidth
I am looking at Hibernate for a system that needs to work over an unreliable network. There is a single central database that we need read-write access to, but it is reachable only over a pretty patchy Wi-Fi network. In addition, there may be power losses that do not shut the application down cleanly, so any solution must have a persistent cache that can survive power cycles. Lastly, this is an embedded system with only modest memory and disk space, so full-blown replication of the database, for example, is not a feasible strategy.
I have a basic understanding of Hibernate second-level caching, and I am wondering whether it can be configured with something like Ehcache to solve this problem. The main thrust of that cache seems to be performance rather than availability, though, so I do not know what the pitfalls might be.
I am also quite willing to consider other strategies that involve replication to a local database. I would rather not have to do too much of the heavy lifting myself to implement this.
I am looking for experience reports or possible alternatives.
Comments (6)
Daffodil Replicator (http://enterprise.replicator.daffodilsw.com/index.html) allows replication between JDBC sources. It supports bidirectional updates, merging and conflict resolution, and partial replicas.
This can be used to synchronize the main database with a local (partial) replica. You can use Hibernate to talk to the local replica database and have everything else done outside of that process.
Hibernate and its second-level cache are really not designed for this. My guess is that you would be best off using a small embedded Java RDBMS (e.g. H2 or HSQLDB) as your local temporary queue, in the most durable mode available, and then doing the sync in a background thread. You could then hook a sync spinner in the UI up to that background thread to give the user some degree of feedback.
Incidentally, Hibernate is a bit fat to dump into an embedded environment. You might want to consider MyBatis instead.
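A minimal sketch of the "local queue plus background sync thread" idea, under some loud assumptions: `RemoteSink` and `OutboxSync` are hypothetical names, and the queue here is an in-memory `ArrayDeque` standing in for what would really be a table in the embedded database, so this version alone would not survive a power cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the call that writes one change to the central database. */
interface RemoteSink {
    void send(String change) throws Exception;
}

/** Queues local changes and pushes them to the central database in the background. */
class OutboxSync {
    // Assumption: a real system would keep this queue in a table of an embedded
    // database such as H2 or HSQLDB so that pending changes survive a power cycle.
    private final Deque<String> pending = new ArrayDeque<>();
    private final RemoteSink sink;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    OutboxSync(RemoteSink sink) {
        this.sink = sink;
    }

    /** Called by the application instead of writing to the central database directly. */
    synchronized void enqueue(String change) {
        pending.addLast(change);
    }

    /** Number of unsent changes; a sync spinner could poll this. */
    synchronized int pendingCount() {
        return pending.size();
    }

    /**
     * Sends queued changes in order and stops at the first failure, so a change
     * leaves the queue only after the remote side accepted it.
     * Returns the number of changes sent in this pass.
     */
    synchronized int drainOnce() {
        int sent = 0;
        while (!pending.isEmpty()) {
            try {
                sink.send(pending.peekFirst());
            } catch (Exception e) {
                break; // link is down; keep the change and retry on the next pass
            }
            pending.removeFirst();
            sent++;
        }
        return sent;
    }

    /** Starts the background thread that retries with a fixed delay between passes. */
    void start(long delaySeconds) {
        scheduler.scheduleWithFixedDelay(this::drainOnce, delaySeconds, delaySeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

For simplicity the sketch holds the lock while sending, which would block `enqueue` during a slow network call; a real implementation would probably send outside the lock.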
If it were just a case of sporadic connectivity between the two machines, I would recommend keeping a transaction log that can be played back, with each entry marked as processed once it has been applied. The limited memory may make that difficult, though.
Maybe you can store the transaction log compressed.
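One way such a replayable log could look, as a rough sketch rather than a finished design: `TransactionLog` and the file names `tx.log` and `tx.processed` are made up for illustration, entries are plain single-line strings, and compression is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Append-only log of pending operations plus a marker for how many were replayed. */
class TransactionLog {
    private final Path logFile;    // one operation per line (hypothetical file name)
    private final Path markerFile; // number of entries already processed

    TransactionLog(Path dir) {
        this.logFile = dir.resolve("tx.log");
        this.markerFile = dir.resolve("tx.processed");
    }

    /** Appends one entry and asks the OS to flush it to disk before returning. */
    void append(String entry) {
        if (entry.contains("\n")) throw new IllegalArgumentException("entry must be a single line");
        ByteBuffer buf = ByteBuffer.wrap((entry + "\n").getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(logFile, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            while (buf.hasRemaining()) ch.write(buf);
            ch.force(true); // flush file data and metadata
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of entries at the start of the log that were already replayed. */
    int processedCount() {
        try {
            if (!Files.exists(markerFile)) return 0;
            byte[] raw = Files.readAllBytes(markerFile);
            return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Entries that still have to be replayed against the central database. */
    List<String> unprocessed() {
        try {
            if (!Files.exists(logFile)) return new ArrayList<>();
            List<String> all = Files.readAllLines(logFile, StandardCharsets.UTF_8);
            int from = Math.min(processedCount(), all.size());
            return new ArrayList<>(all.subList(from, all.size()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Marks the next count entries as processed by replacing the marker file. */
    void markProcessed(int count) {
        try {
            Path tmp = markerFile.resolveSibling("tx.processed.tmp");
            byte[] raw = Integer.toString(processedCount() + count).getBytes(StandardCharsets.UTF_8);
            Files.write(tmp, raw);
            Files.move(tmp, markerFile, StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If power is lost after an entry has been replayed but before `markProcessed` runs, that entry will be replayed again on restart, so the logged operations would need to be safe to apply twice. The sketch also does not handle a half-written last line or trimming old entries to save disk space.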
"In addition, there may be power losses which do not shutdown the application cleanly, so any solution must have a persistent cache which can survive power-cycles."
You already have a solution in mind with the Hibernate second-level cache, but you didn't say what the real requirements are. You have an unreliable network. That's OK. You have an unreliable power supply. That's OK too. Now, what level of service do you want to achieve? What is acceptable and what is not?
Is data loss acceptable? How much could you accept? What risks do you accept?
To be more explicit, let's say you have a local replica of the database, or at least part of it. Let's say you know how to queue and save modifications made locally. Let's say you store these modifications on a hard drive so that they are safe in case of power failure. Let's say you are able to merge the changes with the main database when the connection is available again.
That's already a lot of assumptions. OK, but what happens if a hard drive fails after a power failure? You know that hard drives don't like power failures: they tend to get corrupted, or can even be physically damaged.
So you set up RAID and add an uninterruptible power supply. That's nice. You detect the power failure event from the OS, finish your current transaction, and shut down correctly. Your RAID protects you from a disk failure.
OK, but what happens if the whole computer stops functioning? What happens in case of fire? Or water damage? All disks will be damaged, the data unrecoverable, and whatever was not synchronized with the central database is lost. Is that acceptable or not?
Even when the Wi-Fi is up and the power supply works perfectly, how reliable is the central database anyway? Do you have regular backups? Or a clustering solution? Are you sure your central database is reliable?
From a database point of view, it is easy to use a cluster or backups, and to use transactions to ensure data consistency. You can still lose data (in particular if you are not using a cluster), but you should be able to recover up to the last backup, for example.
But if you want to work offline (with the database unavailable), and you are not the only one who can modify the database, conflicts WILL occur. This is no longer a cache, Hibernate, or any other technical problem.
This is a functional problem. What should happen when several modifications are made offline and have to be merged? What is acceptable? What is not? It might be that on reconnect the most recent change is applied and older changes are discarded. Or potential conflicts are detected and the user is prompted to deal with them. Or you can try to apply every queued change...
I would tend to think that you can offer an "offline mode", but your users must be aware that they are offline, and they should be notified when their changes are made permanent in the central database, with conflict resolution where needed. But that's just my point of view.
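To make the merge options above concrete, here is a small sketch of two of them: "most recent change wins" and "hand the conflict to the user". Everything in it is an assumption for illustration: the class names, the per-row version counter used to detect that the central row changed while the client was offline, and the in-memory map standing in for the central database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A change made offline, with the central version it was based on. */
class OfflineChange {
    final String key;
    final String value;
    final long baseVersion;  // version of the central row when the client last saw it
    final long madeAtMillis; // client-side timestamp of the change

    OfflineChange(String key, String value, long baseVersion, long madeAtMillis) {
        this.key = key;
        this.value = value;
        this.baseVersion = baseVersion;
        this.madeAtMillis = madeAtMillis;
    }
}

/** Current state of one row in the central database (in-memory stand-in). */
class CentralRow {
    String value;
    long version;
    long updatedAtMillis;

    CentralRow(String value, long version, long updatedAtMillis) {
        this.value = value;
        this.version = version;
        this.updatedAtMillis = updatedAtMillis;
    }
}

enum ConflictPolicy { LAST_WRITE_WINS, ASK_USER }

class OfflineMerger {
    final Map<String, CentralRow> central = new HashMap<>();

    /** Applies queued changes and returns the ones that need a user decision. */
    List<OfflineChange> merge(List<OfflineChange> queued, ConflictPolicy policy) {
        List<OfflineChange> unresolved = new ArrayList<>();
        for (OfflineChange change : queued) {
            CentralRow row = central.get(change.key);
            // Conflict: someone else changed the row after this client last read it.
            boolean conflict = row != null && row.version != change.baseVersion;
            if (!conflict) {
                apply(change, row);
            } else if (policy == ConflictPolicy.LAST_WRITE_WINS) {
                if (change.madeAtMillis > row.updatedAtMillis) {
                    apply(change, row);
                } // otherwise the older offline change is silently discarded
            } else {
                unresolved.add(change); // leave the central row untouched
            }
        }
        return unresolved;
    }

    private void apply(OfflineChange change, CentralRow row) {
        if (row == null) {
            row = new CentralRow(null, 0, 0);
            central.put(change.key, row);
        }
        row.value = change.value;
        row.version++;
        row.updatedAtMillis = change.madeAtMillis;
    }
}
```

Comparing timestamps only makes sense if the clocks of the devices are reasonably in sync, which is itself a requirement to decide on. Which policy is right remains the functional question raised above, not something the code can answer.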
You can't expect to succeed with a network like that between Hibernate and the database.
I recommend that you define a set of high-level atomic operations, and then define a set of (e.g.) RESTful services for them. Or, if you like, you can use SOAP and look into the WS-* options for reliable messaging to take care of retries and all the other messy details.
Or you could investigate whether something like Cassandra across the link, or something else that is big on replication, would work better than SQL.
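As a rough illustration of what "high-level atomic operations" with retries might involve, the sketch below gives each operation a unique id so the server can recognise a retry and avoid applying it twice. This idempotency scheme is my assumption, not something the answer prescribes, and `Operation`, `OperationService`, `Link`, and `RetryingClient` are hypothetical names; the service is an in-memory stand-in for a real RESTful endpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical high-level operation; the id lets the server recognise a retry. */
class Operation {
    final String id = UUID.randomUUID().toString();
    final String account;
    final int amount;

    Operation(String account, int amount) {
        this.account = account;
        this.amount = amount;
    }
}

/** In-memory stand-in for the service in front of the central database. */
class OperationService {
    private final Map<String, Integer> balances = new HashMap<>();
    private final Map<String, Integer> resultsById = new HashMap<>();

    /** Applies the operation once; a repeated id just returns the stored result. */
    synchronized int apply(Operation op) {
        Integer previous = resultsById.get(op.id);
        if (previous != null) return previous;
        int balance = balances.merge(op.account, op.amount, Integer::sum);
        resultsById.put(op.id, balance);
        return balance;
    }
}

/** The unreliable link: a call may fail before or after the server did the work. */
interface Link {
    int call(Operation op) throws Exception;
}

class RetryingClient {
    /** Retries the same operation with exponential backoff until it succeeds. */
    static int send(Link link, Operation op, int maxAttempts, long baseDelayMillis) {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return link.call(op);
            } catch (Exception e) {
                last = e;
            }
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(baseDelayMillis << attempt); // 1x, 2x, 4x, ...
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

The important case is a response that gets lost after the server has already done the work: the client retries with the same id and gets the stored result instead of a second application. A real service would also have to persist and eventually expire the stored ids.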
How about queuing up the database operations on a durable, persistent message queue and letting some messaging middleware handle the network problem?
Depending on how you do it, consistency problems can arise ("anomalies" is the right word, I guess), but if you have an unreliable network and still want decent performance, then settling for relaxed consistency could be the way to go.
I would be hesitant to use Ehcache and the like. They were not designed for this, so you might have to "stretch" the framework. Message queues, on the other hand, have solutions that were designed for exactly such scenarios.