How do I synchronize two Java applications?
Here is a situation I have encountered: I have two similar Java applications running on different servers. Both applications fetch data from the same website through the web service it provides. The site, of course, does not know that the first app has already taken the same piece of data as the second app. After fetching, the data should be saved in a database. So I have the problem of saving the same data twice in the database.
How can I avoid duplicate entries in my DB?
There are probably two ways:
1) Use the database side: write something that looks like "insert if unique".
2) Use the server side: write an intermediate service that receives the responses from the two data fetchers and processes them somehow.
I suppose the second solution is more efficient.
Can you advise something on this topic?
How would you implement that intermediate service? How would you implement communication between the services? If we used HashMaps to store the received data, how could we estimate the maximum size of HashMap that our system can handle?
Comments (2)
There are distributed frameworks for this sort of problem. Such a framework can provide a ConcurrentMap across multiple JVMs.
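As a rough illustration of the shared ConcurrentMap idea, here is a minimal sketch in which a record is stored only by whichever fetcher claims its id first. It assumes every record has a unique String id. RecordStore, saveIfNew, and the in-memory list standing in for the database are hypothetical names, and a plain ConcurrentHashMap stands in for the map that a distributed framework would share across JVMs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: RecordStore and saveIfNew are illustrative names only.
class RecordStore {
    // Stand-in for a map shared across JVMs; a distributed framework
    // would supply its own ConcurrentMap implementation here.
    private final ConcurrentMap<String, Boolean> seen;
    // Stand-in for the real database table.
    private final List<String> database = new ArrayList<>();

    RecordStore(ConcurrentMap<String, Boolean> seen) {
        this.seen = seen;
    }

    // Returns true if this caller was the first to claim the id and stored the record.
    boolean saveIfNew(String id, String payload) {
        // putIfAbsent is atomic: for a given key, exactly one caller gets null back.
        if (seen.putIfAbsent(id, Boolean.TRUE) != null) {
            return false; // another fetcher already claimed this record
        }
        synchronized (database) {
            database.add(id + "=" + payload);
        }
        return true;
    }

    int storedCount() {
        synchronized (database) {
            return database.size();
        }
    }
}

class DedupDemo {
    public static void main(String[] args) {
        RecordStore store = new RecordStore(new ConcurrentHashMap<String, Boolean>());
        // Two fetchers receive the same record from the website.
        System.out.println("fetcher 1 stored: " + store.saveIfNew("item-42", "payload"));
        System.out.println("fetcher 2 stored: " + store.saveIfNew("item-42", "payload"));
        System.out.println("rows in database: " + store.storedCount());
    }
}
```

The map only needs to hold the ids, not the full records, which keeps its memory footprint small compared with storing the received data itself.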
Do you really need to fetch data on two servers simultaneously? Checking on every insert whether the entry is already present could be expensive. Merging several fetches can be time-consuming as well. Is there any benefit to fetching in parallel? Consider having one fetcher at a time.
The problem you will face is that you have to choose which one of your distributed processes should fetch the data and store it in the DB.
This is a kind of leader election problem.
Take a look at Apache ZooKeeper, which is a distributed coordination service.
There is a recipe for implementing leader election with ZooKeeper.
Many frameworks already implement this recipe. I'd recommend Netflix Curator. More details about leader election with Curator are available on its wiki.
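To make the leader election idea more concrete, here is a small in-memory sketch of the logic behind it: every participant registers under an increasing sequence number, and whoever holds the smallest number is the leader. This is only a simulation under stated assumptions. ElectionRegistry and its methods are hypothetical names, not the ZooKeeper or Curator API, and a real deployment would keep this state in the coordination service rather than in one JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for the coordination service (hypothetical, not ZooKeeper API).
class ElectionRegistry {
    private final AtomicLong nextSequence = new AtomicLong();
    // sequence number -> participant name, kept sorted by sequence number
    private final ConcurrentSkipListMap<Long, String> members = new ConcurrentSkipListMap<>();

    // Mirrors creating an ephemeral sequential node: each joiner gets a larger number.
    long join(String participant) {
        long seq = nextSequence.getAndIncrement();
        members.put(seq, participant);
        return seq;
    }

    // Mirrors the node disappearing when a participant stops or its session ends.
    void leave(long seq) {
        members.remove(seq);
    }

    // The participant holding the smallest sequence number is the leader.
    boolean isLeader(long seq) {
        Map.Entry<Long, String> first = members.firstEntry();
        return first != null && first.getKey() == seq;
    }
}

class ElectionDemo {
    public static void main(String[] args) {
        ElectionRegistry registry = new ElectionRegistry();
        long app1 = registry.join("app-1");
        long app2 = registry.join("app-2");

        // Only the leader should fetch from the website and write to the DB.
        System.out.println("app-1 fetches: " + registry.isLeader(app1));
        System.out.println("app-2 fetches: " + registry.isLeader(app2));

        // If app-1 goes away, app-2 takes over as the single fetcher.
        registry.leave(app1);
        System.out.println("app-2 fetches after app-1 left: " + registry.isLeader(app2));
    }
}
```

With this arrangement only one application writes at any time, so the database never has to check for duplicates, and the other application acts as a standby.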