Two different Java applications sharing the same database
In my web application I have a part that needs to continuously crawl the web, process the data, and present it to the user. So I was wondering whether it is a good approach to split it into two separate applications: one would do the crawling and data processing and store the results in the database, and the other would be a web application (deployed on some web server) that presents the data from the database to the user and allows some interaction with it.
The reason I think I need this split is that if I make changes to my web app (adding new features, changing the interface, etc.), I don't want the crawling to be interrupted.
My application stack is Tapestry (web layer), Spring, Hibernate (over MySQL), and my own crawler implementation, which is independent of the others.
Is it enough to integrate the two just by using the same database? This might cause problems when both applications access the database at the same time. Or can the integration be done at the Hibernate level, so that both applications use the same Hibernate session? But can an app in one JVM instance access objects in another JVM instance?
I would be grateful for any suggestions regarding this matter.
UPDATE
The user (through the web app's interface) would enter the URLs for the crawler to parse. The crawler app would just read the tables of URLs that the web app populates. Conversely, the data processed by the crawler would just be presented in the user interface. So I think I shouldn't need to worry about any kind of locking, right?
Thanks,
Nikola
Comments (2)
I would definitely keep them separate, as you are planning. Web crawling is more of a "batch" process than a request-driven web application. The crawling app will run in its own JVM, and your web app will run in a servlet/Java EE container.
How often will the crawler run, or is it a continuously running process? You may want to choose the frequency based on your requirements.
Will users of the web app be updating the same tables that the crawler writes to? In that case you will need to take precautions, otherwise deadlocks may arise. If you want your web app to refresh its data automatically when new rows are inserted, you can create a message-driven bean (using JMS) so that the crawler app notifies the web app asynchronously. When a "new data inserted" message arrives, you can either do a form submit on your page or use Ajax to update the data on the page itself.
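To illustrate the kind of precaution meant above, here is a minimal in-memory sketch of a status-column handoff, where each side only ever changes a row that is in the state it expects. Everything in it is an assumption made for illustration: the `UrlTable` class, its method names, and the `NEW` / `IN_PROGRESS` / `DONE` states are hypothetical, and a `ConcurrentHashMap` stands in for the shared MySQL table. In a real setup the `claim` step would more likely be a single conditional `UPDATE` (or a Hibernate versioned entity), with "one row updated" playing the role of the `true` return value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the shared URL table (one row per URL). */
final class UrlTable {
    enum Status { NEW, IN_PROGRESS, DONE }

    // url -> status; models a table with a status column
    private final Map<String, Status> rows = new ConcurrentHashMap<>();

    /** Web app side: register a URL for crawling. Returns false if it already exists. */
    boolean submit(String url) {
        return rows.putIfAbsent(url, Status.NEW) == null;
    }

    /**
     * Crawler side: atomically claim a URL. Succeeds only if the row is still NEW,
     * which models a conditional update such as
     * "SET status = 'IN_PROGRESS' WHERE url = ? AND status = 'NEW'".
     */
    boolean claim(String url) {
        return rows.replace(url, Status.NEW, Status.IN_PROGRESS);
    }

    /** Crawler side: mark a previously claimed URL as finished. */
    boolean finish(String url) {
        return rows.replace(url, Status.IN_PROGRESS, Status.DONE);
    }

    /** Web app side: read-only, sorted list of URLs whose results are ready to display. */
    List<String> finished() {
        List<String> done = new ArrayList<>();
        for (Map.Entry<String, Status> row : rows.entrySet()) {
            if (row.getValue() == Status.DONE) {
                done.add(row.getKey());
            }
        }
        Collections.sort(done);
        return done;
    }
}
```

The point of the sketch is that the web app only inserts rows and reads finished ones, while the crawler only moves a row forward from one expected state to the next, so neither side waits on a row the other is holding for long.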
The web app should use connection pooling, and the batch app could use a pooling library such as DBCP or C3P0. I am not sure you would gain much by trying to share database sessions in this scenario.
This way the two apps are integrated without slowing each other down while one waits for the other to finish processing.
HTH!
You are right, splitting the application into two could be reasonable in your case.
Disadvantages of separating into two applications -
The advantages you have already described in your question.