Job queuing in a Java processing chain
I am currently designing a correlation engine in Java which extracts data from PDF files and correlates it (raising alerts where necessary) with structured data from a relational database.
Focusing on the processing of the PDF files, the system consists of:
A component which performs the custom extraction from the PDF.
A component which parses the sometimes unordered, unclean data into the required data structures.
A normalisation component which normalises the values for comparison.
And a component which interfaces with the DB (where the extracted data will be inserted with the rest of the data).
The components should be reusable in other processing chains, but they will all run on the same system initially.
I think it would be wise to have some sort of buffering between components. Would JMS queueing be a sensible choice, or would it overcomplicate matters? I have been experimenting with a simple LinkedBlockingQueue, but the queue has to be passed between components, which requires a master component that drives everything, and I am not sure that is desirable. Is there a standard way of approaching this problem?
I would use chained calls unless you have additional requirements.
If you want multiple threads, I would process each file in a different thread using an ExecutorService thread pool.
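To make that concrete, here is a rough sketch of what chained calls plus a thread pool could look like. The class name PdfChain and the four stage methods (extract, parse, normalise, store) are hypothetical placeholders invented for illustration; they just manipulate strings and stand in for the real extraction, parsing, normalisation and DB components. It is only one way to wire this up, not a definitive design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class PdfChain {
    // Hypothetical stages: each stands in for one of the real components.
    static String extract(String pdfPath) { return "  Raw TEXT from " + pdfPath + " "; }
    static String parse(String raw) { return raw.trim(); }
    static String normalise(String parsed) { return parsed.toLowerCase(Locale.ROOT); }
    static String store(String record) { return "stored:" + record; }

    // Chained calls: the output of each stage is the input of the next,
    // with no queue and no master component in between.
    static final Function<String, String> CHAIN =
            ((Function<String, String>) PdfChain::extract)
                    .andThen(PdfChain::parse)
                    .andThen(PdfChain::normalise)
                    .andThen(PdfChain::store);

    // Runs the whole chain once per file, each file on a pool thread.
    // Results are returned in the same order as the input paths.
    static List<String> processAll(List<String> paths, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : paths) {
                Callable<String> task = () -> CHAIN.apply(path);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that file is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A stage failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("a.pdf", "b.pdf"), 2));
    }
}
```

Because each stage is just a function from input to output, the stages stay reusable in other chains, and the pool's internal work queue gives you the buffering between files without any JMS setup.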