Best practices for directory polling
I have to do batch processing to automate a business process. I have to poll a directory at regular intervals to detect new files and process them. While old files are being processed, new files can come in. For now, I use the Quartz scheduler and thread synchronization to ensure that only one thread can process files at a time.
Part of the code:
application-context.xml
<bean id="methodInvokingJob"
      class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
    <property name="targetObject" ref="documentProcessor" />
    <property name="targetMethod" value="processDocuments" />
</bean>
DocumentProcessor
.....
public void processDocuments() {
    LOG.info(Thread.currentThread().getName() + " attempt to run.");
    // Note: this check runs outside the lock, so two threads can both pass it;
    // the second then waits on the monitor and runs after the first finishes.
    if (!processing) {
        synchronized (this) {
            try {
                processing = true;
                LOG.info(Thread.currentThread().getName() + " is processing");
                List<String> xmlDocuments = documentManager.getFileNamesFromFolder(incomingFolderPath);
                // Loop over the files and process the unlocked ones.
                for (String xmlDocument : xmlDocuments) {
                    processDocument(xmlDocument);
                }
            } finally {
                processing = false;
            }
        }
    }
}
With the current code, I have to prevent other threads from processing files while one thread is processing. Is that a good idea? Or should we support multi-threaded processing? In that case, how can I know which files are being processed and which files have just arrived? Any ideas are really appreciated.
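As a point of reference for the single-runner approach described above, here is a minimal sketch of a guard that lets at most one thread run the job at a time and makes late arrivals skip instead of queueing on the lock. `SingleRunGuard` and `runIfIdle` are hypothetical names used only for illustration; they are not part of the original `DocumentProcessor`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: not part of the original code.
class SingleRunGuard {
    private final AtomicBoolean processing = new AtomicBoolean(false);

    /**
     * Runs the task only if no other thread is currently running one.
     * Returns true if the task ran, false if the call was skipped.
     */
    boolean runIfIdle(Runnable task) {
        // compareAndSet checks and sets the flag in one atomic step,
        // so two threads can never both see "idle".
        if (!processing.compareAndSet(false, true)) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            processing.set(false); // always release, even if the task throws
        }
    }
}
```

Under this sketch, a scheduled trigger that fires while a previous run is still active simply returns without doing any work.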
Comments (2)
I would build it with these parts:
Castle Transactions with TxF
FileSystemWatcher (Java version)
TransactionScope (no Java version unless you hack it a lot)
A lock-free queue (there is a paper discussing Java vs .NET performance; you might be able to get the Java source from its authors), or Java lock-based queues
Such that:
When there's a new file, the file system watcher detects it and puts the file path in the queue (remember to set the correct flags, handle the error condition, set Enabled <- True, and watch out for duplicates).
You have one application thread and n worker threads. If this is the only app, the workers spin-wait on the queue with TryDequeue; otherwise they block on a monitor: while(!Monitor.Enter(has_items)) ;
When a worker thread gets a path through the dequeue operation, it starts working on it, and no other thread can work on it. If duplicate outputs are possible (depending on your setup), you can use a file transaction while writing the output file. If the Commit operation fails, you know another thread has already written the output file, and the worker resumes polling the queue.
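The watcher-plus-queue design above can be sketched in Java roughly as follows. This is an illustration under assumptions, not the answer's exact stack: it swaps in `java.nio.file.WatchService` for the .NET `FileSystemWatcher` and a lock-based `LinkedBlockingQueue` for the lock-free queue, and it leaves out the transactional output step. `WatchedFolderPipeline`, `enqueue`, `watch`, and the `handler` callback are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: one watcher feeds a queue, n workers drain it.
class WatchedFolderPipeline implements AutoCloseable {
    private final BlockingQueue<Path> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;

    WatchedFolderPipeline(int workerCount, Consumer<Path> handler) {
        workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        Path next = queue.take(); // blocks; each path goes to exactly one worker
                        try {
                            handler.accept(next);
                        } catch (RuntimeException e) {
                            System.err.println("Failed to process " + next + ": " + e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shutdown requested
                }
            });
        }
    }

    /** Hands a path to the workers. */
    void enqueue(Path file) {
        queue.add(file);
    }

    /** Blocks the calling thread and enqueues every entry created in dir. */
    void watch(Path dir) throws IOException, InterruptedException {
        try (WatchService watchService = dir.getFileSystem().newWatchService()) {
            dir.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watchService.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a fuller version would rescan dir
                    }
                    enqueue(dir.resolve((Path) event.context()));
                }
                if (!key.reset()) {
                    return; // the directory is no longer accessible
                }
            }
        }
    }

    @Override
    public void close() {
        workers.shutdownNow(); // interrupts workers blocked in take()
    }
}
```

Two caveats apply to this sketch: `ENTRY_CREATE` can fire before the writer has finished writing the file, and files already present when `watch` starts are not reported, so a startup scan that calls `enqueue` for existing files would be needed.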
I'd do the following:
One thread gets your filenames and adds them to a synchronized queue.
Multiple threads do the actual reading: each gets an item from the synchronized queue and processes it.
To check whether a file is in use, you can simply try to rename/move it.
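The rename/move check can be sketched as a "claim" step: a worker moves the file into a separate processing folder and only proceeds if the move succeeds. `FileClaimer`, `tryClaim`, and the processing folder are hypothetical names for illustration. The sketch assumes both folders are on the same file system, since `ATOMIC_MOVE` otherwise fails with `AtomicMoveNotSupportedException`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical helper: claims a file by moving it out of the incoming folder.
final class FileClaimer {
    private FileClaimer() {}

    /**
     * Tries to move file into processingDir.
     * Returns the new location, or empty if the file no longer exists
     * (for example because another worker already claimed it).
     * Other I/O failures, such as a file locked by its writer, propagate as IOException.
     */
    static Optional<Path> tryClaim(Path file, Path processingDir) throws IOException {
        Path target = processingDir.resolve(file.getFileName());
        try {
            return Optional.of(Files.move(file, target, StandardCopyOption.ATOMIC_MOVE));
        } catch (NoSuchFileException e) {
            return Optional.empty(); // someone else moved it first
        }
    }
}
```

How well a move detects a file that is still being written depends on the platform: on Windows a rename typically fails while another process holds the file open, whereas on POSIX systems it usually succeeds, so there the move mainly serves to stop two workers from picking up the same file.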