Processing PDFs in a JavaEE application
A JavaEE application must process a large number of PDF documents.
The kind of processing is not important, but for the sake of clarity let's say it includes extracting text, converting pages into images, stamping IDs on the first page, printing, and saving the documents to a database.
Incoming PDFs come from a huge variety of suppliers, so there is very little control over them, if any.
All operations take place in the background: timers poll inbound channels, retrieve the documents and send them for processing. There is no user interaction.
Two top-level Java libraries are used to handle the PDFs. Because of the breadth of the PDF specification and the extreme variety of PDF-generating tools, the libraries can't possibly cover every possible flaw, so sometimes they fail to manipulate or even open a document.
Unfortunately, they sometimes fail without raising an exception, blocking forever inside some internal method instead. This is critical because the polling timer blocks and no more documents are processed. Administrators realize too late that something is wrong and, what's worse, the whole application server must be restarted, which is neither easy nor acceptable in a production environment.
So how can the EJB that drives the library detect that the call is blocked and stop the transaction?
I could start a dedicated thread (without breaking the JavaEE specification) and wait on it with a timeout. The wait ends either when a completion flag is set or when the timeout expires. In the latter case the thread is considered blocked, the PDF can be marked as invalid and, for instance, an email alert can be sent.
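A minimal sketch of the dedicated-thread idea, assuming plain Java SE. Instead of a hand-rolled wait/flag pair it uses `ExecutorService.submit` plus `Future.get(timeout, unit)`, which yields the same "finished or timed out" decision. `PdfTimeoutGuard` and `Outcome` are hypothetical names, and the `Callable` stands in for whatever library call processes the document. Inside a Java EE 7+ container, a `javax.enterprise.concurrent.ManagedExecutorService` injected with `@Resource` could replace the `Executors` pool so the worker threads stay container-managed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs one PDF-processing task on a worker thread
// and gives up waiting after a timeout, so the caller never hangs.
public class PdfTimeoutGuard {

    /** Result of one guarded processing attempt. */
    public enum Outcome { OK, FAILED, TIMED_OUT }

    // Daemon threads, so a worker stuck forever inside a library call
    // cannot prevent the JVM from shutting down.
    private final ExecutorService workers = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "pdf-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task and blocks the caller for at most timeoutMillis. */
    public Outcome process(Callable<Void> task, long timeoutMillis) {
        Future<Void> future = workers.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // Only requests interruption: a library that ignores interrupts
            // keeps running on its thread, but the caller moves on.
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The library threw an exception while processing the document.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        PdfTimeoutGuard guard = new PdfTimeoutGuard();
        // Simulated well-behaved document.
        System.out.println(guard.process(() -> null, 1000)); // prints OK
        // Simulated hang: a loop that swallows interrupts, like a misbehaving library.
        System.out.println(guard.process(() -> {
            while (true) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ignored) {
                    // swallowed on purpose
                }
            }
        }, 200)); // prints TIMED_OUT
        // Simulated corrupt document that makes the library throw.
        System.out.println(guard.process(() -> {
            throw new IllegalStateException("corrupt document");
        }, 1000)); // prints FAILED
    }
}
```

Note that `future.cancel(true)` only interrupts the worker. A library stuck in a loop that never checks for interruption keeps its thread, so with `newCachedThreadPool` every hung document leaks one thread. The polling timer is freed and the PDF can be flagged as invalid, but those threads still accumulate until a restart, so logging and alerting on `TIMED_OUT` is worthwhile.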
Does anyone see an alternative, feasible solution?
Thanks
Comments
I'm not sure I fully understand what you have described, but IMHO you can either use MDBs for asynchronous processing (instead of creating separate threads) or run the EJB methods in a transactional context and set the transaction timeout for the method that does the job you described. If the transaction times out, you will get the exception you want.