Apache Camel: periodically pulling data from FTP incrementally

Posted on 2024-11-01 21:43:37

I am very new to Apache Camel and I am exploring how to create a route that pulls data from an FTP server, for instance every 15 minutes, and pulls only new or updated files. If some files were downloaded earlier and are still unchanged, the FTP loader should not copy them to the destination folder again.

Any advice is warmly appreciated.

UPDATE #1

I've already noticed that I need to look at the FTP2 component, and I've made some progress. The last thing I want to clarify is what consumer.delay means. Does it define the delay between download attempts (polls)? For instance, with consumer.delay = 5s: on the first attempt the FTP server contains 5 files, the consumer pulls them somewhere and waits 5s; on the second attempt the server is unchanged and Camel does nothing; then 5 more files arrive on the server, and 5 seconds later the consumer downloads just those newly arrived files. Or does consumer.delay make the consumer wait between individual file downloads (file#1 -> 5s -> file#2 -> 5s -> etc.)?

I want to achieve the first scenario.

Also, I observed that once files have been downloaded from the FTP server to the local file system, they are ignored in subsequent loads, even if they have been deleted locally. How can I tell Camel to download the deleted files again, and how does it store information about files it has already loaded? It also seems that it downloads all files each time, even those that were fetched during the first pull. Do I need to write a filter to exclude already downloaded files?

Comments (1)

蛮可爱 2024-11-08 21:43:37

There is an FTP component for Apache Camel: http://camel.apache.org/ftp.html

Use the "consumer.delay" property to set the delay, in milliseconds, between polls.
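As a rough illustration of how the polling delay could be combined with the consumer's duplicate tracking, here is a sketch of the two endpoint URIs for such a route. The host, user, password and directories are placeholders, and the option names (delay, noop, idempotentKey, passiveMode) are written as I remember them from the FTP component page, so check them against the documentation for your Camel version. Older 2.x releases also accept the "consumer.delay" spelling used above.

```
from: ftp://user@ftp.example.com/inbox?password=secret&passiveMode=true&noop=true&idempotentKey=${file:name}-${file:modified}&delay=900000
to:   file:/data/inbox
```

With delay=900000 the consumer would poll every 15 minutes. The delay applies between polls, not between individual files. noop=true leaves the files on the server and turns on the idempotent check, and the idempotentKey is meant to make a file with a changed modification time count as new. By default the record of already seen files is kept in memory, so it does not notice local deletions and is lost on restart.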

For implementation details, look here: http://architects.dzone.com/articles/apache-camel-integration
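To make the "only new or updated files" behaviour from the question concrete, here is a minimal plain-Java sketch of the idea behind an idempotent poll. It is not Camel's implementation; the class IncrementalPoller, the RemoteFile type and the key format are all hypothetical names chosen for illustration. Each poll walks the whole remote listing and fetches only files whose key (name, modification time, size) has not been seen before.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not how Camel is implemented.
class IncrementalPoller {

    // Minimal description of a remote file.
    static final class RemoteFile {
        final String name;
        final long lastModified;
        final long size;

        RemoteFile(String name, long lastModified, long size) {
            this.name = name;
            this.lastModified = lastModified;
            this.size = size;
        }
    }

    // Keys of files already fetched. Held in memory, so lost on restart.
    private final Set<String> seen = new HashSet<>();

    // The key changes when a file is modified, so updated files count as new.
    static String keyOf(RemoteFile f) {
        return f.name + "-" + f.lastModified + "-" + f.size;
    }

    // One poll: inspect the whole listing, return names of new or changed files.
    List<String> poll(Map<String, RemoteFile> listing) {
        List<String> fetched = new ArrayList<>();
        for (RemoteFile f : listing.values()) {
            if (seen.add(keyOf(f))) { // add() returns false for a known key
                fetched.add(f.name);
            }
        }
        return fetched;
    }

    // Forget a file so that the next poll fetches it again.
    void forget(RemoteFile f) {
        seen.remove(keyOf(f));
    }

    public static void main(String[] args) {
        IncrementalPoller poller = new IncrementalPoller();
        Map<String, RemoteFile> listing = new LinkedHashMap<>();
        listing.put("a.csv", new RemoteFile("a.csv", 1000L, 10L));
        System.out.println("first poll:  " + poller.poll(listing));
        System.out.println("second poll: " + poller.poll(listing));
        listing.put("b.csv", new RemoteFile("b.csv", 2000L, 20L));
        System.out.println("third poll:  " + poller.poll(listing));
    }
}
```

The seen set knows nothing about the local destination folder, which matches the observation in the question that deleting a local copy does not trigger a new download. In this sketch a file is fetched again only if its key changes or the entry is explicitly forgotten.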
