Detecting files that are still being transferred?

Posted 2024-07-10 14:15:04 · 649 words · 8 views · 0 comments

I'm writing an application that monitors a directory for new input files by polling it every few seconds. New files are often several megabytes, so they take some time to arrive fully in the input directory (e.g. when copied from a remote share).

Is there a simple way to detect whether a file is currently in the process of being copied? Ideally any method would be platform- and filesystem-agnostic, but failing that, specific strategies might be required for different platforms.

I've already considered taking two directory listings separated by a few seconds and comparing file sizes, but this introduces a time/reliability trade-off that my superiors aren't happy with unless there is no alternative.

For background, the application is being written as a set of Matlab M-files, so no JRE/CLR tricks I'm afraid...


Edit: files arrive in the input directory by a straight move/copy operation, either from a network drive or from another location on a local filesystem. This copy operation will probably be initiated by a human user rather than another application.

As a result, it's pretty difficult to place any responsibility on the file provider to add control files or use an intermediate staging area...


Conclusion: it seems like there's no easy way to do this, so I've settled for a belt-and-braces approach - a file is ready for processing if:

  • its size doesn't change in a certain period of time, and
  • it's possible to open the file in read-only mode (some copying processes place a lock on the file).
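The two conditions above can be sketched roughly as follows. This is a minimal illustration in Python rather than MATLAB; the helper name `is_ready` and the 5-second default are assumptions, not part of the original application. The same checks should map onto a directory listing and a read-only file open in MATLAB.

```python
import os
import time


def is_ready(path, settle_seconds=5.0):
    """Return True if `path` looks fully written (hypothetical helper).

    Check 1: the size is unchanged after `settle_seconds`.
    Check 2: the file can be opened read-only.
    """
    try:
        size_before = os.path.getsize(path)
        time.sleep(settle_seconds)
        size_after = os.path.getsize(path)
    except OSError:
        # The file vanished or is inaccessible: treat it as not ready.
        return False

    if size_before != size_after:
        return False  # still growing

    try:
        # Some copy tools hold a lock that makes this open fail.
        with open(path, "rb"):
            pass
    except OSError:
        return False
    return True
```

Neither check is conclusive on its own: a stalled network copy can keep a constant size for longer than the settle period, and not every copy tool locks the file it is writing.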

Thanks to everyone for their responses!

Comments (5)

ぽ尐不点ル 2024-07-17 14:15:05

What is your OS? On Unix you can use the "lsof" utility to determine whether a user has the file open for writing. Apparently the same functionality exists somewhere in Process Explorer on MS Windows.
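As a rough sketch of the lsof idea (Unix only, in Python for illustration): the function names below are hypothetical, and the sketch assumes `lsof` is installed and that its `-F a` field output reports the access mode as `r`, `w` or `u` (read/write). Note that lsof only sees processes on the local machine, so a copy driven from a remote client over a network share may not show up.

```python
import subprocess


def has_writer(lsof_field_output):
    """Return True if `lsof -Fa` output lists a write or read/write handle."""
    for line in lsof_field_output.splitlines():
        # In -F output every line is one field; 'a' prefixes the access mode.
        if line.startswith("a") and line[1:] in ("w", "u"):
            return True
    return False


def is_open_for_write(path):
    """Ask lsof whether a local process has `path` open for writing."""
    result = subprocess.run(
        ["lsof", "-Fa", "--", path],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when nothing has the file open
    )
    return has_writer(result.stdout)
```

`is_open_for_write` raises `FileNotFoundError` if lsof is not on the PATH, which a caller could treat as "unknown" and fall back to another check.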

Alternatively, you could just try an exclusive open on the file and bail out if this fails. But this can be a little unreliable, and it's easy to tread on your own toes.

错々过的事 2024-07-17 14:15:04

The safest method is to have the application(s) that put files in the directory first put them in a different, temporary directory, and then move them to the real one (which should be an atomic operation even when using FTP or file shares). You could also use naming conventions to achieve the same result within one directory.
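A minimal sketch of the naming-convention variant, assuming the sender can run code at all (which the question suggests may not be the case). The `.part` suffix and both function names are made up for illustration. The sender copies under a temporary name and renames only when the copy is complete; the watcher ignores anything still carrying the suffix.

```python
import os
import shutil


def deliver(source_path, inbox_dir):
    """Sender side: copy under a temporary name, then rename into place."""
    final_path = os.path.join(inbox_dir, os.path.basename(source_path))
    partial_path = final_path + ".part"  # hypothetical in-progress suffix

    shutil.copyfile(source_path, partial_path)
    # A rename within one directory stays on one filesystem, where
    # os.replace is atomic on POSIX systems.
    os.replace(partial_path, final_path)
    return final_path


def visible_inputs(inbox_dir):
    """Watcher side: every file that does not carry the in-progress suffix."""
    return sorted(n for n in os.listdir(inbox_dir) if not n.endswith(".part"))
```

The rename is what makes the file appear "all at once"; a move across filesystems degrades to copy-and-delete and loses that property.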

Edit:
It really depends on the filesystem, on whether its copy functionality even has the concept of a "completed file". I don't know the SMB protocol well, but if it has that concept, you could write an app that exposes an SMB interface (or patch Samba) and an API to get notified for completed file copies. Probably a lot of work though.

刘备忘录 2024-07-17 14:15:04

This is a middleware problem as old as the hills, and the short answer is: no.

The two 'solutions' put the onus on the file-uploader: (1) upload the file in a staging directory and then move it into the destination directory (2) upload the file, and then create/upload a 'ready' file that indicates the state of the content file.
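Option (2) might look something like the sketch below. The `.ready` suffix and the function names are assumptions chosen for illustration; any agreed-upon marker convention would do.

```python
import os

READY_SUFFIX = ".ready"  # hypothetical marker naming convention


def mark_ready(content_path):
    """Uploader side: create an empty marker once the content file is complete."""
    with open(content_path + READY_SUFFIX, "w"):
        pass


def ready_files(inbox_dir):
    """Watcher side: content files that have a companion marker file."""
    names = set(os.listdir(inbox_dir))
    return sorted(
        name
        for name in names
        if not name.endswith(READY_SUFFIX) and name + READY_SUFFIX in names
    )
```

The watcher never has to guess: a content file without its marker is simply skipped until the next poll.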

The 1st one is the better, but both are inelegant. The truth is that better communication media exist than the filesystem. Consider using some IPC that involves only a push or a pull (and not both, as does the filesystem) such as an HTTP POST, a JMS or MSMQ queue, etc. Furthermore, this can also be synchronous, allowing the process receiving the file to acknowledge the content, even check it for worthiness, and hand the client a receipt - this is the righteous road to non-repudiation. Follow this, and you will never suffer arguments over whether a file was or was not delivered to your server for processing.

M.

风柔一江水 2024-07-17 14:15:04

One simple possibility would be to poll at a fairly large interval (2 to 5 minutes) and only acknowledge the new file the second time you see it.
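A small sketch of the "acknowledge on the second sighting" idea. The class name is hypothetical, and the polling interval is left to the caller (the 2 to 5 minutes suggested above would be the gap between calls to `poll`).

```python
import os


class SecondSightingPoller:
    """Report a file only once it has been present on two consecutive polls."""

    def __init__(self, directory):
        self.directory = directory
        self._seen_once = set()      # names seen on the previous poll only
        self._acknowledged = set()   # names already reported

    def poll(self):
        """Return the names that are newly acknowledged on this poll."""
        current = set(os.listdir(self.directory))
        confirmed = (current & self._seen_once) - self._acknowledged
        self._acknowledged |= confirmed
        # Forget files that have disappeared so a re-delivery counts as new.
        self._acknowledged &= current
        self._seen_once = current - self._acknowledged
        return sorted(confirmed)
```

This only works if the interval is comfortably longer than the slowest expected copy, which is the time/reliability trade-off raised in the question.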

I don't know of a way in any OS to determine whether a file is still being copied, other than maybe checking if the file is locked.

豆芽 2024-07-17 14:15:04

How are the files getting there? Can you set an attribute on them as they are written and then change the attribute when write is complete? This would need to be done by the thing doing the writing ... which sounds like it isn't an option.

Otherwise, caching the listing and treating a file as new if it has the same file size for two consecutive listings is the best way I can think of.

Alternatively, you could use the modified time on the file - the file has to be new and have a modified time that is at least x in the past. But I think this will be about equivalent to caching the listing.

If you are polling the folder every few seconds, it's not much of a time penalty, is it? And it's platform-agnostic.
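The modified-time variant mentioned above could be sketched like this. The function name, the 10-second default and the `now` parameter (included so the check can be exercised deterministically) are all assumptions.

```python
import os
import time


def quiet_files(directory, min_age_seconds=10.0, now=None):
    """Return files whose last modification is at least `min_age_seconds` old."""
    now = time.time() if now is None else now
    ready = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        # A file that is still being written keeps getting a fresh mtime.
        if now - entry.stat().st_mtime >= min_age_seconds:
            ready.append(entry.name)
    return sorted(ready)
```

One caveat: on a network share the timestamps may come from the server's clock, so clock skew between the server and the polling machine can make a file look older or newer than it really is.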

Also, linux only: http://www.linux.com/feature/144666

Like cron but for files. Not sure how it deals with your specific problem - but may be of use?
