Prevent ftplib from downloading a file that is still in progress?
We have an FTP system set up to monitor and download from remote FTP servers that are not under our control. The script connects to the remote FTP server and grabs the names of the files on it; we then check whether each file has already been downloaded. If it hasn't, we download the file and add it to the list.
We recently ran into an issue where someone on the remote FTP side copies in a massive single file (>1GB); the script then wakes up, sees a new file, and begins downloading it while it is still being copied in.
What is the best way to check for this? I was thinking of grabbing the file size, waiting a few seconds, checking the file size again, and seeing whether it has increased; if it hasn't, we download it. But since time is a concern, we can't wait a few seconds for every single file set to see whether its file size has increased.
What would be the best way to go about this? Currently everything is done via Python's ftplib. How can we do this, aside from using the aforementioned method?
Let me reiterate: we have zero control over the remote FTP sites.
Thanks.
UPDATE1:
I was thinking: what if I tried to rename it? Since we have full permissions on the FTP server, would the rename command fail if the file upload is still in progress?
We don't have any real options here... do we?
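The rename idea from UPDATE1 can be sketched as below. This is only a probe under a loud assumption: that the server refuses a rename on a file that is still being written and answers with a 4xx or 5xx reply. Many servers allow renaming an open file, so a successful rename proves nothing on its own. The `rename_probe` name and the `.probe` suffix are hypothetical; `FTP.rename` and the `ftplib` exception classes are the real ftplib API.

```python
import ftplib


def rename_probe(ftp, name, suffix=".probe"):
    """Return True if the server let us rename `name` and rename it back.

    Assumption: a server that locks files during upload rejects the rename
    with a 5xx (error_perm) or 4xx (error_temp) reply. Servers that do not
    lock files will return True even while the upload is still running.
    """
    temp_name = name + suffix  # hypothetical temporary name
    try:
        ftp.rename(name, temp_name)
    except (ftplib.error_perm, ftplib.error_temp):
        return False  # rename refused; the file may still be uploading
    # Put the original name back. If this call fails, the exception
    # propagates and the file is left under the temporary name.
    ftp.rename(temp_name, name)
    return True
```

Because the result depends entirely on the server, it would have to be checked against each remote site before being trusted.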
UPDATE2:
Well, here's something interesting: some of the FTP servers we tested on appear to allocate the space automatically once the transfer starts.
E.g., if I transfer a 200MB file to the FTP server and, while the upload is still active, connect to the server and request the size, it shows 200MB, even though the file is only about 10% complete.
Permissions also seem to be set inconsistently: the FTP server that comes with IIS sets the permissions AFTER the file has finished copying, while some of the other, older FTP servers set them as soon as you send the file.
:'(
Comments (4)
“Damn the torpedoes! Full speed ahead!”
Just download the file. If it is a large file, then after the download completes, wait as long as is reasonable for your scenario and continue the download from the point where it stopped. Repeat until there is nothing more to download.
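A minimal sketch of this "download, wait, resume" loop, assuming the server supports the REST command (ftplib sends it when `retrbinary` is given a `rest` offset). The function name and the `wait_seconds` and `max_rounds` parameters are hypothetical. Note that on servers that preallocate the full size, as described in UPDATE2, the first pass may already return padding, so this is not a guarantee of a complete file.

```python
import os
import time


def download_until_stable(ftp, remote_name, local_path, wait_seconds=30, max_rounds=20):
    """Download, then keep resuming from the local size until no new bytes arrive.

    Returns the final local size in bytes.
    """
    # Start from whatever is already on disk, so a partial download is reused.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    for _ in range(max_rounds):
        with open(local_path, "ab") as fh:
            # A non-zero `rest` makes ftplib send REST before RETR, so the
            # server starts at that byte; the first pass skips REST entirely.
            ftp.retrbinary("RETR " + remote_name, fh.write, rest=offset or None)
        new_size = os.path.getsize(local_path)
        if new_size == offset:
            break  # the last pass added nothing; assume the upload is done
        offset = new_size
        time.sleep(wait_seconds)  # give the uploader time to write more
    return offset
```

A call might look like `download_until_stable(ftp, "big.bin", "/tmp/big.bin", wait_seconds=60)`, where `ftp` is an already logged-in `ftplib.FTP` instance.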
As you say, you have zero control over the servers and can't make your clients post trigger files as suggested by S. Lott, so you must deal with an imperfect solution and risk incomplete file transmission, perhaps by waiting a while and comparing file sizes before and after.
You can try to rename as you suggested, but as you have zero control, you can't be sure that the FTP server administrator (or their successor) won't change platforms or FTP servers, or restrict your permissions.
Sorry.
If you are dealing with multiple files, you could get the list of all the sizes at once, wait ten seconds, and see which are the same. Whichever are still the same should be safe to download.
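A rough sketch of that batch check, so the wait is paid once per poll rather than once per file. `snapshot_sizes` and `stable_files` are hypothetical helper names; `FTP.size` and `FTP.voidcmd` are real ftplib calls. It assumes the server answers SIZE with the bytes written so far, which UPDATE2 in the question shows is not true of every server.

```python
import time


def snapshot_sizes(ftp, names):
    """Return {name: size} using the SIZE command for each name."""
    ftp.voidcmd("TYPE I")  # many servers only answer SIZE in binary mode
    return {name: ftp.size(name) for name in names}


def stable_files(ftp, names, wait_seconds=10):
    """Return the names whose size did not change across one shared wait."""
    before = snapshot_sizes(ftp, names)
    time.sleep(wait_seconds)  # one wait for the whole batch, not one per file
    after = snapshot_sizes(ftp, names)
    # FTP.size returns None when the reply is not a 213, so skip those names.
    return [n for n in names if before[n] is not None and before[n] == after[n]]
```

With a logged-in `ftplib.FTP` instance, `stable_files(ftp, ftp.nlst())` would return the candidates for this round; the rest can be retried on the next poll.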
You can't know when the OS copy is done. It could slow down or wait.
For absolute certainty, you really need two files.
They can mess with the massive file all they want. But when they touch the trigger file, you're downloading both.
If you can't get a trigger, you have to balance the time required to poll vs. the time required to download.
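For the two-file variant, a small sketch of picking out the files whose trigger is present. The `.done` suffix and the `ready_files` name are purely hypothetical; the real convention would have to be agreed with whoever uploads the files, which the asker says is not possible in their case.

```python
def ready_files(names, trigger_suffix=".done"):
    """Return data files whose trigger file is also present in the listing."""
    present = set(names)
    return sorted(
        n for n in present
        if not n.endswith(trigger_suffix) and n + trigger_suffix in present
    )
```

Given a listing such as `ftp.nlst()`, only names with a matching trigger are returned, so a large file without its trigger is left alone until the uploader signals it is complete.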
Do this.
1. Get a listing. Check timestamps.
2. Check sizes vs. previous size of file. If size isn't even close, it's being copied right now. Wait; loop on this step until size is close to previous size.
3. While you're not done:
   a. Get the file.
   b. Get a listing AGAIN. Check the size of the new listing, previous listing and your file. If they agree: you're done. If they don't agree: file changed while you were downloading; you're not done.
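The steps above can be sketched roughly as follows. This is an interpretation, not a definitive implementation: it uses the SIZE command instead of parsing a listing, skips the timestamp check, and treats "close to previous size" as strict equality. `remote_size`, `fetch_verified`, `poll_seconds` and `max_attempts` are hypothetical names.

```python
import os
import time


def remote_size(ftp, name):
    """Ask the server for the size of `name` in binary mode."""
    ftp.voidcmd("TYPE I")
    return ftp.size(name)


def fetch_verified(ftp, name, local_path, poll_seconds=10, max_attempts=5):
    """Download `name` and accept it only if the remote size was the same
    before and after the download and matches the local file.

    Returns True on success, False if the file never settled.
    """
    previous = remote_size(ftp, name)
    for _ in range(max_attempts):
        time.sleep(poll_seconds)
        current = remote_size(ftp, name)
        if current != previous:
            previous = current  # still changing; poll again
            continue
        with open(local_path, "wb") as fh:
            ftp.retrbinary("RETR " + name, fh.write)
        after = remote_size(ftp, name)
        if after == current == os.path.getsize(local_path):
            return True  # listing before, listing after, and local copy agree
        previous = after  # file changed while downloading; go around again
    return False
```

Each retry downloads the whole file again; combining this with a resumed download would save bandwidth at the cost of more bookkeeping.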