I'm working on a project right now where I need to read header data from files on remote servers. I'm talking about many large files, so I can't read the whole files, only the header data I need.
The only solution I have is to mount the remote server with FUSE and then read the headers from the files as if they were on my local computer. I've tried it and it works, but it has some drawbacks, especially with FTP:
- Really slow (FTP compared with SSH, using curlftpfs). From the same server, 90 files were read in 18 seconds over SSH, but only 10 files in 39 seconds over FTP.
- Unreliable. Sometimes the mount point will not unmount.
- If the server is in active mode and the mount is done in passive mode, the mount point and its parent folder get locked for about 3 minutes.
- Times out, even while data is being transferred (I guess this is the FTP protocol rather than curlftpfs).
FUSE is a solution, but I don't like it very much because I don't feel that I can trust it. So my question is basically whether there are any other solutions to the problem. Ruby is the preferred language, but any other will do if Ruby does not support the solution.
Thanks!
What type of information are you looking for?
You could try using Ruby's open-uri module.
The following example is from http://www.ruby-doc.org/stdlib/libdoc/open-uri/rdoc/index.html
EDIT: It seems that the OP wants to retrieve ID3 tag information from the remote files. This is more complex.
From wiki:
This appears to be a difficult problem.
On wiki:
This means that depending on the ID3 tag version of the file, you may have to read different parts of the file.
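For ID3v2 (the usual case for MP3 files), the tag sits at the very start of the file and begins with a 10-byte header: the bytes "ID3", a major-version byte, a revision byte, a flags byte, and a 4-byte "synchsafe" size (7 useful bits per byte) that excludes the header itself. So a single 10-byte read tells you how many bytes to fetch for the whole tag. A minimal sketch of that calculation, following the ID3v2.3/2.4 header layout and assuming the v2.4 footer-present flag (0x10); the method name `id3v2_tag_length` is hypothetical:

```ruby
# Returns the total number of bytes an ID3v2 tag occupies at the start of a
# file, given at least the file's first 10 bytes, or nil if there is no tag.
def id3v2_tag_length(head)
  head = head.b # treat the data as raw bytes
  return nil if head.bytesize < 10 || head.byteslice(0, 3) != "ID3"

  _major, _revision, flags = head.byteslice(3, 3).bytes
  size_bytes = head.byteslice(6, 4).bytes
  # A synchsafe integer never has the top bit of any byte set.
  return nil if size_bytes.any? { |b| b >= 0x80 }

  size = size_bytes.reduce(0) { |acc, b| (acc << 7) | b }
  footer = (flags & 0x10).zero? ? 0 : 10 # ID3v2.4 footer-present flag
  10 + size + footer
end
```

In practice you would fetch the first 10 bytes, call this, and then fetch that many bytes from the start of the file before handing them to a tag parser.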
Here's an article that outlines the basics of reading ID3 tags with Ruby; it only covers ID3v1.1, but it should serve as a good starting point: http://rubyquiz.com/quiz136.html
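ID3v1 and ID3v1.1 tags, by contrast, live in the last 128 bytes of the file, so you need the file size and a read starting at size - 128 (over FTP, for example, `Net::FTP#size` and the `rest_offset` argument of `Net::FTP#retrbinary`, provided the server supports SIZE and REST). The layout is fixed: "TAG", title (30 bytes), artist (30), album (30), year (4), comment (30), genre (1); ID3v1.1 reuses the last two comment bytes for a zero byte and a track number. A minimal parsing sketch; the method name and hash keys are hypothetical:

```ruby
# Parses a 128-byte ID3v1/ID3v1.1 tag (the last 128 bytes of a file).
# Returns a Hash, or nil if the data is not an ID3v1 tag.
def parse_id3v1(tail)
  tail = tail.b # work on raw bytes
  return nil unless tail.bytesize == 128 && tail.start_with?("TAG")

  # "A" strips trailing spaces/NULs, "a" keeps the comment raw, "C" is the genre byte.
  _tag, title, artist, album, year, comment, genre = tail.unpack("a3A30A30A30A4a30C")
  tag = { title: title, artist: artist, album: album, year: year, genre: genre }

  # ID3v1.1: a zero at comment byte 28 followed by a non-zero byte 29 is a track number.
  if comment.getbyte(28).zero? && !comment.getbyte(29).zero?
    tag[:track] = comment.getbyte(29)
    comment = comment.byteslice(0, 28)
  end
  tag[:comment] = comment.sub(/[\x00 ]+\z/, "")
  tag
end
```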
You could also look into using an ID3 parsing library, such as id3.rb or id3lib-ruby; however, I'm not sure whether either supports parsing a remote file (most likely it could, with some modifications).
A "better-than-nothing" solution would be to start the transfer and stop it once the downloaded file holds as many bytes as the header needs. Since few (if any) libraries allow interrupting the connection, this is more complex and will probably require you to hand-code a specific FTP client with two threads: one handling the FTP connection and transfer, and the other monitoring the size of the downloaded file and killing the first thread.
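In Ruby, the standard `Net::FTP` class may make the two-thread design unnecessary: `retrbinary` yields the file in chunks, and breaking out of the block stops reading. A minimal sketch under some assumptions: the server accepts a plain RETR, and because breaking out skips reading the server's final reply for the aborted transfer, the sketch opens a fresh session per file and closes it afterwards instead of reusing it. `read_prefix` and `ftp_header` are hypothetical names, and host, user and password are placeholders:

```ruby
# Reads at most `nbytes` from the start of a remote file through any object
# with Net::FTP#retrbinary's interface, stopping the transfer early.
def read_prefix(ftp, path, nbytes, blocksize: 4096)
  buf = String.new(encoding: Encoding::BINARY)
  ftp.retrbinary("RETR #{path}", blocksize) do |chunk|
    buf << chunk
    break if buf.bytesize >= nbytes # stop pulling data from the server
  end
  buf.byteslice(0, nbytes)
end

# Opens a new FTP session, reads the first `nbytes` of `path`, and closes the
# session when the block returns (the aborted transfer leaves an unread reply
# on the control connection, so the session is not reused).
def ftp_header(host, user, password, path, nbytes)
  require "net/ftp" # a bundled gem since Ruby 3.1
  Net::FTP.open(host, user, password) do |ftp|
    ftp.passive = true # already the default in recent Ruby; adjust per server
    read_prefix(ftp, path, nbytes)
  end
end
```

Keeping `read_prefix` separate from the connection handling means it can be exercised against a stub object, and the same early-stop loop works for any source that yields data in chunks.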
Or, at least, you could parallelize the file transfers, so that you don't have to wait for all the files to be fully transferred before analyzing the start of each one. The transfer will then continue
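The parallel variant can be sketched with Ruby threads pulling file names from a `Queue`, so several files are fetched at once. A minimal sketch: the block stands for whatever per-file fetch you use, and `parallel_map` is a hypothetical name. Each worker should use its own connection, since one FTP session handles one transfer at a time.

```ruby
# Runs the block for each item on up to `workers` threads and returns the
# results in the same order as `items`.
def parallel_map(items, workers: 4, &block)
  queue = Queue.new
  items.each_with_index { |item, i| queue << [item, i] }
  queue.close # pop returns nil once the queue is closed and empty

  results = Array.new(items.size)
  threads = Array.new([workers, items.size].min) do
    Thread.new do
      while (job = queue.pop)
        item, index = job
        results[index] = block.call(item)
      end
    end
  end
  threads.each(&:join) # join re-raises any exception from a worker
  results
end
```

Because the work here is network-bound, Ruby threads overlap the waiting even under the global interpreter lock.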
There has been a proposal for a RANG command, allowing retrieval of only part of a file (here, the first bytes). However, I didn't find any reference to this proposal being adopted, nor any implementation of it.
So, for a specific server, it could be useful to test for it (or check the FTP server's documentation) and use it if available.
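Testing a given server is cheap if it supports the FEAT command (RFC 2389), which lists the server's extensions in a multi-line reply. A minimal sketch, assuming a server implementing the proposal would advertise RANG there; it only detects the command, since the exact RANG syntax should come from the proposal or the server's documentation. `ftp_features` and `supports_rang?` are hypothetical names:

```ruby
# Extracts feature names from a multi-line FEAT reply such as
# "211-Features:\r\n SIZE\r\n REST STREAM\r\n211 End\r\n".
def ftp_features(feat_reply)
  feat_reply.each_line.filter_map do |line|
    next if line.match?(/\A\d{3}[ -]/) # skip the "211-..." and "211 ..." status lines
    line.strip.split(" ", 2).first&.upcase
  end
end

# Asks a live Net::FTP session whether it advertises RANG.
def supports_rang?(ftp)
  ftp_features(ftp.sendcmd("FEAT")).include?("RANG")
rescue Net::FTPPermError # the server does not implement FEAT at all
  false
end
```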