Reading header data from files on a remote server

Posted on 2024-07-14 18:58:12

I'm working on a project where I need to read header data from files on remote servers. There are many files and they are large, so I can't read whole files, only the header data I need.

The only solution I have is to mount the remote server with FUSE and then read the headers from the files as if they were on my local computer. I've tried it and it works, but it has some drawbacks, especially with FTP:

  • Really slow (comparing FTP via curlftpfs against SSH). From the same server, 90 files were read in 18 seconds over SSH, while FTP needed 39 seconds for 10 files.
  • Not dependable. Sometimes the mount point will not unmount.
  • If the server uses active mode and the mount is made in passive mode, the mount point and its parent folder get locked for about 3 minutes.
  • It times out even while data is being transferred (I guess this is the FTP protocol rather than curlftpfs).

FUSE is a solution, but I don't like it very much because I don't feel that I can trust it. So my question is basically whether there are any other solutions to the problem. The language is preferably Ruby, but any other will do if Ruby does not support the solution.

Thanks!


Comments (3)

瑶笙 2024-07-21 18:58:13

What type of information are you looking for?

You could try using Ruby's open-uri module.
The following example is adapted from http://www.ruby-doc.org/stdlib/libdoc/open-uri/rdoc/index.html

require 'open-uri'

# URI.open is used here because the bare open("http://...") form
# no longer accepts URLs in Ruby 3.0 and later.
URI.open("http://www.ruby-lang.org/en") do |f|
  p f.base_uri         # <URI::HTTP:0x40e6ef2 URL:http://www.ruby-lang.org/en/>
  p f.content_type     # "text/html"
  p f.charset          # "iso-8859-1"
  p f.content_encoding # []
  p f.last_modified    # Thu Dec 05 02:45:02 UTC 2002
end

EDIT: It seems that the OP wants to retrieve ID3 tag information from the remote files. This is more complex, and it appears to be a difficult problem.

From the wiki:

Tag location within file

Only with the ID3v2.4 standard has it
been possible to place the tag data at
the end of the file, in common with
ID3v1. ID3v2.2 and 2.3 require that
the tag data precede the file. Whilst
for streaming data this is absolutely
required, for static data it means
that the entire audio file must be
updated to insert data at the front of
the file. For initial tagging this
incurs a large penalty as every file
must be re-written. Tag writers are
encouraged to introduce padding after
the tag data in order to allow for
edits to the tag data without
requiring the entire audio file to be
re-written, but these are not standard
and the tag requirements may vary
greatly, especially if APIC
(associated pictures) are also
embedded.

This means that depending on the ID3 tag version of the file, you may have to read different parts of the file.
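As a rough illustration of "different parts of the file": an ID3v2 tag at the front starts with a 10-byte header that states the tag's size, while an ID3v1 tag is the last 128 bytes of the file. The sketch below parses both from raw bytes. The method names `parse_id3v2_header` and `parse_id3v1_trailer` are hypothetical, the sample bytes are made up for the demonstration, and how the bytes are fetched from the server is left open. It does not handle an ID3v2.4 tag appended at the end of the file.

```ruby
# Parse the 10-byte ID3v2 header found at the very start of a file.
# Returns nil when the bytes do not begin with the "ID3" marker.
def parse_id3v2_header(bytes)
  return nil if bytes.nil? || bytes.bytesize < 10
  bytes = bytes.b
  return nil unless bytes.byteslice(0, 3) == "ID3"

  major, revision, flags, *size = bytes.byteslice(3, 7).unpack("C7")
  # The size field is "synchsafe": four bytes carrying 7 bits each.
  tag_size = size.inject(0) { |acc, b| (acc << 7) | (b & 0x7f) }
  { version: "2.#{major}.#{revision}", flags: flags,
    tag_size: tag_size, total_size: tag_size + 10 }
end

# Parse an ID3v1 tag, which occupies the last 128 bytes of a file.
# Returns nil when the block does not begin with the "TAG" marker.
def parse_id3v1_trailer(bytes)
  return nil if bytes.nil? || bytes.bytesize != 128
  tag, title, artist, album, year, comment, genre =
    bytes.b.unpack("a3 a30 a30 a30 a4 a30 C")
  return nil unless tag == "TAG"

  # Fields are padded with NUL bytes (sometimes spaces); cut at the first NUL.
  clean = ->(s) { s[/\A[^\0]*/].strip }
  { title: clean.(title), artist: clean.(artist), album: clean.(album),
    year: clean.(year), comment: clean.(comment), genre: genre }
end

# Made-up sample data standing in for bytes fetched from a remote file.
v2_bytes = "ID3".b + [4, 0, 0, 0, 0, 2, 1].pack("C7")
v1_bytes = ["TAG", "Song", "Artist", "Album", "1999", "note", 17]
           .pack("a3 a30 a30 a30 a4 a30 C")

p parse_id3v2_header(v2_bytes)
p parse_id3v1_trailer(v1_bytes)
```

With this split, only the first 10 bytes are needed to learn how long the ID3v2 tag is, and only the final 128 bytes are needed to check for ID3v1.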

Here's an article that outlines the basics of reading ID3 tags with Ruby. It covers ID3v1.1 only, but it should serve as a good starting point: http://rubyquiz.com/quiz136.html

You could also look into using an ID3 parsing library, such as id3.rb or id3lib-ruby; however, I'm not sure whether either supports parsing a remote file (most likely they could with some modifications).

陌上芳菲 2024-07-21 18:58:13

A "better-than-nothing" solution would be to start the transfer and stop it once the downloaded file holds more than the number of bytes you need. Since not many (if any) libraries allow interrupting the connection, this is more complex and will probably require you to hand-code a specific FTP client with two threads: one doing the FTP connection and transfer, the other monitoring the size of the downloaded file and killing the first thread.

Or, at least, you could parallelize the file transfers, so that you don't have to wait for all the files to be fully transferred before analyzing the start of each file. The transfers will then continue in the background.
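The first idea above, stopping once enough bytes have arrived, can be sketched without a second thread when the transfer hands data to a callback chunk by chunk. This is a minimal sketch under that assumption: `read_prefix` is a hypothetical helper, and the array of chunks stands in for a real transfer. With a real FTP or HTTP client, the connection would still need to be aborted or closed after the early exit.

```ruby
# Collect streamed chunks and stop the producer as soon as `limit` bytes
# have arrived. The block receives a callable "sink" to feed chunks into.
def read_prefix(limit)
  buffer = String.new(encoding: Encoding::BINARY)
  catch(:enough) do
    yield lambda { |chunk|
      buffer << chunk.b
      throw :enough if buffer.bytesize >= limit
    }
  end
  buffer.byteslice(0, limit)
end

# Stand-in for a transfer: anything that delivers data piece by piece.
chunks = ["ID3\x04", "\x00\x00\x00\x00", "\x02\x01", "audio data that is never needed"]
delivered = 0

header = read_prefix(10) do |sink|
  chunks.each do |chunk|
    delivered += 1
    sink.call(chunk)
  end
end

puts "got #{header.bytesize} bytes after #{delivered} of #{chunks.size} chunks"
```

The `throw` unwinds out of the producer loop, so the remaining chunks are never requested from it.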

不甘平庸 2024-07-21 18:58:13

There has been a proposal for a RANG command that would allow retrieving only part of a file (here, the first bytes).

However, I didn't find any reference to this proposal being adopted or implemented.

So, for a specific server, it could be useful to test for it (or check the FTP server's documentation) and use it if available.
