How do I parse a tar file in C++?
What I want to do is download a .tar file containing multiple directories with 2 files each. The problem is that I can't find a way to read the tar file without actually extracting the files (using tar).
The perfect solution would be something like:
#include <easytar>
Tarfile tar("somefile.tar");
std::string currentFile, currentFileName;
for(int i=0; i<tar.size(); i++){
    currentFile = tar.getFileText(i);
    currentFileName = tar.getFileName(i);
    // do stuff with it
}
I'm probably going to have to write this myself, but any ideas would be appreciated.
Comments (3)
I figured this out myself after a bit of work. The tar file spec actually tells you everything you need to know.
First off, every file starts with a 512 byte header, so you can represent it with a char[512] or a char* pointing at somewhere in your larger char array (if you have the entire file loaded into one array for example).
The header looks like this:
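A minimal sketch of that layout, assuming the POSIX ustar field widths. The struct name and field names are illustrative and not taken from the original answer; the ustar extension fields after the link name are lumped into one padding member.

```cpp
#include <cstddef>

// One 512-byte tar header block. Every member is a char array (or a single
// char), so the compiler inserts no padding between fields.
struct TarHeader {
    char name[100];     // file name, null padded
    char mode[8];       // permissions, octal text
    char uid[8];        // owner user id, octal text
    char gid[8];        // owner group id, octal text
    char size[12];      // file size in bytes, octal text
    char mtime[12];     // modification time, octal text
    char checksum[8];   // header checksum, octal text
    char typeflag;      // the "link indicator": file, directory, link, ...
    char linkname[100]; // target name for links
    char padding[255];  // ustar extension fields and filler up to 512 bytes
};

static_assert(sizeof(TarHeader) == 512, "a tar header block is 512 bytes");
static_assert(offsetof(TarHeader, size) == 124, "size field starts at byte 124");
static_assert(offsetof(TarHeader, typeflag) == 156, "type flag is byte 156");
```

With a layout like this, a char* pointing at the start of a header block can be read field by field using these offsets.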
So if you want the file name, you grab it right here with string filename(&buffer[0], 100);. The file name is null padded, so you could do a check to make sure there's at least one null within those 100 bytes and then leave off the size (string filename(buffer);) if you want to save space.
Now we want to know if it's a file or a folder. The "link indicator" field has this information, so:
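One way this check might look, assuming the ustar convention that the type flag is the single byte at offset 156, with '0' (or a null byte in older archives) for a regular file and '5' for a directory. The EntryKind enum and entryKind function are hypothetical names used only for illustration.

```cpp
// Entry kinds distinguished by the type flag at byte 156 of the header.
enum class EntryKind { File, Directory, Other };

// header must point at the start of a 512-byte tar header block.
EntryKind entryKind(const char* header) {
    const char flag = header[156];
    if (flag == '0' || flag == '\0') return EntryKind::File;  // '\0' is the pre-POSIX form
    if (flag == '5') return EntryKind::Directory;
    return EntryKind::Other;  // links, devices, FIFOs, extended headers
}
```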
At this point, we already have all of the information we need about directories, but we need one more thing from normal files: the actual file contents.
The length of the file can be stored in two different ways, either as a 0-or-space-padded null-terminated octal string, or "a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field".
Here's how you would read the octal format, but I haven't written code for the base-256 version:
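A small sketch of the octal case, assuming the field is plain ASCII octal digits with optional leading spaces and a trailing null or space. The function name parseOctal is made up for this example, and the base-256 form is deliberately not handled.

```cpp
#include <cstddef>
#include <cstdint>

// Parses a numeric header field stored as octal text, e.g. the 12-byte size
// field. Leading spaces are skipped; parsing stops at the first byte that is
// not an octal digit (normally the terminating null or space).
std::uint64_t parseOctal(const char* field, std::size_t width) {
    std::uint64_t value = 0;
    std::size_t i = 0;
    while (i < width && field[i] == ' ') ++i;
    for (; i < width && field[i] >= '0' && field[i] <= '7'; ++i) {
        value = value * 8 + static_cast<std::uint64_t>(field[i] - '0');
    }
    return value;
}
```

For the size field this would be called on the 12 bytes starting at offset 124 of the header.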
Ok, so now we have everything except the actual file contents. All we have to do is grab the next size bytes of data from the tar file and we'll have our file contents:
Have you looked at libtar?
From the fink package info:
Not C++ per se, but you can link to C pretty easily...
libarchive is an open source library that can parse tarballs. It can read each file from an archive without extracting it, and it can also write data to form a new archive.