How do I extract file data from an HTTP MIME-encoded message in Linux?

Posted 2024-10-03 17:49:40

I have a program that accepts HTTP POST file uploads and writes the entire POST request into a file. I want to write a script that deletes the HTTP headers and leaves only the binary file data. How can I do that?

The file content is below. The data between the Content-Type: application/octet-stream line and the ------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3 line is what I want:

POST /?user_name=vvvvvvvv&size=837&file_name=logo.gif& HTTP/1.1^M
Accept: text/*^M
Content-Type: multipart/form-data; boundary=----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
User-Agent: Shockwave Flash^M
Host: 192.168.0.198:9998^M
Content-Length: 1251^M
Connection: Keep-Alive^M
Cache-Control: no-cache^M
Cookie: cb_fullname=ddddddd; cb_user_name=cdc^M
^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filename"^M
^M
logo.gif^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filedata"; filename="logo.gif"^M
Content-Type: application/octet-stream^M
^M
GIF89an^@I^^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Upload"^M
^M
Submit Query^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3-
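For reference, the boundary= value in the top-level Content-Type header is not itself the separator line. Under the multipart rules of RFC 2046, each part is introduced by a line made of two dashes followed by the boundary, and the last part is closed by that same line with two more dashes appended, which is why the separator lines in the body carry two extra leading dashes. A minimal Python sketch of that relationship (delimiters_from_content_type is a hypothetical helper, and its regex only handles an unquoted boundary):

```python
import re


def delimiters_from_content_type(content_type: str) -> tuple[bytes, bytes]:
    """Return (delimiter, close_delimiter) for a multipart Content-Type value.

    Hypothetical helper: it only understands an unquoted boundary parameter.
    """
    match = re.search(r'boundary=([^;\s"]+)', content_type)
    if match is None:
        raise ValueError("no boundary parameter in Content-Type")
    boundary = match.group(1).encode("ascii")
    delimiter = b"--" + boundary          # line that introduces each part
    close_delimiter = delimiter + b"--"   # line that closes the last part
    return delimiter, close_delimiter


# Header value copied from the request above.
header = "multipart/form-data; boundary=----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3"
delimiter, close_delimiter = delimiters_from_content_type(header)
print(delimiter)
print(close_delimiter)
```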

维持三分热 2024-10-10 17:49:40

Do you want to do this while the file is being transferred, or after it has arrived?

Almost any scripting language should work. My AWK is a bit rusty, but...

awk '/^Content-Type: application\/octet-stream/,/^--------/'

That should print everything between the application/octet-stream line and the next ---------- line. However, awk range patterns are inclusive, so both of those lines are printed too, which means you'll have to do something a bit more complex:

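BEGIN {state = 0}    # 0 = outside the file part, 1 = in its headers, 2 = in its data
{
    if ($0 ~ /^------------/) {
        state = 0;       # a boundary line ends the file data
    }
    if (state == 2) {
        print $0
    }
    if (state == 1 && $0 ~ /^\r?$/) {
        state = 2;       # the blank (CR-only) line ends the part headers
    }
    if ($0 ~ /^Content-Type: application\/octet-stream/) {
        state = 1;
    }
}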

The application/octet-stream test comes after the print statement because you want to set state to 1 only after you have seen the application/octet-stream line.

Of course, this being Unix, you could pipe your program's output through awk and then save the result to a file.
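For comparison, here is a sketch of the same state machine in Python, working on raw bytes so that NUL bytes and line endings in the GIF data pass through unchanged. It is an illustrative port, not the answer's code: extract_octet_stream is a hypothetical name, it assumes CRLF line endings, it treats any line starting with twelve dashes as a boundary just as the awk pattern does, and it also drops the blank line after the part headers and the CRLF that precedes the boundary (both belong to the MIME framing, not to the file).

```python
def extract_octet_stream(data: bytes) -> bytes:
    """Port of the awk state machine to bytes (illustrative sketch).

    States: 0 = outside the file part, 1 = in the part headers,
    2 = in the file data.
    """
    boundary_prefix = b"-" * 12   # same dash run the awk pattern matches
    state = 0
    out = []
    # keepends=True keeps each line's own terminator (CRLF, LF or a lone CR),
    # so joining the lines back reproduces the original bytes.
    for line in data.splitlines(keepends=True):
        if line.startswith(boundary_prefix):
            state = 0                  # a boundary line ends the file data
        if state == 2:
            out.append(line)
        if state == 1 and line in (b"\r\n", b"\n"):
            state = 2                  # blank line ends the part headers
        if line.startswith(b"Content-Type: application/octet-stream"):
            state = 1
    body = b"".join(out)
    # The CRLF just before the boundary belongs to the boundary, not the file.
    if body.endswith(b"\r\n"):
        body = body[:-2]
    return body
```

To use it as a pipe filter like the awk version, a small script could call sys.stdout.buffer.write(extract_octet_stream(sys.stdin.buffer.read())).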

甜嗑 2024-10-10 17:49:40

If you're using Python, email.parser.Parser will let you parse multipart MIME documents.
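A minimal sketch of that approach, assuming the saved request uses CRLF line endings and a multipart/form-data body. It uses email.parser.BytesParser, the bytes-input counterpart of Parser, so the file can be read in binary mode and the non-ASCII GIF bytes survive the round trip. The HTTP request line is dropped first because the email parser expects the input to start with header lines. extract_upload is a hypothetical helper name:

```python
from email.parser import BytesParser


def extract_upload(raw_request: bytes) -> tuple[str, bytes]:
    """Return (filename, data) for the first file part of a saved POST request.

    Sketch only: assumes CRLF line endings and a multipart/form-data body.
    """
    # The email parser expects the input to start with header lines,
    # so drop the "POST /... HTTP/1.1" request line first.
    _request_line, _, rest = raw_request.partition(b"\r\n")
    message = BytesParser().parsebytes(rest)
    for part in message.walk():
        filename = part.get_filename()
        if filename is not None:
            # No Content-Transfer-Encoding header on the part, so
            # decode=True hands back the part body as raw bytes.
            return filename, part.get_payload(decode=True)
    raise ValueError("no part with a filename found")
```

Because the part has no Content-Transfer-Encoding header, get_payload(decode=True) returns the part body unchanged, and the parser has already removed the CRLF that belongs to the following boundary, so writing the result with open(filename, "wb") should give back the uploaded file.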

请叫√我孤独 2024-10-10 17:49:40

This might be a crazy idea, but I'd try stripping the headers with procmail.

风月客 2024-10-10 17:49:40

Take a look at Perl's MIME::Tools suite. It has a rich set of classes; I'm sure you could put something together in a few lines of code.

再可℃爱ぅ一点好了 2024-10-10 17:49:40

This probably contains some typos, but bear with me anyway. First, determine the boundary (input is the file containing the data; use a pipe if necessary):

boundary=`grep '^Content-Type: multipart/form-data; boundary=' input|sed 's/.*boundary=//'`

Then filter the Filedata part:

fd='Content-Disposition: form-data; name="Filedata"'
sed -n "/$fd/,/$boundary/p"

The last step is to filter out a few extra lines: the header lines up to and including the empty line, and the boundary itself. So change the previous command to:

sed -n "/$fd/,/$boundary/p" | sed '1,/^$/d' | sed '$d'

sed -n "/$fd/,/$boundary/p" keeps the lines from the Filedata header through the boundary (inclusive),
sed '1,/^$/d' deletes everything up to and including the first empty line (removing the part headers), and
sed '$d' deletes the last line (the boundary).

After this, you wait for Dennis (see comments) to optimize it and you get this:

sed "1,/$fd/d;/^$/d;/$boundary/,\$d"

Now that you've read this far, scratch all of this and do what Ignacio suggested. The reason: this probably won't work reliably here, because GIF is binary data.
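A byte-level alternative to the line-oriented sed approach (a different technique from the one in this answer) is to split the saved request as raw bytes, so nothing in the GIF is ever treated as a line. The sketch below assumes CRLF line endings, an unquoted boundary parameter and no nested multiparts; extract_first_file is a hypothetical name. It relies on the multipart rule that the CRLF before each boundary line belongs to the boundary rather than to the preceding part's data:

```python
import re


def extract_first_file(raw: bytes) -> bytes:
    """Return the body of the first multipart part that has a filename.

    Sketch only: assumes CRLF line endings, an unquoted boundary and no
    nested multiparts. The data is never split into lines or decoded.
    """
    head, _, body = raw.partition(b"\r\n\r\n")   # HTTP headers / body
    match = re.search(rb'boundary=([^;\s"]+)', head)
    if match is None:
        raise ValueError("no multipart boundary in the request headers")
    # The CRLF before each boundary line belongs to the boundary, so split on
    # CRLF + "--" + boundary; prefixing CRLF lets the first delimiter match too.
    delimiter = b"\r\n--" + match.group(1)
    for part in (b"\r\n" + body).split(delimiter)[1:]:
        part_headers, _, content = part.partition(b"\r\n\r\n")
        if b"filename=" in part_headers:
            return content
    raise ValueError("no part with a filename found")
```

For example, reading the saved request with open("input", "rb").read() and writing the returned bytes to logo.gif in "wb" mode should reproduce the upload, as long as the assumptions above hold.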

Ah, it was a good exercise anyway! For sed lovers, here's an excellent page: