How to extract file data from an HTTP MIME-encoded message in Linux?
I have a program that accepts files via HTTP POST and writes the whole POST request into a file. I want to write a script that deletes the HTTP headers and leaves only the binary file data. How can I do this?
The file content is below. The data between "Content-Type: application/octet-stream" and "------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3" is what I want:
POST /?user_name=vvvvvvvv&size=837&file_name=logo.gif& HTTP/1.1^M
Accept: text/*^M
Content-Type: multipart/form-data; boundary=----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
User-Agent: Shockwave Flash^M
Host: 192.168.0.198:9998^M
Content-Length: 1251^M
Connection: Keep-Alive^M
Cache-Control: no-cache^M
Cookie: cb_fullname=ddddddd; cb_user_name=cdc^M
^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filename"^M
^M
logo.gif^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filedata"; filename="logo.gif"^M
Content-Type: application/octet-stream^M
^M
GIF89an^@I^^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Upload"^M
^M
Submit Query^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3-
Do you want to do this while the file is being transferred, or afterwards, once it has arrived?
Almost any scripting language should work. My AWK is a bit rusty, but...
That should print everything between the application/octet-stream and ---------- lines. It might also include both of those lines, which means you'll have to do something a bit more complex:
The application\/octet-stream line comes after the print statement because you want to set state to 1 only after you see application/octet-stream.
Of course, this being Unix, you could pipe your program's output through awk and then save the file.
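The AWK script itself is not preserved here. As a rough illustration of the same state-flag idea, here is a minimal Python sketch that works on raw bytes, so the ^M carriage returns and binary GIF data pass through unchanged. Unlike a plain "print while the flag is set" loop, it also skips the blank line that ends the part headers and the CRLF that precedes the boundary. The function name extract_after_marker and its parameters are hypothetical, and it assumes a single application/octet-stream part.

```python
def extract_after_marker(raw: bytes, delimiter: bytes,
                         marker: bytes = b"application/octet-stream") -> bytes:
    """State-flag extraction, modelled loosely on the AWK answer (a sketch).

    `delimiter` is the boundary line as it appears in the body, i.e. b"--"
    followed by the boundary= parameter of the request's Content-Type header.
    """
    OUTSIDE, IN_HEADERS, IN_DATA = 0, 1, 2
    state = OUTSIDE
    out = []
    # bytes.splitlines() breaks only at \r, \n and \r\n; with keepends=True,
    # joining the pieces reproduces the original bytes exactly.
    for line in raw.splitlines(keepends=True):
        if line.startswith(delimiter):
            if state == IN_DATA:
                break                # boundary after the file data: done
            state = OUTSIDE
        elif state == IN_DATA:
            out.append(line)         # file bytes, kept verbatim
        elif state == IN_HEADERS and line in (b"\r\n", b"\n"):
            state = IN_DATA          # blank line ends the part headers
        elif marker in line:
            state = IN_HEADERS       # saw the Content-Type marker line
    data = b"".join(out)
    # RFC 2046: the CRLF just before a boundary belongs to the boundary.
    return data[:-2] if data.endswith(b"\r\n") else data
```

Call it with the delimiter built from the header, e.g. extract_after_marker(raw, b"--" + b"----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3") for the sample above.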
If you use Python, email.parser.Parser will allow you to parse a multipart MIME document.
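A minimal sketch of that approach, assuming the captured file holds exactly one HTTP request (request line, headers, blank line, body) and that the upload field is named Filedata as in the sample. It swaps in BytesParser from the same email.parser module, because the payload is binary; the helper name extract_upload is hypothetical. The HTTP request line is removed first, since it is not a "Name: value" header.

```python
from email.parser import BytesParser


def extract_upload(raw: bytes, field: str = "Filedata") -> bytes:
    """Return the raw bytes of one multipart/form-data field (a sketch)."""
    # Drop the request line ("POST /... HTTP/1.1"); the rest is headers + body.
    _, _, rest = raw.partition(b"\n")
    # parsebytes() on the whole buffer leaves CR/LF bytes untouched, whereas
    # BytesParser().parse(fileobj) reads through a text wrapper that may
    # normalize line endings.
    msg = BytesParser().parsebytes(rest)
    if not msg.is_multipart():
        raise ValueError("request body is not multipart")
    for part in msg.get_payload():
        # Match on the name= parameter of the part's Content-Disposition.
        if part.get_param("name", header="content-disposition") == field:
            # decode=True yields bytes; the parser has already dropped the
            # CRLF that precedes the next boundary.
            return part.get_payload(decode=True)
    raise KeyError(field)
```

Usage: with open("post.dump", "rb") as f: gif = extract_upload(f.read()), where post.dump is a placeholder for whatever file your program writes.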
This may be a crazy idea, but I would try stripping the headers with procmail.
Look at the MIME::Tools suite for Perl. It has a rich set of classes; I'm sure you could put something together in just a few lines.
This probably contains some typos or something, but bear with me anyway. First, determine the boundary (input is the file containing the data; pipe it in if necessary):
Then filter the Filedata part:
The last part is to filter out a few extra lines: the header lines up to and including the empty line, and the boundary itself. So change the last line from the previous step to:
sed -n "/$fd/,/$boundary/p" filters the lines between the Filedata header and the boundary (inclusive), sed '1,/^$/d' deletes everything up to and including the first empty line (so it removes the headers), and sed '$d' removes the last line (the boundary).
After this, you wait for Dennis (see comments) to optimize it, and you get this:
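For comparison, the same three steps (read the boundary, keep the range from the Filedata header to the boundary, then drop the part headers and the boundary line) can be sketched in Python on raw bytes. This is an illustration of the pipeline's logic, not the answer's actual commands; the function name extract_like_sed is hypothetical. It tests for the CRLF blank line explicitly, since a /^$/ pattern may not match a line that still carries its ^M, and it also removes the CRLF that belongs to the boundary.

```python
import re


def extract_like_sed(raw: bytes) -> bytes:
    """Byte-level version of the sed pipeline's three steps (a sketch)."""
    # Step 1: the boundary= parameter of the request's Content-Type header.
    m = re.search(rb'boundary="?([^";\r\n]+)"?', raw)
    if m is None:
        raise ValueError("no multipart boundary found")
    delimiter = b"--" + m.group(1)

    lines = raw.splitlines(keepends=True)
    # Step 2 (like sed -n "/$fd/,/$boundary/p"): Filedata header .. boundary.
    start = next(i for i, ln in enumerate(lines) if b'name="Filedata"' in ln)
    end = next(i for i in range(start + 1, len(lines))
               if lines[i].startswith(delimiter))
    part = lines[start:end + 1]
    # Step 3a (like sed '1,/^$/d'): drop lines through the first blank line,
    # which here is b"\r\n" rather than an empty line.
    blank = next(i for i, ln in enumerate(part) if ln in (b"\r\n", b"\n"))
    # Step 3b (like sed '$d'): drop the trailing boundary line.
    body = b"".join(part[blank + 1:-1])
    # The CRLF before the boundary belongs to the boundary, not the file.
    return body[:-2] if body.endswith(b"\r\n") else body
```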
Now that you've made it this far, scratch all this and do what Ignacio suggested. The reason: this probably won't work reliably here, because a GIF is binary data.
Ah, it was a good exercise! Anyway, for the lovers of sed, here's an excellent page: Outstanding information.