Am I parsing this HTTP POST request properly?

Posted 2024-09-10 08:17:47 · 2163 characters · 4 views · 0 comments

Let me start off by saying that I'm using the twisted.web framework. Its file uploading didn't work the way I wanted (it only included the file data, not any other information). cgi.parse_multipart doesn't work the way I want either (same problem; twisted.web uses this function). cgi.FieldStorage didn't work because I'm getting the POST data through Twisted, not a CGI interface; so far as I can tell, FieldStorage tries to read the request from stdin. And twisted.web2 didn't work for me because the use of Deferred confused and infuriated me (too complicated for what I want).

That being said, I decided to try and just parse the HTTP request myself.

Using Chrome, the HTTP request is formed like this:

------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="upload_file_nonce"

11b03b61-9252-11df-a357-00266c608adb
------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="file"; filename="login.html"
Content-Type: text/html

<!DOCTYPE html>
<html>
  <head> 

...

------WebKitFormBoundary7fouZ8mEjlCe92pq
Content-Disposition: form-data; name="file"; filename=""


------WebKitFormBoundary7fouZ8mEjlCe92pq--

Is this always how it will be formed? I'm parsing it with regular expressions, like so (pardon the wall of code):

(Note: I snipped out most of the code to show only what I thought was relevant, namely the regular expressions. This is an __init__ method, the only method so far, in an Uploads class I built. The full code can be seen in the revision history.)

# Closing boundary: the whole body has been read.
if line == "--{0}--".format(boundary):
    finished = True

# A blank line ends the headers of the current part.
if in_header and not line:
    in_header = False
    # No Content-Type means a plain form field, not a file upload.
    if 'type' not in current_file:
        ignore_current_file = True

if in_header:
    m = re.match(
        r'Content-Disposition: form-data; name="(.*?)"; filename="(.*?)"$', line)
    if m:
        input_name, current_file['filename'] = m.group(1), m.group(2)

    m = re.match(r"Content-Type: (.*)$", line)
    if m:
        current_file['type'] = m.group(1)

else:
    # Past the headers: accumulate the part's data line by line.
    if 'data' not in current_file:
        current_file['data'] = line
    else:
        current_file['data'] += line

You can see that I start a new "file" dict whenever a boundary is reached, and I set in_header to True to say that I'm parsing headers. When I reach a blank line, I switch it to False, but not before checking whether a Content-Type was set for that form value. If not, I set ignore_current_file, since I'm only looking for file uploads.
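The approach described above can be sketched as one self-contained function. This is only an illustration under stated assumptions, not the code from the revision history: the name parse_uploads, the list-of-dicts return value, the text (not bytes) body, and the short boundary XYZ are all hypothetical, and the body is assumed to use CRLF line endings.

```python
import re

DISPOSITION_RE = re.compile(
    r'Content-Disposition: form-data; name="(.*?)"; filename="(.*?)"$')
CONTENT_TYPE_RE = re.compile(r"Content-Type: (.*)$")


def parse_uploads(body, boundary):
    """Collect the parts of a multipart body that declare a Content-Type."""
    delimiter = "--" + boundary
    uploads = []
    current_file = None
    in_header = False

    for line in body.split("\r\n"):
        if line in (delimiter, delimiter + "--"):
            # A boundary ends the previous part. Keep it only if it had a
            # Content-Type header, which is how file uploads are recognised.
            if current_file is not None and "type" in current_file:
                current_file["data"] = "\r\n".join(current_file.pop("lines"))
                uploads.append(current_file)
            if line == delimiter + "--":
                break  # closing boundary: nothing follows
            current_file = {"lines": []}
            in_header = True
        elif current_file is None:
            continue  # preamble before the first boundary
        elif in_header and not line:
            in_header = False  # blank line separates headers from data
        elif in_header:
            m = DISPOSITION_RE.match(line)
            if m:
                current_file["name"], current_file["filename"] = m.groups()
            m = CONTENT_TYPE_RE.match(line)
            if m:
                current_file["type"] = m.group(1)
        else:
            current_file["lines"].append(line)
    return uploads


# Hypothetical body shaped like the Chrome request, with boundary "XYZ".
body = (
    "--XYZ\r\n"
    'Content-Disposition: form-data; name="upload_file_nonce"\r\n'
    "\r\n"
    "11b03b61\r\n"
    "--XYZ\r\n"
    'Content-Disposition: form-data; name="file"; filename="login.html"\r\n'
    "Content-Type: text/html\r\n"
    "\r\n"
    "<!DOCTYPE html>\r\n<html>\r\n"
    "--XYZ\r\n"
    'Content-Disposition: form-data; name="file"; filename=""\r\n'
    "\r\n"
    "\r\n"
    "--XYZ--\r\n"
)
uploads = parse_uploads(body, "XYZ")
```

Joining the collected lines with CRLF restores the line breaks inside the file, while the line break just before each boundary is treated as part of the delimiter and not as file data.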

I know I should be using a library, but I'm sick to death of reading documentation, trying to get different solutions to work in my project, and still having the code look reasonable. I just want to get past this part -- and if parsing an HTTP POST with file uploads is this simple, then I shall stick with that.

Note: this code works perfectly for now; I'm just wondering whether it will choke on or spit out requests from certain browsers.
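One way to probe that worry is to feed the strict pattern a header that carries the same information in a different shape. Header field names are case-insensitive, and nothing obliges a client to send the parameters in Chrome's order, so the sketch below compares the regular expression from the question with a more tolerant reading based on the standard library's email.message.Message. The helper name parse_disposition and the variant header are made up for illustration; no claim is made about which browsers actually send such a header.

```python
import re
from email.message import Message

# The strict pattern from the question, for comparison.
STRICT_RE = re.compile(
    r'Content-Disposition: form-data; name="(.*?)"; filename="(.*?)"$')


def parse_disposition(header_line):
    """Return the name and filename parameters of a Content-Disposition line.

    Hypothetical helper: returns None for any other header.
    """
    field, _, value = header_line.partition(":")
    if field.strip().lower() != "content-disposition":
        return None
    msg = Message()
    msg["Content-Disposition"] = value.strip()
    return {
        "name": msg.get_param("name", header="content-disposition"),
        "filename": msg.get_param("filename", header="content-disposition"),
    }


# Same information as Chrome's header, but lower-case and reordered.
variant = 'content-disposition: form-data; filename="login.html"; name="file"'
strict_result = STRICT_RE.match(variant)       # no match
tolerant_result = parse_disposition(variant)   # both parameters recovered
```

The strict pattern depends on exact capitalisation, spacing and parameter order, so any deviation silently drops the upload. The parameter-based reading does not care about those details.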

清音悠歌 2024-09-17 08:17:47

My solution to this problem was to parse the content with cgi.FieldStorage, like this:

import cgi
import os

from twisted.web.resource import Resource


class Root(Resource):

    def render_POST(self, request):
        # Written for Python 2-era Twisted, where header names and values are str.
        self.headers = request.getAllHeaders()
        # For the parsing part, see PyMOTW by Doug Hellmann.
        img = cgi.FieldStorage(
            fp=request.content,
            headers=self.headers,
            environ={'REQUEST_METHOD': 'POST',
                     'CONTENT_TYPE': self.headers['content-type'],
                     }
        )

        upload = img["upl_file"]
        print("%s %s %s" % (upload.name, upload.filename, upload.type))
        # basename() keeps a client-supplied path from escaping the working directory.
        with open(os.path.basename(upload.filename), 'wb') as out:
            out.write(upload.value)
        request.redirect('/tests')
        return ''
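The cgi module was deprecated in Python 3.11 and removed in Python 3.13, so the approach above is not available on recent interpreters. As a rough alternative, the standard library's email package can split a multipart/form-data body once the request's Content-Type header is prepended. The sketch below is an assumption-laden illustration and not part of the original answer: the function name parse_form_data, the dict layout, and the sample request with boundary XYZ are hypothetical, and large or binary uploads would need more careful handling.

```python
from email import message_from_bytes


def parse_form_data(content_type, body):
    """Split a multipart/form-data body into a list of part dicts.

    content_type is the raw value of the request's Content-Type header
    (it carries the boundary); body is the raw request body. Both are bytes.
    """
    # The email parser expects headers first, so prepend the one it needs.
    message = message_from_bytes(
        b"Content-Type: " + content_type + b"\r\n\r\n" + body)
    if not message.is_multipart():
        return []
    parts = []
    for part in message.get_payload():
        parts.append({
            "name": part.get_param("name", header="content-disposition"),
            "filename": part.get_filename(),  # None for plain form fields
            "type": part.get_content_type(),
            "data": part.get_payload(decode=True),
        })
    return parts


# Hypothetical request data with a short boundary, for illustration only.
sample_body = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="upload_file_nonce"\r\n'
    b"\r\n"
    b"11b03b61\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="login.html"\r\n'
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--XYZ--\r\n"
)
parts = parse_form_data(b"multipart/form-data; boundary=XYZ", sample_body)
```

In a twisted.web resource, the two arguments would presumably come from the request's content-type header and from request.content.read(); that wiring is left out here because it depends on the Twisted version in use.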
