How can I find the uploaded file name in Python CGI?

Posted on 2024-09-12 16:23:44

I made a simple web server like the one below.

import BaseHTTPServer, os, cgi
import cgitb; cgitb.enable()

html = """
<html>
<body>
<form action="" method="POST" enctype="multipart/form-data">
File upload: <input type="file" name="upfile">
<input type="submit" value="upload">
</form>
</body>
</html>
"""
class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("content-type", "text/html;charset=utf-8")
        self.end_headers()
        self.wfile.write(html)

    def do_POST(self):
        ctype, pdict = cgi.parse_header(self.headers.getheader('content-type'))
        if ctype == 'multipart/form-data':
            query = cgi.parse_multipart(self.rfile, pdict)
            upfilecontent = query.get('upfile')
            if upfilecontent:
                # i don't know how to get the file name.. so i named it 'tmp.dat'
                fout = file(os.path.join('tmp', 'tmp.dat'), 'wb')
                fout.write (upfilecontent[0])
                fout.close()
        self.do_GET()

if __name__ == '__main__':
    server = BaseHTTPServer.HTTPServer(("127.0.0.1", 8080), Handler)
    print('web server on 8080..')
    server.serve_forever()

In the do_POST method of BaseHTTPRequestHandler, I get the uploaded file data successfully.

But I can't figure out how to get the original name of the uploaded file. self.rfile.name is just 'socket'. How can I get the uploaded file name?

浮萍、无处依 2024-09-19 16:23:44

Pretty broken code you're using there as a starting point (e.g. look at that global rootnode where name rootnode is used nowhere -- clearly half-edited source, and badly at that).

Anyway, what form are you using "client-side" for the POST? How does it set that upfile field?

Why aren't you using the normal FieldStorage approach, as documented in Python's docs? That way, you could use the .file attribute of the appropriate field to get a file-like object to read, or its .value attribute to read it all in memory and get it as a string, plus the .filename attribute of the field to know the uploaded file's name. More detailed, though concise, docs on FieldStorage, are here.

Edit: now that the OP has edited the question to clarify, I see the problem: BaseHTTPServer does not set the environment according to the CGI spec, so the cgi module isn't very usable with it. Unfortunately, the only simple approach to setting the environment is to steal and hack a big piece of code from CGIHTTPServer.py (it wasn't intended for reuse, whence the need for, sigh, copy-and-paste coding), e.g.:

import os, urllib  # Python 2 modules used by populenv below

def populenv(self):
        path = self.path
        dir, rest = '.', 'ciao'

        # find an explicit query string, if present.
        i = rest.rfind('?')
        if i >= 0:
            rest, query = rest[:i], rest[i+1:]
        else:
            query = ''

        # dissect the part after the directory name into a script name &
        # a possible additional path, to be stored in PATH_INFO.
        i = rest.find('/')
        if i >= 0:
            script, rest = rest[:i], rest[i:]
        else:
            script, rest = rest, ''

        # Reference: http://hoohoo.ncsa.uiuc.edu/cgi/env.html
        # XXX Much of the following could be prepared ahead of time!
        env = {}
        env['SERVER_SOFTWARE'] = self.version_string()
        env['SERVER_NAME'] = self.server.server_name
        env['GATEWAY_INTERFACE'] = 'CGI/1.1'
        env['SERVER_PROTOCOL'] = self.protocol_version
        env['SERVER_PORT'] = str(self.server.server_port)
        env['REQUEST_METHOD'] = self.command
        uqrest = urllib.unquote(rest)
        env['PATH_INFO'] = uqrest
        env['SCRIPT_NAME'] = 'ciao'
        if query:
            env['QUERY_STRING'] = query
        host = self.address_string()
        if host != self.client_address[0]:
            env['REMOTE_HOST'] = host
        env['REMOTE_ADDR'] = self.client_address[0]
        authorization = self.headers.getheader("authorization")
        if authorization:
            authorization = authorization.split()
            if len(authorization) == 2:
                import base64, binascii
                env['AUTH_TYPE'] = authorization[0]
                if authorization[0].lower() == "basic":
                    try:
                        authorization = base64.decodestring(authorization[1])
                    except binascii.Error:
                        pass
                    else:
                        authorization = authorization.split(':')
                        if len(authorization) == 2:
                            env['REMOTE_USER'] = authorization[0]
        # XXX REMOTE_IDENT
        if self.headers.typeheader is None:
            env['CONTENT_TYPE'] = self.headers.type
        else:
            env['CONTENT_TYPE'] = self.headers.typeheader
        length = self.headers.getheader('content-length')
        if length:
            env['CONTENT_LENGTH'] = length
        referer = self.headers.getheader('referer')
        if referer:
            env['HTTP_REFERER'] = referer
        accept = []
        for line in self.headers.getallmatchingheaders('accept'):
            if line[:1] in "\t\n\r ":
                accept.append(line.strip())
            else:
                accept = accept + line[7:].split(',')
        env['HTTP_ACCEPT'] = ','.join(accept)
        ua = self.headers.getheader('user-agent')
        if ua:
            env['HTTP_USER_AGENT'] = ua
        co = filter(None, self.headers.getheaders('cookie'))
        if co:
            env['HTTP_COOKIE'] = ', '.join(co)
        # XXX Other HTTP_* headers
        # Since we're setting the env in the parent, provide empty
        # values to override previously set values
        for k in ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
                  'HTTP_USER_AGENT', 'HTTP_COOKIE', 'HTTP_REFERER'):
            env.setdefault(k, "")
        os.environ.update(env)

This could be substantially simplified further, but not without spending some time and energy on that task:-(.

With this populenv function at hand, we can recode:

def do_POST(self):
    populenv(self)  # fill os.environ so cgi.FieldStorage can find the CGI variables
    form = cgi.FieldStorage(fp=self.rfile)
    upfilecontent = form['upfile'].value
    if upfilecontent:
        fout = open(os.path.join('tmp', form['upfile'].filename), 'wb')
        fout.write(upfilecontent)
        fout.close()
    self.do_GET()

...and live happily ever after;-). (Of course, using any decent WSGI server, or even the demo one, would be much easier, but this exercise is instructive about CGI and its internals;-).
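As a rough illustration of how far populenv could be trimmed: as far as I know, cgi.FieldStorage only consults REQUEST_METHOD, QUERY_STRING, CONTENT_TYPE and CONTENT_LENGTH. The helper below is a hypothetical sketch, not part of any library (the name minimal_cgi_environ and its signature are my own assumptions), and it builds a private dict instead of touching os.environ.

```python
def minimal_cgi_environ(method, headers):
    """Build the few CGI variables a form parser typically reads.

    `headers` is any mapping of header name to value; names are matched
    case-insensitively. Hypothetical helper, not a stdlib function.
    """
    lowered = dict((name.lower(), value) for name, value in headers.items())
    return {
        "REQUEST_METHOD": method.upper(),
        "QUERY_STRING": "",  # assumption: the upload form posts without a query string
        "CONTENT_TYPE": lowered.get("content-type", ""),
        "CONTENT_LENGTH": lowered.get("content-length", ""),
    }


env = minimal_cgi_environ(
    "post",
    {"Content-Type": "multipart/form-data; boundary=XYZ", "Content-Length": "1234"},
)
print(env["REQUEST_METHOD"], env["CONTENT_LENGTH"])
```

A dict like this could presumably be passed as the environ= argument of cgi.FieldStorage, which avoids mutating the process-wide os.environ on every request.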

落墨 2024-09-19 16:23:44

By using cgi.FieldStorage you can easily extract the filename. Check the example below:

def do_POST(self):
    ctype, pdict = cgi.parse_header(self.headers.getheader('content-type'))
    if ctype == 'multipart/form-data':
        form = cgi.FieldStorage(
            fp=self.rfile,
            headers=self.headers,
            environ={'REQUEST_METHOD': 'POST',
                     'CONTENT_TYPE': self.headers['Content-Type']})
        filename = form['upfile'].filename
        data = form['upfile'].file.read()
        # use a with-block so the output file is closed promptly
        with open("./%s" % filename, "wb") as fout:
            fout.write(data)
    self.do_GET()
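Note that the cgi module was removed in Python 3.13, so the FieldStorage route only works on older interpreters. One possible alternative, sketched here under my own assumptions, is to let the stdlib email package parse the multipart body: prepend the request's Content-Type header to the raw body and read each part's Content-Disposition. The function name extract_uploads and the sample boundary are made up for illustration, and I have only tried this on small text payloads, not on large or binary uploads.

```python
from email.parser import BytesParser


def extract_uploads(content_type, body):
    """Return (field_name, filename, data) for every file part in `body`.

    `content_type` is the request's Content-Type header value and `body`
    is the raw request body as bytes. Hypothetical helper for illustration.
    """
    # The email parser needs the Content-Type header to learn the boundary.
    prefix = ("Content-Type: %s\r\n\r\n" % content_type).encode("latin-1")
    message = BytesParser().parsebytes(prefix + body)
    uploads = []
    if not message.is_multipart():
        return uploads
    for part in message.get_payload():
        filename = part.get_filename()  # read from Content-Disposition
        if filename is None:
            continue  # an ordinary form field, not a file
        field = part.get_param("name", header="content-disposition")
        uploads.append((field, filename, part.get_payload(decode=True)))
    return uploads


sample_body = (
    "--XBOUNDARY\r\n"
    'Content-Disposition: form-data; name="upfile"; filename="report.txt"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "hello upload\r\n"
    "--XBOUNDARY--\r\n"
).encode("ascii")
uploads = extract_uploads("multipart/form-data; boundary=XBOUNDARY", sample_body)
print(uploads)
```

In a handler, the body would be read with self.rfile.read(int(self.headers['Content-Length'])). The filename comes straight from the client, so strip any directory part before using it as a path.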

无法回应 2024-09-19 16:23:44

...or use your own version of cgi.parse_multipart, especially fixing this:

    # my fix: prefer 'filename' over 'name' field!
    if 'filename' in params:
        name = params['filename']
        name = os.path.basename(name) # Edge, IE return abs path!
    elif 'name' in params:
        name = params['name']
    else:
        continue
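The same "prefer filename over name" rule can be tried in isolation on a single Content-Disposition value. This is only a sketch: pick_name is a made-up helper, and I use ntpath.basename rather than os.path.basename on the assumption that the server may run on a non-Windows system, where os.path would not split a path like C:\Users\me\photo.png.

```python
import ntpath
from email.message import Message


def pick_name(disposition):
    """Return the uploaded filename if present, else the field name, else None."""
    holder = Message()
    holder["Content-Disposition"] = disposition
    # get_params() yields (key, value) pairs; the first pair is the
    # disposition type itself ("form-data"), so it is skipped.
    params = dict(holder.get_params(failobj=[], header="content-disposition")[1:])
    if params.get("filename"):
        # ntpath understands both "\\" and "/" separators on any platform.
        return ntpath.basename(params["filename"])
    return params.get("name")


print(pick_name('form-data; name="upfile"; filename="C:\\Users\\me\\photo.png"'))
print(pick_name('form-data; name="comment"'))
```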
