Chapter 11. HTTP Web Services

Published 2019-09-14 13:30:41

11.1. Overview
11.2. Avoiding repeated fetches of data over HTTP
11.3. Features of HTTP
- 11.3.1. User-Agent
- 11.3.2. Redirects
- 11.3.3. Last-Modified/If-Modified-Since
- 11.3.4. ETag/If-None-Match
- 11.3.5. Compression
11.4. Debugging HTTP web services
11.5. Setting the User-Agent
11.6. Handling Last-Modified and ETag
11.7. Handling redirects
11.8. Handling compressed data
11.9. Putting it all together
11.10. Summary

11.1. Overview

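You have already learned about HTML processing and XML processing, and along the way you saw how to download a web page and how to parse XML from a URL. Now let's take a more thorough look at the broader topic of HTTP web services.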

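Simply stated, HTTP web services are programmatic ways of sending data to and receiving data from remote servers using the operations of HTTP directly. If you want to get data from the server, use a plain HTTP GET; if you want to send new data to the server, use HTTP POST. (Some more advanced HTTP web service APIs also define ways to modify and delete existing data, using HTTP PUT and HTTP DELETE.) In other words, the "verbs" built into the HTTP protocol (GET, POST, PUT, and DELETE) map directly to the application-level operations of retrieving, creating, modifying, and deleting data.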
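As a rough illustration of that verb-to-operation mapping, here is a minimal sketch in Python 3, using urllib.request (the Python 3 successor to urllib2). It only builds Request objects and never sends them. The endpoint http://example.com/api/items, the VERBS table, and the build_request helper are hypothetical, chosen just to show the idea.

```python
import urllib.request

# Hypothetical endpoint: the requests below are only built, never sent.
BASE_URL = "http://example.com/api/items"

# Application-level operation -> HTTP verb, as described above.
VERBS = {"retrieve": "GET", "create": "POST", "modify": "PUT", "delete": "DELETE"}


def build_request(action, item_id=None, body=None):
    """Build (but do not send) the HTTP request for an application operation."""
    url = BASE_URL if item_id is None else "%s/%s" % (BASE_URL, item_id)
    data = body.encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, method=VERBS[action])


for action, item_id, body in [("retrieve", 42, None),
                              ("create", None, "<item/>"),
                              ("modify", 42, "<item/>"),
                              ("delete", 42, None)]:
    request = build_request(action, item_id, body)
    print(action, "->", request.get_method(), request.full_url)
```

The verb is the only thing that changes between operations; the resource is still identified by its URL, which is what keeps this style of service simple.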

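The main advantage of this approach is simplicity, and that simplicity has proven popular with many different sites. Data (usually XML data) can be built and stored statically, or generated dynamically by a server-side script, and every major programming language includes an HTTP library for downloading it. Debugging is also easy, because you can call the web service from any web browser and look at the raw data. Modern browsers will even format and pretty-print XML data nicely for you, so you can navigate through it quickly.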
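A minimal Python 3 sketch of the simplest case, static data served over a plain HTTP GET: it serves one fixed XML document from a throwaway local server (http.server) and fetches the raw bytes with urllib.request, the same bytes a browser would receive. The XML contents, the path /items.xml, and the StaticXMLHandler and fetch_raw names are made up for illustration; the server binds to an OS-chosen port on 127.0.0.1, so no external network is needed.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static XML payload; a real service might store this on disk
# or generate it per request with a server-side script.
STATIC_XML = b'<?xml version="1.0"?><items><item id="1">example</item></items>'


class StaticXMLHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET receives the same static document.
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(STATIC_XML)))
        self.end_headers()
        self.wfile.write(STATIC_XML)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_raw(url):
    # An opener with an empty ProxyHandler ignores proxy environment variables,
    # so the request really goes to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.read()


# Port 0 lets the operating system pick any free port on the loopback interface.
server = HTTPServer(("127.0.0.1", 0), StaticXMLHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    raw = fetch_raw("http://127.0.0.1:%d/items.xml" % server.server_address[1])
    print(raw.decode("utf-8"))  # the raw XML, exactly as the server sent it
finally:
    server.shutdown()
    server.server_close()
```

Pointing a browser at the same URL while the server is running would show the same document, which is why debugging this kind of service needs no special tooling.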

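Examples of pure XML-over-HTTP web services:

- The Amazon API lets you retrieve product information from the Amazon.com online store.
- The National Weather Service (United States) and the Hong Kong Observatory (Hong Kong) offer weather alerts as web services.
- The Atom API is used to manage web-based content.
- Syndicated feeds from weblogs and news sites bring you the latest news from many different sites.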

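In later chapters, we will explore APIs that use HTTP as a transport for sending and receiving data but do not map application semantics onto the underlying HTTP semantics. (They tunnel everything over HTTP POST.) This chapter, however, concentrates on using HTTP GET to retrieve data from a remote server, and explores several HTTP features that let you get the most out of pure HTTP web services.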
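To make one of those HTTP features concrete, here is a hedged Python 3 sketch of a conditional GET with ETag/If-None-Match, run against a throwaway local server. The ETag value "v1", the feed body, and the ConditionalHandler and conditional_get names are hypothetical. One detail worth knowing: Python 3's urllib.request reports 304 Not Modified by raising HTTPError, so the sketch catches it and returns the status as an ordinary result.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<feed><entry>hello</entry></feed>"  # hypothetical feed contents
ETAG = '"v1"'  # hypothetical entity tag identifying this version of BODY


class ConditionalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            # The client already has this version: answer 304 with no body.
            self.send_response(304)
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Empty ProxyHandler: ignore proxy environment variables for this local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def conditional_get(url, etag=None):
    """Return (status, etag, body); body is empty on 304 Not Modified."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    try:
        with opener.open(request) as response:
            return response.status, response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as error:
        # urllib.request raises HTTPError for 304; treat it as a normal outcome.
        if error.code == 304:
            etag_header = error.headers.get("ETag")
            error.close()
            return 304, etag_header, b""
        raise


server = HTTPServer(("127.0.0.1", 0), ConditionalHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = "http://127.0.0.1:%d/feed.xml" % server.server_address[1]
    first = conditional_get(url)                  # full download; remember the ETag
    second = conditional_get(url, etag=first[1])  # revalidate with If-None-Match
    print(first[0], len(first[2]), second[0], len(second[2]))
finally:
    server.shutdown()
    server.server_close()
```

The second request transfers headers only; the client keeps using the copy it already has. Last-Modified/If-Modified-Since works the same way, with a date in place of the entity tag.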

Here is a more advanced version of the openanything module that you saw in the previous chapter:

Example 11.1. openanything.py

If you have not already done so, you can download this and other examples used in this book.

import sys, urllib2, urlparse, gzip
from StringIO import StringIO

USER_AGENT = 'OpenAnything/1.0 +http://diveintopython.org/http_web_services/'

class SmartRedirectHandler(urllib2.HTTPRedirectHandler):
    # Let urllib2 follow the redirect as usual, then record the original
    # redirect code (301 permanent, 302 temporary) on the final result.
    def http_error_301(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_301(
            self, req, fp, code, msg, headers)
        result.status = code
        return result

    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)
        result.status = code
        return result

class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler):
    # Return unhandled HTTP errors (such as 304 Not Modified) as a response
    # object carrying a .status attribute, instead of raising an exception.
    def http_error_default(self, req, fp, code, msg, headers):
        result = urllib2.HTTPError(
            req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result

def openAnything(source, etag=None, lastmodified=None, agent=USER_AGENT):
    '''URL, filename, or string --> stream

    This function lets you define parsers that take any input source
    (URL, pathname to local or network file, or actual data as a string)
    and deal with it in a uniform manner.  Returned object is guaranteed
    to have all the basic stdio read methods (read, readline, readlines).
    Just .close() the object when you're done with it.

    If the etag argument is supplied, it will be used as the value of an
    If-None-Match request header.

    If the lastmodified argument is supplied, it must be a formatted
    date/time string in GMT (as returned in the Last-Modified header of
    a previous request).  The formatted date/time will be used
    as the value of an If-Modified-Since request header.

    If the agent argument is supplied, it will be used as the value of a
    User-Agent request header.
    '''
    # already a file-like object (it has a read method): use it as is
    if hasattr(source, 'read'):
        return source

    # '-' conventionally means standard input
    if source == '-':
        return sys.stdin

    if urlparse.urlparse(source)[0] == 'http':
        # open URL with urllib2, adding the optional HTTP request headers
        request = urllib2.Request(source)
        request.add_header('User-Agent', agent)
        if etag:
            request.add_header('If-None-Match', etag)
        if lastmodified:
            request.add_header('If-Modified-Since', lastmodified)
        request.add_header('Accept-encoding', 'gzip')
        opener = urllib2.build_opener(SmartRedirectHandler(), DefaultErrorHandler())
        return opener.open(request)

    # try to open with native open function (if source is a filename)
    try:
        return open(source)
    except (IOError, OSError):
        pass

    # treat source as string
    return StringIO(str(source))

def fetch(source, etag=None, last_modified=None, agent=USER_AGENT):
    '''Fetch data and metadata from a URL, file, stream, or string'''
    result = {}
    f = openAnything(source, etag, last_modified, agent)
    result['data'] = f.read()
    if hasattr(f, 'headers'):
        # save ETag, if the server sent one
        result['etag'] = f.headers.get('ETag')
        # save Last-Modified header, if the server sent one
        result['lastmodified'] = f.headers.get('Last-Modified')
        if f.headers.get('content-encoding', '') == 'gzip':
            # data came back gzip-compressed, decompress it
            result['data'] = gzip.GzipFile(fileobj=StringIO(result['data'])).read()
    if hasattr(f, 'url'):
        # an HTTP response: assume success unless a handler recorded a status
        result['url'] = f.url
        result['status'] = 200
    if hasattr(f, 'status'):
        # status recorded by SmartRedirectHandler or DefaultErrorHandler (e.g. 301, 304)
        result['status'] = f.status
    f.close()
    return result
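The gzip branch of fetch() can be tried without any network access. This is a minimal Python 3 analogue, not part of openanything.py: io.BytesIO takes the place of StringIO, and an http.client.HTTPMessage stands in for the response headers. The decode_body helper and the sample feed are hypothetical. It applies the same rule as fetch(): decompress only when the Content-Encoding header says gzip.

```python
import gzip
import io
from http.client import HTTPMessage


def decode_body(data, headers):
    # Same check fetch() performs: only gunzip when the server says it used gzip.
    # HTTPMessage.get() is case-insensitive, like the Python 2 headers object.
    if headers.get("content-encoding", "") == "gzip":
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data


# Hypothetical, highly repetitive feed, so compression has something to remove.
original = b"<feed>" + b"<entry>repeated text</entry>" * 50 + b"</feed>"
compressed = gzip.compress(original)

gzip_headers = HTTPMessage()
gzip_headers["Content-Encoding"] = "gzip"

print(len(original), len(compressed))
print(decode_body(compressed, gzip_headers) == original)
print(decode_body(original, HTTPMessage()) == original)  # no header: passed through
```

Checking the header rather than guessing from the bytes matters: a server compresses only when the client sent Accept-encoding: gzip and it chose to honor it, so an uncompressed body must be passed through untouched.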