Scrapy middleware order

Posted 2024-11-18 23:40:01

The Scrapy documentation says:

the first middleware is the one closer to the engine and the last is the one closer to the downloader.

To decide which order to assign to your middleware see the DOWNLOADER_MIDDLEWARES_BASE setting and pick a value according to where you want to insert the middleware. The order does matter because each middleware performs a different action and your middleware could depend on some previous (or subsequent) middleware being applied.

I'm not entirely clear from this whether a higher value would result in a middleware
getting executed first or vice versa.

E.g.

'myproject.middlewares.MW1': 543,
'myproject.middlewares.MW2': 542,

Questions:

Which of these will be executed first? My trial suggests that MW2 runs first.
What is the valid range for order values? 0 to 999?

清眉祭 2024-11-25 23:40:01

I know this has been answered, but it is actually more complicated than that: requests and responses are handled in opposite order.

You can think of it like this:

0 - engine makes request
1..inf - process_request middleware calls
inf - actual download happens (if a request middleware didn't handle it)
inf..1 - process_response middleware calls
0 - response received by the engine

So if I tag my middleware with order 1, it will be the FIRST request middleware executed and the LAST response middleware executed. If I tag it with 901, it will be the LAST request middleware executed and the FIRST response middleware executed (if only the default middlewares are defined).
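To make the round trip concrete, here is a minimal sketch in plain Python (it does not import Scrapy). MW1 and MW2 come from the question; the two `placeholder.*` names and their values 1 and 901 are hypothetical and only stand in for "something near the engine" and "something near the downloader".

```python
# Order values: MW1/MW2 are from the question, the placeholder entries
# are hypothetical names, not real Scrapy middlewares.
orders = {
    "myproject.middlewares.MW1": 543,
    "myproject.middlewares.MW2": 542,
    "placeholder.NearEngine": 1,
    "placeholder.NearDownloader": 901,
}

# Requests travel from the engine to the downloader: ascending order value.
request_chain = sorted(orders, key=orders.get)

# Responses travel back from the downloader to the engine: the reverse.
response_chain = list(reversed(request_chain))

print("process_request order: ", request_chain)
print("process_response order:", response_chain)
```

Under this model MW2 (542) sees the request before MW1 (543), and MW1 sees the response before MW2.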

Really, the answer is that it IS confusing. The request starts nearest the engine (at zero) and ends nearest the downloader (high number). The response starts nearest the downloader (high number) and ends nearest the engine (at zero). It's like a trip out and back from the engine. Here's the relevant code from Scrapy that makes this all so fun (with __init__ copied from MiddlewareManager for reference and only the relevant method included):

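from collections import defaultdict

from scrapy.middleware import MiddlewareManager


class DownloaderMiddlewareManager(MiddlewareManager):
    # __init__ is copied from MiddlewareManager for reference.
    # The middlewares arrive already sorted by ascending order value.
    def __init__(self, *middlewares):
        self.middlewares = middlewares
        self.methods = defaultdict(list)
        for mw in middlewares:
            self._add_middleware(mw)

    def _add_middleware(self, mw):
        # Request hooks are appended: lower order values run first.
        if hasattr(mw, 'process_request'):
            self.methods['process_request'].append(mw.process_request)
        # Response and exception hooks are inserted at the front:
        # higher order values run first.
        if hasattr(mw, 'process_response'):
            self.methods['process_response'].insert(0, mw.process_response)
        if hasattr(mw, 'process_exception'):
            self.methods['process_exception'].insert(0, mw.process_exception)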

As you can see, request methods are appended in sorted order (higher numbers are added to the back), while response and exception methods are inserted at the beginning (higher numbers come first).
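The effect of append versus insert(0) can be checked without Scrapy. The sketch below is an assumption-laden imitation, not Scrapy's own code: `TracingMiddleware` and `build_methods` are hypothetical helpers, and the hook signatures are simplified (no spider argument, no short-circuiting by returning a response from process_request).

```python
from collections import defaultdict


class TracingMiddleware:
    """Hypothetical middleware that records each hook call in a shared list."""

    def __init__(self, name, trace):
        self.name = name
        self.trace = trace

    def process_request(self, request):
        self.trace.append(f"{self.name}.process_request")

    def process_response(self, request, response):
        self.trace.append(f"{self.name}.process_response")
        return response


def build_methods(middlewares):
    """Imitate the append / insert(0) registration from the excerpt."""
    methods = defaultdict(list)
    for mw in middlewares:
        if hasattr(mw, "process_request"):
            methods["process_request"].append(mw.process_request)
        if hasattr(mw, "process_response"):
            methods["process_response"].insert(0, mw.process_response)
    return methods


trace = []
# Passed in ascending order value, as in the question: MW2 (542), then MW1 (543).
methods = build_methods(
    [TracingMiddleware("MW2", trace), TracingMiddleware("MW1", trace)]
)

for hook in methods["process_request"]:
    hook(request="fake request")
trace.append("download")
response = "fake response"
for hook in methods["process_response"]:
    response = hook(request="fake request", response=response)

print(trace)
```

The printed trace should show MW2 then MW1 on the way out and MW1 then MW2 on the way back, which matches the trial result reported in the question.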
