Persisting items from a pipeline using a POST request

Posted 2024-11-27 18:10:51


I want to persist items within a pipeline by posting them to a URL.

I am using this code within the pipeline:

from scrapy.http import FormRequest
from scrapy import log

class XPipeline(object):
    def process_item(self, item, spider):
        log.msg('in SpotifylistPipeline', level=log.DEBUG)
        yield FormRequest(url="http://www.example.com/additem",
                          formdata={'title': item['title'],
                                    'link': item['link'],
                                    'description': item['description']})

but it doesn't seem to be making the HTTP request.

  • Is it possible to make an HTTP request from a pipeline? If not, do I have to do it in the spider?
  • Do I need to specify a callback function? If so, which one?
  • If I can make the HTTP call, can I check the response (JSON) and return the item if everything went OK, or discard the item if it didn't get saved?

One final thing: is there a diagram that explains the flow that Scrapy follows from beginning to end? I am getting slightly lost about what gets called when. For instance, if Pipelines returned items to Spiders, what would Spiders do with those items? What happens after a Pipeline call?

Many thanks in advance

Migsy


Comments (3)

冬天的雪花 2024-12-04 18:10:51

You can inherit your pipeline from scrapy.contrib.pipeline.media.MediaPipeline and yield Requests in 'get_media_requests'. Responses are passed into the 'media_downloaded' callback.
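
A minimal sketch of that approach, assuming the hypothetical http://www.example.com/additem endpoint from the question (PostItemPipeline is an illustrative name; the import path is the old scrapy.contrib one this answer names, while newer Scrapy ships the same class as scrapy.pipelines.media.MediaPipeline):

from scrapy.contrib.pipeline.media import MediaPipeline
from scrapy.exceptions import DropItem
from scrapy.http import FormRequest

class PostItemPipeline(MediaPipeline):

    def get_media_requests(self, item, info):
        # Requests yielded here are scheduled through Scrapy's downloader.
        yield FormRequest("http://www.example.com/additem",
                          formdata={'title': item['title'],
                                    'link': item['link'],
                                    'description': item['description']})

    def media_downloaded(self, response, request, info):
        # Success handler: whatever is returned here becomes the
        # request's result value in item_completed below.
        return response.status

    def item_completed(self, results, item, info):
        # results is a list of (success, value) tuples, one per request.
        if all(success for success, _ in results):
            return item
        raise DropItem("Failed to post item with title %s." % item['title'])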

仙女山的月亮 2024-12-04 18:10:51


Quote:

This method is called for every item pipeline component and must
either return an Item (or any descendant class) object or raise a
DropItem exception. Dropped items are no longer processed by further
pipeline components.

So, only a spider can yield a request with a callback.
Pipelines are used for processing items.

You'd better describe what you want to achieve.

is there a diagram that explains the flow that Scrapy follows from beginning to end

Architecture overview

For instance, if Pipelines returned items to Spiders

Pipelines do not return items to spiders. The items returned are passed to the next pipeline.
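
If the goal is to POST each item and keep or drop it depending on the reply, one way to follow this advice is to do the request from the spider. A minimal sketch, assuming a hypothetical endpoint that answers with JSON like {"status": "ok"} (the URLs, spider name, and after_post callback are illustrative, not from the original thread):

import json
import scrapy

class XSpider(scrapy.Spider):
    name = 'x'
    start_urls = ['http://www.example.com/items']  # hypothetical listing page

    def parse(self, response):
        # ... extract the real fields from the response here ...
        item = {'title': 't', 'link': 'l', 'description': 'd'}
        yield scrapy.FormRequest('http://www.example.com/additem',
                                 formdata=item,
                                 callback=self.after_post,
                                 meta={'item': item})

    def after_post(self, response):
        # Check the JSON reply; only hand the item on if the save worked.
        if json.loads(response.text).get('status') == 'ok':
            yield response.meta['item']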

紙鸢 2024-12-04 18:10:51


This could be done easily by using the requests library. If you don't want to use another library then look into urllib2.

import requests
from scrapy.exceptions import DropItem

class XPipeline(object):

    def process_item(self, item, spider):
        # requests.post blocks until the server replies, so the result
        # can be checked immediately.
        r = requests.post("http://www.example.com/additem",
                          data={'title': item['title'],
                                'link': item['link'],
                                'description': item['description']})
        if r.status_code == 200:
            return item
        else:
            raise DropItem("Failed to post item with title %s." % item['title'])
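
Whichever variant you pick, Scrapy only calls the pipeline if it is enabled in the project settings. A minimal sketch, assuming a hypothetical project package named myproject:

# settings.py (the dict form is Scrapy 1.0+; very old versions used a plain list)
ITEM_PIPELINES = {
    'myproject.pipelines.XPipeline': 300,  # lower numbers run earlier
}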