Persisting items from a pipeline using a POST request

Posted 2024-11-27 18:10:51


I want to persist items within a pipeline by posting them to a URL.

I am using this code within the pipeline:

from scrapy.http import FormRequest
from scrapy import log

class XPipeline(object):
    def process_item(self, item, spider):
        log.msg('in SpotifylistPipeline', level=log.DEBUG)
        yield FormRequest(url="http://www.example.com/additem",
                          formdata={'title': item['title'],
                                    'link': item['link'],
                                    'description': item['description']})

but it doesn't seem to be making the HTTP request.

  • Is it possible to make an HTTP request from a pipeline? If not, do I have to do it in the spider?
  • Do I need to specify a callback function? If so, which one?
  • If I can make the HTTP call, can I check the response (JSON) and return the item if everything went OK, or discard the item if it didn't get saved?

One final thing: is there a diagram that explains the flow that Scrapy follows from beginning to end? I am getting slightly lost about what gets called when. For instance, if Pipelines returned items to Spiders, what would Spiders do with those items? What happens after a Pipeline call?

Many thanks in advance

Migsy


Comments (3)

冬天的雪花 2024-12-04 18:10:51

You can inherit your pipeline from scrapy.contrib.pipeline.media.MediaPipeline and yield Requests in 'get_media_requests'. Responses are passed into the 'media_downloaded' callback.
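
A minimal sketch of that approach, assuming the hypothetical http://www.example.com/additem endpoint from the question (PostItemPipeline is an illustrative name; the import path is the old scrapy.contrib one this answer names, while newer Scrapy ships the same class as scrapy.pipelines.media.MediaPipeline):

from scrapy.contrib.pipeline.media import MediaPipeline
from scrapy.exceptions import DropItem
from scrapy.http import FormRequest

class PostItemPipeline(MediaPipeline):

    def get_media_requests(self, item, info):
        # Requests yielded here are scheduled through Scrapy's downloader.
        yield FormRequest("http://www.example.com/additem",
                          formdata={'title': item['title'],
                                    'link': item['link'],
                                    'description': item['description']})

    def media_downloaded(self, response, request, info):
        # Success handler: whatever is returned here becomes the
        # request's result value in item_completed below.
        return response.status

    def item_completed(self, results, item, info):
        # results is a list of (success, value) tuples, one per request.
        if all(success for success, _ in results):
            return item
        raise DropItem("Failed to post item with title %s." % item['title'])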

仙女山的月亮 2024-12-04 18:10:51


Quote:

This method is called for every item pipeline component and must
either return an Item (or any descendant class) object or raise a
DropItem exception. Dropped items are no longer processed by further
pipeline components.

So, only a spider can yield a request with a callback.
Pipelines are used for processing items.

You'd better describe what you want to achieve.

is there a diagram that explains the flow that Scrapy follows from beginning to end

Architecture overview

For instance, if Pipelines returned items to Spiders

Pipelines do not return items to spiders. The items returned are passed to the next pipeline.
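
If the goal is to POST each item and keep or drop it depending on the reply, one way to follow this advice is to do the request from the spider. A minimal sketch, assuming a hypothetical endpoint that answers with JSON like {"status": "ok"} (the URLs, spider name, and after_post callback are illustrative, not from the original thread):

import json
import scrapy

class XSpider(scrapy.Spider):
    name = 'x'
    start_urls = ['http://www.example.com/items']  # hypothetical listing page

    def parse(self, response):
        # ... extract the real fields from the response here ...
        item = {'title': 't', 'link': 'l', 'description': 'd'}
        yield scrapy.FormRequest('http://www.example.com/additem',
                                 formdata=item,
                                 callback=self.after_post,
                                 meta={'item': item})

    def after_post(self, response):
        # Check the JSON reply; only hand the item on if the save worked.
        if json.loads(response.text).get('status') == 'ok':
            yield response.meta['item']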

紙鸢 2024-12-04 18:10:51


This could be done easily by using the requests library. If you don't want to use another library then look into urllib2.

import requests
from scrapy.exceptions import DropItem

class XPipeline(object):

    def process_item(self, item, spider):
        # requests.post blocks until the server replies, so the result
        # can be checked immediately.
        r = requests.post("http://www.example.com/additem",
                          data={'title': item['title'],
                                'link': item['link'],
                                'description': item['description']})
        if r.status_code == 200:
            return item
        else:
            raise DropItem("Failed to post item with title %s." % item['title'])
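
Whichever variant you pick, Scrapy only calls the pipeline if it is enabled in the project settings. A minimal sketch, assuming a hypothetical project package named myproject:

# settings.py (the dict form is Scrapy 1.0+; very old versions used a plain list)
ITEM_PIPELINES = {
    'myproject.pipelines.XPipeline': 300,  # lower numbers run earlier
}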