Persisting items with a POST request in a Pipeline
I want to persist items from within a Pipeline by POSTing them to a URL.
I am using this code within the Pipeline:
    from scrapy import log
    from scrapy.http import FormRequest

    class XPipeline(object):
        def process_item(self, item, spider):
            log.msg('in SpotifylistPipeline', level=log.DEBUG)
            yield FormRequest(url="http://www.example.com/additem", formdata={'title': item['title'], 'link': item['link'], 'description': item['description']})
but it does not seem to make the HTTP request.
- Is it possible to make HTTP requests from pipelines? If not, do I have to do it in the Spider?
- Do I need to specify a callback function? If so, which one?
- If I can make the HTTP call, can I check the response (JSON) and return the item if everything went OK, or discard the item if it didn't get saved?
As a final thing, is there a diagram that explains the flow that Scrapy follows from beginning to end? I am getting slightly lost about what gets called when. For instance, if Pipelines returned items to Spiders, what would Spiders do with those items? What happens after a Pipeline call?
Many thanks in advance
Migsy
Comments (3)
You can derive your pipeline from scrapy.contrib.pipeline.media.MediaPipeline and yield Requests in 'get_media_requests'. The responses are passed to the 'media_downloaded' callback.
Quote:
So only a spider can yield a request with a callback.
Pipelines are used for processing items.
You should describe what you want to achieve.
For the flow diagram, see the Architecture overview in the Scrapy documentation.
Pipelines do not return items to spiders. The returned items are passed to the next pipeline.
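To make the "passed to the next pipeline" point concrete, here is a toy model in plain Python. It is only a sketch of the idea, not Scrapy's actual implementation: run_pipelines, RequireTitle and StripTitle are hypothetical names, and the local DropItem class stands in for scrapy.exceptions.DropItem so the snippet runs without Scrapy installed.

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem (assumption: Scrapy not installed)."""


class RequireTitle:
    """Hypothetical pipeline: discards items that have no title."""

    def process_item(self, item, spider):
        if not item.get("title"):
            raise DropItem("missing title")
        return item


class StripTitle:
    """Hypothetical pipeline: tidies the title and hands the item on."""

    def process_item(self, item, spider):
        item["title"] = item["title"].strip()
        return item


def run_pipelines(item, pipelines, spider=None):
    """Toy model of the chain: each pipeline receives what the previous one returned."""
    for pipeline in pipelines:
        try:
            item = pipeline.process_item(item, spider)
        except DropItem:
            # A dropped item goes no further; nothing is sent back to the spider.
            return None
    return item
```

In this model, as in the answer above, the spider never sees the item again: the value returned by one process_item is simply the input of the next.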
This can be done easily with the requests library. If you don't want to add another library, look into urllib2.
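A minimal sketch of that approach, using urllib.request (the Python 3 successor of urllib2) so it needs no extra dependency. Several things here are assumptions rather than anything from the question: the names PostItemPipeline and post_form are hypothetical, and the server is assumed to answer with a JSON object containing an "ok" flag. Note that the call is synchronous, so it blocks Scrapy while each request is in flight.

```python
import json
import urllib.parse
import urllib.request

try:
    from scrapy.exceptions import DropItem  # Scrapy's exception for discarding items
except ImportError:
    class DropItem(Exception):
        """Fallback so the sketch also runs without Scrapy installed."""


def post_form(url, fields, timeout=10):
    """POST form-encoded fields and return the decoded JSON body."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


class PostItemPipeline:
    """Hypothetical pipeline: saves each item remotely, drops it if that fails."""

    def __init__(self, url="http://www.example.com/additem", post=post_form):
        self.url = url
        self.post = post  # injectable, so the HTTP call can be replaced in tests

    def process_item(self, item, spider):
        fields = {key: item[key] for key in ("title", "link", "description")}
        try:
            reply = self.post(self.url, fields)
        except (OSError, ValueError) as exc:
            # OSError covers network errors, ValueError covers invalid JSON.
            raise DropItem("could not save item: %s" % exc)
        # Assumed response shape: {"ok": true} on success.
        if not isinstance(reply, dict) or not reply.get("ok"):
            raise DropItem("server did not confirm the save: %r" % (reply,))
        return item
```

Returning the item lets it continue to the next pipeline, while raising DropItem discards it, which covers the "check the response and keep or discard" part of the question.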