12.3 MongoDB
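MongoDB is a document-oriented, non-relational (NoSQL) database. It is powerful, flexible, and easy to scale, and in recent years it has been widely adopted in many fields.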
In Python, MongoDB can be accessed with the third-party library pymongo, which can be installed with pip:
$ sudo pip install pymongo
Here is a simple example of writing data into a MongoDB database with pymongo:
from pymongo import MongoClient

# Connect to MongoDB and get a client object
client = MongoClient('mongodb://localhost:27017')
# Get the object for the database named scrapy_db
db = client.scrapy_db
# Get the object for the collection named person
collection = db.person

doc = {
    'name': '刘硕',
    'age': 34,
    'sex': 'M',
}
# Insert the document into the collection
collection.insert_one(doc)
# Close the client
client.close()
Following the pattern of SQLitePipeline, MongoDBPipeline is implemented as follows:
from pymongo import MongoClient
from scrapy import Item


class MongoDBPipeline:

    def open_spider(self, spider):
        # Read the connection settings, falling back to defaults when they are absent
        db_uri = spider.settings.get('MONGODB_URI', 'mongodb://localhost:27017')
        db_name = spider.settings.get('MONGODB_DB_NAME', 'scrapy_default')
        # Connect with the configured URI rather than a hard-coded address
        self.db_client = MongoClient(db_uri)
        self.db = self.db_client[db_name]

    def close_spider(self, spider):
        self.db_client.close()

    def process_item(self, item, spider):
        self.insert_db(item)
        return item

    def insert_db(self, item):
        # Scrapy Items are dict-like; convert them to a plain dict before inserting
        if isinstance(item, Item):
            item = dict(item)
        self.db.books.insert_one(item)
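The code above works as follows:
The open_spider method is called before crawling starts. Through the spider.settings object it reads the database settings the user specified in the configuration file (MONGODB_URI and MONGODB_DB_NAME, with defaults used when they are missing), then connects to the database and stores the resulting MongoClient and Database objects in self.db_client and self.db for later use.
The process_item method handles each scraped item by calling insert_db, which performs the database insert. In insert_db, an item that is a Scrapy Item is first converted to a dict, and insert_one is then called to insert it into the books collection.
The close_spider method is called after all data has been crawled, and it closes the database connection.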
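Because the pipeline relies on only a few calls (client[db_name], db.books.insert_one and client.close()), its open/process/close lifecycle can be exercised without a running MongoDB server. The following is a minimal sketch under stated assumptions: FakeClient, FakeCollection and MongoDBPipelineSketch are hypothetical names, not part of pymongo or Scrapy; the client class is injected through the constructor (a swapped-in testing technique, the original hard-wires MongoClient); a plain dict plays the role of spider.settings, since both support get(key, default); and every item is copied with dict() in place of the original isinstance(item, Item) check, so the sketch runs without Scrapy installed.

```python
from types import SimpleNamespace


class FakeCollection:
    """Hypothetical in-memory stand-in for a pymongo Collection (insert_one only)."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


class FakeClient:
    """Hypothetical stand-in for MongoClient: client[name] yields a db with a books collection."""

    def __init__(self, uri):
        self.uri = uri
        self.closed = False
        self._dbs = {}

    def __getitem__(self, name):
        # Create each database lazily and return the same object on later lookups
        if name not in self._dbs:
            self._dbs[name] = SimpleNamespace(books=FakeCollection())
        return self._dbs[name]

    def close(self):
        self.closed = True


class MongoDBPipelineSketch:
    """Same lifecycle as MongoDBPipeline, but with the client class injected (hypothetical)."""

    def __init__(self, client_cls):
        self.client_cls = client_cls  # would be pymongo.MongoClient in a real project

    def open_spider(self, spider):
        db_uri = spider.settings.get('MONGODB_URI', 'mongodb://localhost:27017')
        db_name = spider.settings.get('MONGODB_DB_NAME', 'scrapy_default')
        self.db_client = self.client_cls(db_uri)
        self.db = self.db_client[db_name]

    def close_spider(self, spider):
        self.db_client.close()

    def process_item(self, item, spider):
        # Insert a plain-dict copy; the item passed on to later pipelines is unchanged
        self.db.books.insert_one(dict(item))
        return item


# Drive the pipeline the way Scrapy would: open once, process each item, close once.
spider = SimpleNamespace(settings={'MONGODB_DB_NAME': 'scrapy_db'})
pipeline = MongoDBPipelineSketch(FakeClient)
pipeline.open_spider(spider)
for item in [{'name': 'A Light in the Attic', 'price': '£51.77'},
             {'name': 'Olio', 'price': '£23.88'}]:
    pipeline.process_item(item, spider)
pipeline.close_spider(spider)
```

Since MONGODB_URI is absent from the stand-in settings, the default URI is used, while the database name comes from the settings, mirroring how the real pipeline falls back to its defaults.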
In the settings file settings.py, specify the MongoDB database to use and enable MongoDBPipeline:
MONGODB_URI = 'mongodb://localhost:27017'
MONGODB_DB_NAME = 'scrapy_db'

ITEM_PIPELINES = {
    'toscrape_book.pipelines.MongoDBPipeline': 403,
}
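The value 403 in ITEM_PIPELINES sets the pipeline's position: Scrapy passes each item through the enabled pipelines from the lowest value to the highest, the values are customarily kept in the 0-1000 range, and a pipeline mapped to None is disabled. The sketch below only mimics that ordering rule in plain Python; it is not Scrapy's internal code. PriceConverterPipeline is a hypothetical pipeline added for illustration, and the SQLitePipeline path is an assumption about where the earlier pipeline lives.

```python
# Illustrative settings: only the MongoDBPipeline entry comes from this section.
EXAMPLE_PIPELINES = {
    'toscrape_book.pipelines.MongoDBPipeline': 403,
    'toscrape_book.pipelines.PriceConverterPipeline': 300,  # hypothetical pipeline
    'toscrape_book.pipelines.SQLitePipeline': None,  # assumed path; None disables it
}


def pipeline_order(pipelines):
    """Return enabled pipeline paths in the order items pass through them (lowest value first)."""
    enabled = {path: order for path, order in pipelines.items() if order is not None}
    return sorted(enabled, key=enabled.get)


print(pipeline_order(EXAMPLE_PIPELINES))
# ['toscrape_book.pipelines.PriceConverterPipeline', 'toscrape_book.pipelines.MongoDBPipeline']
```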
Run the spider and then inspect the database:
$ scrapy crawl books
...
$ mongo scrapy_db
MongoDB shell version: 2.4.9
connecting to: scrapy_db
> db.books.count()
1000
> db.books.find()
{ "_id" : ObjectId("58fb48859dcd1928b736ee4f"), "review_rating" : 3, "review_num" : "0", "stock" : "22", "upc" : "a897fe39b1053632", "price" : "£51.77", "name" : "A Light in the Attic" }
{ "_id" : ObjectId("58fb48859dcd1928b736ee50"), "review_rating" : 1, "review_num" : "0", "stock" : "19", "upc" : "feb7cc7701ecf901", "price" : "£23.88", "name" : "Olio" }
{ "_id" : ObjectId("58fb48859dcd1928b736ee51"), "review_rating" : 2, "review_num" : "0", "stock" : "19", "upc" : "a18a4f574854aced", "price" : "£51.33", "name" : "Libertarianism for Beginners" }
{ "_id" : ObjectId("58fb48859dcd1928b736ee52"), "review_rating" : 1, "review_num" : "0", "stock" : "19", "upc" : "e30f54cea9b38190", "price" : "£37.59", "name" : "Mesaerion: The Best Science Fiction Stories 1800-1849" }
{ "_id" : ObjectId("58fb48859dcd1928b736ee53"), "review_rating" : 5, "review_num" : "0", "stock" : "19", "upc" : "a34ba96d4081e6a4", "price" : "£35.02", "name" : "Rip it Up and Start Again" }
{ "_id" : ObjectId("58fb48859dcd1928b736ee54"), "review_rating" : 2, "review_num" : "0", "stock" : "19", "upc" : "a22124811bfa8350", "price" : "£45.17", "name" : "It's Only the Himalayas" }
{ "_id" : ObjectId("58fb48859dcd1928b736ee55"), "review_rating" : 5, "review_num" : "0", "stock" : "19", "upc" : "3b1c02bac2a429e6", "price" : "£52.29", "name" : "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)" }
{ "_id" : ObjectId("58fb48859dcd1928b736ee56"), "review_rating" : 3, "review_num" : "0", "stock" : "19", "upc" : "deda3e61b9514b83", "price" : "£57.25", "name" : "Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee57"), "review_rating" : 5, "review_num" : "0", "stock" : "19", "upc" : "ce6396b0f23f6ecc", "price" : "£17.46", "name" : "Set Me Free" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee58"), "review_rating" : 4, "review_num" : "0", "stock" : "19", "upc" : "30a7f60cd76ca58c", "price" : "£20.66", "name" : "Shakespeare's Sonnets" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee59"), "review_rating" : 2, "review_num" : "0", "stock" : "19", "upc" : "0312262ecafa5a40", "price" : "£13.99", "name" : "Starving Hearts (Triangular Trade Trilogy, #1)" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee5a"), "review_rating" : 1, "review_num" : "0", "stock" : "19", "upc" : "1dfe412b8ac00530", "price" : "£52.15", "name" : "The Black Maria" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee5b"), "review_rating" : 4, "review_num" : "0", "stock" : "19", "upc" : "e10e1e165dc8be4a", "price" : "£22.60", "name" : "The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee5c"), "review_rating" : 1, "review_num" : "0", "stock" : "19", "upc" : "f77dbf2323deb740", "price" : "£22.65", "name" : "The Requiem Red" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee5d"), "review_rating" : 4, "review_num" : "0", "stock" : "19", "upc" : "2597b5a345f45e1b", "price" : "£33.34", "name" : "The Dirty Little Secrets of Getting Your Dream Job" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee5e"), "review_rating" : 3, "review_num" : "0", "stock" : "19", "upc" : "e72a5dfc7e9267b2", "price" : "£17.93", "name" : "The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee5f"), "review_rating" : 5, "review_num" : "0", "stock" : "20", "upc" : "4165285e1663650f", "price" : "£54.23", "name" : "Sapiens: A Brief History of Humankind" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee60"), "review_rating" : 4, "review_num" : "0", "stock" : "20", "upc" : "e00eb4fd7b871a48", "price" : "£47.82", "name" : "Sharp Objects" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee61"), "review_rating" : 1, "review_num" : "0", "stock" : "20", "upc" : "90fa61229261140a", "price" : "£53.74", "name" : "Tipping the Velvet" }
{ "_id" : ObjectId("58fb48869dcd1928b736ee62"), "review_rating" : 1, "review_num" : "0", "stock" : "20", "upc" : "6957f44c3847a760", "price" : "£50.10", "name" : "Soumission" }
Type "it" for more
>
The results show that all 1000 items were successfully stored in the MongoDB database.
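The same check can also be made from Python instead of the mongo shell. The sketch below is a hypothetical helper, not part of the book's project: verify_books is an illustrative name, and it assumes `collection` is a pymongo Collection such as `client.scrapy_db.books`. It uses count_documents({}), available since PyMongo 3.7, because Collection.count() was removed in PyMongo 4. The projection {'_id': 0, 'name': 1, 'price': 1} trims each sample document to the book's name and price.

```python
def verify_books(collection, expected=1000, sample_size=3):
    """Check the stored document count and return it with a small projected sample.

    `collection` is assumed to be a pymongo Collection (or any object offering
    the same count_documents/find/limit methods).
    """
    total = collection.count_documents({})  # empty filter: count every document
    if total != expected:
        raise ValueError(f'expected {expected} books, found {total}')
    # Hide _id and keep only name and price, similar to a trimmed db.books.find()
    cursor = collection.find({}, {'_id': 0, 'name': 1, 'price': 1}).limit(sample_size)
    return total, list(cursor)


# Usage against a live server (not executed here):
# from pymongo import MongoClient
# client = MongoClient('mongodb://localhost:27017')
# total, sample = verify_books(client.scrapy_db.books)
# client.close()
```

Raising on a count mismatch makes the helper usable as a quick post-crawl sanity check, for example after re-running the spider into a fresh database.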