
12.3 MongoDB

Published 2024-02-05 21:13:20

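MongoDB is a document-oriented, non-relational (NoSQL) database. It is powerful, flexible, and easy to scale, and in recent years it has been widely adopted in many fields.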

In Python, MongoDB can be accessed through the third-party library pymongo, which is installed with pip:

$ sudo pip install pymongo

The following simple example uses pymongo to write data to a MongoDB database:

from pymongo import MongoClient

# Connect to MongoDB and get a client object
client = MongoClient('mongodb://localhost:27017')

# Get the database object named scrapy_db
db = client.scrapy_db

# Get the collection object named person
collection = db.person

doc = {
    'name': '刘硕',
    'age': 34,
    'sex': 'M',
}

# Insert the document into the collection
collection.insert_one(doc)

# Close the client
client.close()
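
One detail of the example above is easy to miss: when the dict passed to `insert_one` has no `_id` key, pymongo adds one to that dict, and the returned result object exposes the same value as `inserted_id`. The sketch below is a hypothetical helper (`insert_copy` is not part of pymongo) that inserts a shallow copy so that the caller's dict stays unchanged. `InMemoryCollection` is a stand-in written for this illustration; it imitates only this one behaviour so that the code runs without a MongoDB server.

```python
from itertools import count
from types import SimpleNamespace


def insert_copy(collection, doc):
    """Insert a shallow copy of doc and return the generated _id.

    Hypothetical helper: `collection` is anything with a pymongo-style
    insert_one(), such as db.person from the example above.
    """
    copy = dict(doc)                      # the caller's dict stays unchanged
    result = collection.insert_one(copy)  # pymongo adds '_id' to `copy`
    return result.inserted_id


class InMemoryCollection:
    """Stand-in for a pymongo Collection (assumption, illustration only)."""

    def __init__(self):
        self.docs = []
        self._ids = count(1)

    def insert_one(self, document):
        # Imitate pymongo: assign an _id if the document lacks one.
        document.setdefault('_id', next(self._ids))
        self.docs.append(document)
        return SimpleNamespace(inserted_id=document['_id'])


people = InMemoryCollection()
doc = {'name': '刘硕', 'age': 34, 'sex': 'M'}
new_id = insert_copy(people, doc)
print(new_id, '_id' in doc)
```

With a real server, `db.person` would be passed in place of `people`, and `inserted_id` would normally be an `ObjectId` rather than an integer.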

A MongoDBPipeline can be implemented along the same lines as SQLitePipeline:

from pymongo import MongoClient
from scrapy import Item


class MongoDBPipeline:
    def open_spider(self, spider):
        # Read the connection settings, falling back to defaults
        db_uri = spider.settings.get('MONGODB_URI', 'mongodb://localhost:27017')
        db_name = spider.settings.get('MONGODB_DB_NAME', 'scrapy_default')

        # Connect with the configured URI rather than a hard-coded one
        self.db_client = MongoClient(db_uri)
        self.db = self.db_client[db_name]

    def close_spider(self, spider):
        self.db_client.close()

    def process_item(self, item, spider):
        self.insert_db(item)
        return item

    def insert_db(self, item):
        # Convert a scrapy Item to a plain dict before inserting it
        if isinstance(item, Item):
            item = dict(item)

        self.db.books.insert_one(item)

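The code works as follows:

The open_spider method is called before crawling starts. It reads the database specified by the user in the configuration file through the spider.settings object, connects to the database, and assigns the resulting MongoClient and Database objects to self.db_client and self.db for later use.

The process_item method handles each crawled item. It calls insert_db, which performs the database insert: insert_db first converts the item to a dict and then calls insert_one to insert it into the books collection.

The close_spider method is called after all data has been crawled. It closes the connection to the database.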
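
Because `insert_one` always creates a new document, running the spider a second time would store every book again. One possible variation, sketched below, replaces the insert with pymongo's `update_one(filter, update, upsert=True)`, keyed on the `upc` field that appears in the crawled data. Treating `upc` as a unique key is an assumption, and `upsert_book` is a hypothetical name rather than part of the pipeline above.

```python
def upsert_book(collection, item):
    """Insert the book, or update it if the same upc is already stored."""
    doc = dict(item)  # works for scrapy Items and plain dicts alike
    collection.update_one(
        {'upc': doc['upc']},   # match on the assumed unique key
        {'$set': doc},         # overwrite the stored fields
        upsert=True,           # insert when nothing matches
    )
    return item
```

In the pipeline, this would take the place of the `insert_one` call, for example `upsert_book(self.db.books, item)`.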

In the configuration file settings.py, specify the MongoDB database to use and enable MongoDBPipeline:

MONGODB_URI = 'mongodb://localhost:27017'
MONGODB_DB_NAME = 'scrapy_db'

ITEM_PIPELINES = {
  'toscrape_book.pipelines.MongoDBPipeline': 403,
}
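
The pipeline reads these two settings with `settings.get(name, default)`, so either one can be left out of settings.py. The lookup can be sketched in isolation as follows; `mongo_params` is a hypothetical helper, and a plain dict stands in for Scrapy's settings object here because both provide `get(name, default)`.

```python
DEFAULT_URI = 'mongodb://localhost:27017'
DEFAULT_DB_NAME = 'scrapy_default'


def mongo_params(settings):
    """Return (uri, db_name), falling back to the pipeline's defaults."""
    uri = settings.get('MONGODB_URI', DEFAULT_URI)
    db_name = settings.get('MONGODB_DB_NAME', DEFAULT_DB_NAME)
    return uri, db_name


# A dict stands in for spider.settings in this sketch.
print(mongo_params({'MONGODB_DB_NAME': 'scrapy_db'}))
print(mongo_params({}))
```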

Run the spider and inspect the database:

$ scrapy crawl books
...
$ mongo scrapy_db
MongoDB shell version: 2.4.9
connecting to: scrapy_db
> db.books.count()
1000
> db.books.find()
  { "_id" : ObjectId("58fb48859dcd1928b736ee4f"), "review_rating" : 3, "review_num" : "0", "stock" :
"22", "upc" : "a897fe39b1053632", "price" : "£51.77", "name" : "A Light in the Attic" }
  { "_id" : ObjectId("58fb48859dcd1928b736ee50"), "review_rating" : 1, "review_num" : "0", "stock" :
"19", "upc" : "feb7cc7701ecf901", "price" : "£23.88", "name" : "Olio" }
  { "_id" : ObjectId("58fb48859dcd1928b736ee51"), "review_rating" : 2, "review_num" : "0", "stock" :
"19", "upc" : "a18a4f574854aced", "price" : "£51.33", "name" : "Libertarianism for Beginners" }
  { "_id" : ObjectId("58fb48859dcd1928b736ee52"), "review_rating" : 1, "review_num" : "0", "stock" :
"19", "upc" : "e30f54cea9b38190", "price" : "£37.59", "name" : "Mesaerion: The Best Science Fiction
Stories 1800-1849" }
  { "_id" : ObjectId("58fb48859dcd1928b736ee53"), "review_rating" : 5, "review_num" : "0", "stock" :
"19", "upc" : "a34ba96d4081e6a4", "price" : "£35.02", "name" : "Rip it Up and Start Again" }
  { "_id" : ObjectId("58fb48859dcd1928b736ee54"), "review_rating" : 2, "review_num" : "0", "stock" :
"19", "upc" : "a22124811bfa8350", "price" : "£45.17", "name" : "It's Only the Himalayas" }
  { "_id" : ObjectId("58fb48859dcd1928b736ee55"), "review_rating" : 5, "review_num" : "0", "stock" :
"19", "upc" : "3b1c02bac2a429e6", "price" : "£52.29", "name" : "Scott Pilgrim's Precious Little Life (Scott
Pilgrim #1)" }
  { "_id" : ObjectId("58fb48859dcd1928b736ee56"), "review_rating" : 3, "review_num" : "0", "stock" :
"19", "upc" : "deda3e61b9514b83", "price" : "£57.25", "name" : "Our Band Could Be Your Life: Scenes
from the American Indie Underground, 1981-1991" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee57"), "review_rating" : 5, "review_num" : "0", "stock" :
"19", "upc" : "ce6396b0f23f6ecc", "price" : "£17.46", "name" : "Set Me Free" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee58"), "review_rating" : 4, "review_num" : "0", "stock" :
"19", "upc" : "30a7f60cd76ca58c", "price" : "£20.66", "name" : "Shakespeare's Sonnets" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee59"), "review_rating" : 2, "review_num" : "0", "stock" :
"19", "upc" : "0312262ecafa5a40", "price" : "£13.99", "name" : "Starving Hearts (Triangular Trade Trilogy,
#1)" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee5a"), "review_rating" : 1, "review_num" : "0", "stock" :
"19", "upc" : "1dfe412b8ac00530", "price" : "£52.15", "name" : "The Black Maria" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee5b"), "review_rating" : 4, "review_num" : "0", "stock" :
"19", "upc" : "e10e1e165dc8be4a", "price" : "£22.60", "name" : "The Boys in the Boat: Nine Americans and
Their Epic Quest for Gold at the 1936 Berlin Olympics" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee5c"), "review_rating" : 1, "review_num" : "0", "stock" :
"19", "upc" : "f77dbf2323deb740", "price" : "£22.65", "name" : "The Requiem Red" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee5d"), "review_rating" : 4, "review_num" : "0", "stock" :
"19", "upc" : "2597b5a345f45e1b", "price" : "£33.34", "name" : "The Dirty Little Secrets of Getting Your
Dream Job" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee5e"), "review_rating" : 3, "review_num" : "0", "stock" :
"19", "upc" : "e72a5dfc7e9267b2", "price" : "£17.93", "name" : "The Coming Woman: A Novel Based on
the Life of the Infamous Feminist, Victoria Woodhull" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee5f"), "review_rating" : 5, "review_num" : "0", "stock" :
"20", "upc" : "4165285e1663650f", "price" : "£54.23", "name" : "Sapiens: A Brief History of Humankind" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee60"), "review_rating" : 4, "review_num" : "0", "stock" :
"20", "upc" : "e00eb4fd7b871a48", "price" : "£47.82", "name" : "Sharp Objects" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee61"), "review_rating" : 1, "review_num" : "0", "stock" :
"20", "upc" : "90fa61229261140a", "price" : "£53.74", "name" : "Tipping the Velvet" }
  { "_id" : ObjectId("58fb48869dcd1928b736ee62"), "review_rating" : 1, "review_num" : "0", "stock" :
"20", "upc" : "6957f44c3847a760", "price" : "£50.10", "name" : "Soumission" }
  Type "it" for more
  >

The result shows that all 1000 items were successfully stored in the MongoDB database.
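
The same check can be made from Python. The sketch below uses `count_documents` and `find_one`, both of which are pymongo `Collection` methods; `summarize` is a hypothetical helper name. `count_documents({})` is used because `Collection.count()` was removed in PyMongo 4.

```python
def summarize(collection):
    """Return the number of stored documents and one sample document."""
    total = collection.count_documents({})  # {} matches every document
    sample = collection.find_one()          # None if the collection is empty
    return total, sample
```

Assuming the server and database names used in this section, it could be called as `summarize(MongoClient('mongodb://localhost:27017').scrapy_db.books)`.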
