Scrapy CoinMarketCap: How to scrape information from the first page, go through the other pages, and aggregate the information according to filters?
I'm new to Scrapy and Python. Although CoinMarketCap has an API, I'm doing a project to study scraping the coinmarketcap site, and I've run into a few problems.
Question 1: How do I save the information from the first page and from the pages I go through into the same file (`scrapy crawl cmc -O cmc.csv`)?
I want the resulting CSV to look like this:
| Name | Price | Watchlists |
| ------ | ------ |------------|
| First | row | 3.054.863 |
| Second | row | 2.056.312 |
I want to go through the first page, https://coinmarketcap.com/, specify how many of the ranked coins I want to analyze (or all of them), and collect some information about each one.
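Scrapy passes `-a name=value` command-line arguments to the spider's constructor, which stores them as string attributes, so the number of ranked coins could be supplied as `scrapy crawl cmc -a limit=20 -O cmc.csv`. A minimal sketch of turning that string into a cut-off; the `limit` argument name, the `"all"` keyword and both helper names are hypothetical choices, and the limit can only trim whatever rows the downloaded HTML actually contains:

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def parse_limit(raw: Optional[str]) -> Optional[int]:
    """Convert the raw `-a limit=...` string into a positive int, or None meaning 'all'."""
    if raw is None or raw.strip().lower() in ("", "all"):
        return None
    value = int(raw)
    if value < 1:
        raise ValueError("limit must be a positive integer or 'all'")
    return value


def top_n(rows: Iterable[T], raw_limit: Optional[str]) -> Iterator[T]:
    """Return the rows in page (rank) order, stopping after the requested number."""
    # islice(rows, None) yields every row, which covers the 'all' case
    return islice(rows, parse_limit(raw_limit))


# Hypothetical use inside the spider (`limit` only exists when passed with -a):
#     for row in top_n(response.css('tbody tr'), getattr(self, 'limit', None)):
#         ...
```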
My code so far (still under construction):

```python
import scrapy


class CmcSpider(scrapy.Spider):
    name = 'cmc'
    start_urls = ['https://coinmarketcap.com/']

    def parse(self, response):
        # Loop over table rows so each column selector is relative to one row
        for row in response.css('tbody tr'):
            name_coin = row.css('td:nth-child(3) p::text').get()
            price = row.css('td:nth-child(4) span::text').get()
            vol_24h = row.css('td:nth-child(5) span::text').get()
            yield {
                "name": name_coin, "price": price, "vol_24H": vol_24h
            }

            # hrefs are relative ("/currencies/..."); response.follow resolves them
            url = row.css('td:nth-child(3) a::attr(href)').get()
            if url:
                yield response.follow(url, callback=self.parse_currency)

    def parse_currency(self, response):
        name_second = response.css('.sc-103s2w8-0.eAmmwa span::text').get()
        yield {
            'name': name_second,
        }
```
Question 2: I'm not able to isolate the "on watchlists" information. My snippet `watchlists = response.css('.bILTHz').get()` returns:

```html
<div display="flex" style="flex-wrap:wrap" class="sc-16r8icm-0 bILTHz"><div class="namePill namePillPrimary">Rank #1</div><div class="namePill " style="text-transform:capitalize">Coin</div><div class="namePill">On 3,303,992 watchlists</div></div>
```

I can't get at just the watchlist information.
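`.get()` on the container returns its whole outer HTML, which is why the markup comes back. Selecting the text of each pill instead, for example with `response.css('.bILTHz .namePill::text').getall()`, gives plain strings such as `"Rank #1"` and `"On 3,303,992 watchlists"`, and the count can then be picked out with a regular expression. A minimal sketch, assuming the pill keeps the `On N watchlists` wording; `watchlist_count` is a hypothetical helper name:

```python
import re
from typing import Iterable, Optional

# Matches e.g. "On 3,303,992 watchlists" and captures the number with its separators
WATCHLIST_RE = re.compile(r"On\s+([\d,.]+)\s+watchlists", re.IGNORECASE)


def watchlist_count(pill_texts: Iterable[str]) -> Optional[int]:
    """Return the watchlist count found in a list of pill strings, or None if absent."""
    for text in pill_texts:
        match = WATCHLIST_RE.search(text)
        if match:
            # Drop thousands separators before converting ("3,303,992" -> 3303992)
            return int(re.sub(r"\D", "", match.group(1)))
    return None
```

If the raw string is enough, Scrapy selectors can also apply the regex directly: `response.css('.bILTHz .namePill::text').re_first(r'On ([\d,]+) watchlists')`.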
Question 3: When I run the code in the Scrapy shell, it only fetches the first 16 items.
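A likely cause (an assumption worth checking with `view(response)` in the Scrapy shell or in the browser's page source) is that the site renders only the first table rows in the HTML and fills in the rest with JavaScript as you scroll, which Scrapy does not execute. If the page is a Next.js app, its initial data is often embedded as JSON in a `<script id="__NEXT_DATA__">` tag; inside a spider, `response.css('script#__NEXT_DATA__::text').get()` would return that JSON text. The sketch below parses the tag from raw HTML and lists where a given key occurs, since the path to the coin list inside that JSON is not known here; both helper names are hypothetical:

```python
import json
import re
from typing import Any, List, Optional, Tuple, Union

# Matches the JSON payload that Next.js pages usually embed for client-side hydration
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

PathPart = Union[str, int]


def extract_next_data(html: str) -> Optional[Any]:
    """Parse the __NEXT_DATA__ JSON from a page, or return None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def find_key_paths(
    node: Any, key: str, path: Tuple[PathPart, ...] = ()
) -> List[Tuple[PathPart, ...]]:
    """Return every path (dict keys and list indexes) at which `key` appears."""
    found: List[Tuple[PathPart, ...]] = []
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                found.append(path + (k,))
            found.extend(find_key_paths(value, key, path + (k,)))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_key_paths(value, key, path + (index,)))
    return found
```

Running `find_key_paths(data, "name")` on the real page would show which list, if any, holds all the coins; the alternative is the official API the question already mentions.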