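Scrapy CoinMarketCap: how do I scrape information from the first page, crawl through to the other pages, and aggregate the information according to a filter?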

Posted on 2025-02-11 03:21:17

I'm new to Scrapy and Python. Although CoinMarketCap has an API, I'm doing a project to study scraping the site itself, and I have some problems.

Question 1 - How do I save the information from the first page and from the pages I follow into the same file (scrapy crawl cmc -O cmc.csv)?

The result I want is a CSV like this:
| Name | Price | Watchlists |
| ------ | ------ |------------|
| First | row | 3.054.863 |
| Second | row | 2.056.312 |
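
One way to end up with a single row per coin is to yield nothing from the list page callback and instead hand the list page fields on to the detail request (`scrapy.Request` accepts a `cb_kwargs` dict for this), then yield one merged dict from the detail page callback. The sketch below shows only the merging step in plain Python, with no Scrapy involved. `merge_item`, `FIELDS`, and the sample values are made up for illustration.

```python
import csv
import io

# Hypothetical column order for the CSV
FIELDS = ["name", "price", "watchlists"]


def merge_item(list_fields, detail_fields):
    """Combine fields from the list page and the detail page into one row."""
    item = {**list_fields, **detail_fields}
    # Keep only the expected columns, filling any gaps with None
    return {key: item.get(key) for key in FIELDS}


# Placeholder values standing in for what the two callbacks would extract
list_fields = {"name": "First", "price": "row"}
detail_fields = {"watchlists": "3.054.863"}

# Write to an in-memory buffer so the sketch has no side effects
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(merge_item(list_fields, detail_fields))
print(buffer.getvalue())
```

In a real spider the CSV writing is handled by Scrapy's feed export, so only the single merged `yield` would be needed.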

I want to go through the first page (https://coinmarketcap.com/), specify how many coins from the ranking I want to analyze (or all of them), and get some information about each one.
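
The "how many coins" part could be driven by a spider argument, since Scrapy passes `-a name=value` pairs from the command line to the spider as strings (for example `scrapy crawl cmc -a limit=20`, where `limit` is a name chosen here for illustration). Below is a minimal sketch of the capping logic on its own, assuming a hypothetical `limit` value where `all` or a missing value means no cap.

```python
from itertools import islice


def parse_limit(raw):
    """Turn a command-line style value into an int, or None for 'all'."""
    if raw is None or str(raw).lower() == "all":
        return None
    return int(raw)


def take_top(rows, limit):
    """Return an iterator over at most `limit` rows; None means no cap."""
    return islice(rows, limit)


# Placeholder rows standing in for the table rows on the first page
rows = [f"coin-{rank}" for rank in range(1, 6)]
print(list(take_top(rows, parse_limit("3"))))
print(list(take_top(rows, parse_limit("all"))))
```

Inside `parse`, the same `islice` call could wrap `response.css("tbody tr")` so that only the top rows are followed.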

import scrapy


class CmcSpider(scrapy.Spider):
    name = 'cmc'
    start_urls = ['http://coinmarketcap.com/']

    def parse(self, response):
        for coin in response.css('td:nth-child(3)'):
            name_coin = coin.css('td:nth-child(3) p::text').get()
            price = coin.css('td:nth-child(4) span::text').get()
            vol_24h = coin.css('td:nth-child(5) span::text').get()

            yield {
                "name": name_coin, "price": price, "vol_24H": vol_24h
            }

        for item in response.css("tbody tr"):
            url = item.css("td:nth-child(3) a::attr(href)").get()
            yield scrapy.Request(url=f'http://coinmarketcap.com/{url}', callback=self.parse_currency)

    def parse_currency(self, response):
        name_second = response.css('.sc-103s2w8-0.eAmmwa span::text').get()

        yield {
            'name': name_second,
        }
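
One thing that may be going wrong in the spider above: if the `href` values already start with a slash, building the URL with an f-string produces a double slash, and a row without a link produces a URL ending in `None`. A small sketch of the difference, using only the standard library; the `href` value and the `build_detail_url` helper are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://coinmarketcap.com/"


def build_detail_url(href):
    """Resolve a relative href against the site root; None if there is no link."""
    if href is None:
        return None
    return urljoin(BASE_URL, href)


href = "/currencies/bitcoin/"  # hypothetical value, assumed to start with a slash

# String formatting keeps both slashes
print(f"http://coinmarketcap.com/{href}")  # http://coinmarketcap.com//currencies/bitcoin/

# urljoin resolves the path against the base URL
print(build_detail_url(href))  # https://coinmarketcap.com/currencies/bitcoin/
```

Within a Scrapy callback, `response.urljoin(href)` or `response.follow(href, callback=...)` does the same resolution against the current page URL.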

Code under construction:

Question 2: I can't separate out the "on watchlists" information. My snippet:
watchlists = response.css('.bILTHz').get()
Returns:

<div display="flex" style="flex-wrap:wrap" class="sc-16r8icm-0 bILTHz"><div class="namePill namePillPrimary">Rank #1</div><div class="namePill " style="text-transform:capitalize">Coin</div><div class="namePill">On 3,303,992 watchlists</div></div>

I'm not able to access only the watchlist information.
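
A possible approach, sketched under the assumption that the markup looks like the output above: collect the text of every `namePill` div and pick the one that matches the "On N watchlists" pattern, rather than relying on the generated class `bILTHz`, which may change between site builds. The helper name `parse_watchlist_count` is made up for this example.

```python
import re


def parse_watchlist_count(pill_texts):
    """Return the watchlist count as an int from a list of pill strings, or None."""
    for text in pill_texts:
        match = re.search(r"On\s+([\d,.]+)\s+watchlists", text)
        if match:
            # Drop thousands separators before converting
            return int(re.sub(r"[^\d]", "", match.group(1)))
    return None


# In a spider callback, the pill texts could be collected with:
#   pill_texts = response.css("div.namePill::text").getall()
pill_texts = ["Rank #1", "Coin", "On 3,303,992 watchlists"]
print(parse_watchlist_count(pill_texts))  # 3303992
```

Matching on the text means the order of the pills does not matter, and a page without a watchlist pill simply yields `None`.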

Question 3 - When I run the code in the Scrapy shell, it only fetches the first 16 items.
