Scrapy CoinMarketCap: how to scrape information from the first page, follow the other pages, and aggregate the information according to a filter?

Posted on 2025-02-11 03:21:17

I'm new to Scrapy and Python. Although CoinMarketCap has an API, I'm doing a project to study scraping the site itself, and I have run into some problems.

Question 1 - How do I save the information from the first page and from the pages I follow into the same file (scrapy crawl cmc -O cmc.csv)?

The result I want is a CSV like this:
| Name | Price | Watchlists |
| ------ | ------ |------------|
| First | row | 3.054.863 |
| Second | row | 2.056.312 |

I want to crawl the first page, https://coinmarketcap.com/, specify how many of the ranked coins I want to analyze (or all of them), and collect some information for each one.

import scrapy


class CmcSpider(scrapy.Spider):
    name = 'cmc'
    start_urls = ['http://coinmarketcap.com/']

    def parse(self, response):
        for coin in response.css('td:nth-child(3)'):
            name_coin = coin.css('td:nth-child(3) p::text').get()
            price = coin.css('td:nth-child(4) span::text').get()
            vol_24h = coin.css('td:nth-child(5) span::text').get()

            yield {
                "name": name_coin, "price": price, "vol_24H": vol_24h
            }

        for item in response.css("tbody tr"):
            url = item.css("td:nth-child(3) a::attr(href)").get()
            yield scrapy.Request(url=f'http://coinmarketcap.com/{url}', callback=self.parse_currency)

    def parse_currency(self, response):
        name_second = response.css('.sc-103s2w8-0.eAmmwa span::text').get()

        yield {
            'name': name_second,
        }

The code above is still under construction.

Question 2: I'm not able to isolate the "On ... watchlists" information. My snippet:
watchlists = response.css('.bILTHz').get()
It returns:

<div display="flex" style="flex-wrap:wrap" class="sc-16r8icm-0 bILTHz"><div class="namePill namePillPrimary">Rank #1</div><div class="namePill " style="text-transform:capitalize">Coin</div><div class="namePill">On 3,303,992 watchlists</div></div>

I'm not able to access only the watchlist information.
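In the HTML above, the outer `div` holds three `namePill` children and the watchlist count sits in the last one. A minimal sketch, assuming the text always has the form "On N watchlists": collect the text of each pill and pull the number out with a regular expression. The helper name `extract_watchlists` is hypothetical.

```python
import re


def extract_watchlists(pill_texts):
    """Return the watchlist count as an int, or None if no pill mentions it."""
    for text in pill_texts:
        match = re.search(r"On\s+([\d,.]+)\s+watchlists", text)
        if match:
            # Drop thousands separators before converting to int.
            return int(re.sub(r"[,.]", "", match.group(1)))
    return None


# In a spider callback the texts would come from the response, for example:
#   pill_texts = response.css("div.namePill::text").getall()
pill_texts = ["Rank #1", "Coin", "On 3,303,992 watchlists"]
print(extract_watchlists(pill_texts))
```

Selecting on `namePill` rather than on `bILTHz` is a deliberate choice: class names like `bILTHz` look auto-generated and may change between site builds.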

Question 3 - When I run the code in the Scrapy shell, it only fetches the first 16 items.
