Python Scrapy: I need help saving to a (.csv) file. How can I do this?

Posted 2025-01-09 04:14:06

I'm using Debian Bullseye (11.2) and I want to save the scraped data to a (.csv) file. How can I do this?

from scrapy.spiders import CSVFeedSpider


class CsSpiderSpider(CSVFeedSpider):
    name = 'cs_spider'
    allowed_domains = ['ocw.mit.edu/courses/electrical-engineering-and-computer-science/']
    start_urls = ['http://ocw.mit.edu/courses/electrical-engineering-and-computer-science//feed.csv']
    # headers = ['id', 'name', 'description', 'image_link']
    # delimiter = '\t'

    # Do any adaptations you need here
    #def adapt_response(self, response):
    #    return response

    def parse_row(self, response, row):
        i = {}
        #i['url'] = row['url']
        #i['name'] = row['name']
        #i['description'] = row['description']
        return i
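For reference, CSVFeedSpider calls parse_row once per CSV row, with row passed in as a dict keyed by the headers. A sketch of what the commented-out body could look like, assuming the feed really has url, name, and description columns (the column names are taken from the commented lines above, not verified against the feed):

def parse_row(self, response, row):
    # 'row' is a dict keyed by the CSV headers; the column names here
    # are assumptions taken from the commented-out lines above.
    return {
        'url': row.get('url'),
        'name': row.get('name'),
        'description': row.get('description'),
    }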



Answer from 孤者何惧 (2025-01-16 04:14:06):


Here's an example of using Scrapy's FEEDS export setting.

import scrapy
from scrapy.crawler import CrawlerProcess


class CsspiderSpider(scrapy.Spider):
    name = 'cs_spider'
    start_urls = ['http://ocw.mit.edu/courses/electrical-engineering-and-computer-science']

    def start_requests(self):
        # Request each start URL and send the response to parse_row
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse_row)

    def parse_row(self, response):
        # Every dict yielded here becomes one row in the exported CSV
        yield {'test': response.text}


# The FEEDS setting tells Scrapy to write all scraped items to data.csv
process = CrawlerProcess(
    settings={
        'FEEDS': {
            'data.csv': {'format': 'csv'},
        },
    },
)
process.crawl(CsspiderSpider)
process.start()
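Running this file directly (for example, python cs_spider.py, assuming that is the filename) starts the crawl and writes data.csv to the working directory.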

This will save the scraped output in .csv format. Furthermore, to specify which columns to export and their order, use the FEED_EXPORT_FIELDS setting. You can read more about this in the Scrapy feed exports docs (https://docs.scrapy.org/en/latest/topics/feed-exports.html).
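For example, a minimal sketch of the settings (the field names here are assumptions borrowed from the headers commented out in the question, and must match the keys the spider actually yields):

process = CrawlerProcess(
    settings={
        'FEEDS': {'data.csv': {'format': 'csv'}},
        # Only these fields are exported, in this order (illustrative names)
        'FEED_EXPORT_FIELDS': ['id', 'name', 'description', 'image_link'],
    },
)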

From the command line, you can run:

scrapy crawl cs_spider -o output.csv

However, when running the above from the command line, make sure to comment out everything from process = CrawlerProcess(...) onward: scrapy crawl imports the spider module, and that module-level code would otherwise run again at import time.
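Alternatively, if the spider lives in a standalone file rather than inside a Scrapy project, you can drop the CrawlerProcess block entirely and use scrapy runspider (the filename cs_spider.py is an assumption; in recent Scrapy versions -o appends to the output file, while -O overwrites it):

scrapy runspider cs_spider.py -o output.csv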
