Python Scrapy: I need help saving to a (.csv) file. How can I do this?

Posted 2025-01-09 04:14:06

I'm using Debian Bullseye (11.2) and I want to save the scraped data to a (.csv) file. How can I do this?

from scrapy.spiders import CSVFeedSpider


class CsSpiderSpider(CSVFeedSpider):
    name = 'cs_spider'
    allowed_domains = ['ocw.mit.edu/courses/electrical-engineering-and-computer-science/']
    start_urls = ['http://ocw.mit.edu/courses/electrical-engineering-and-computer-science//feed.csv']
    # headers = ['id', 'name', 'description', 'image_link']
    # delimiter = '\t'

    # Do any adaptations you need here
    #def adapt_response(self, response):
    #    return response

    def parse_row(self, response, row):
        i = {}
        #i['url'] = row['url']
        #i['name'] = row['name']
        #i['description'] = row['description']
        return i
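For reference, CSVFeedSpider calls parse_row once per CSV row, with row passed in as a dict keyed by the headers. A sketch of what the commented-out body could look like, assuming the feed really has url, name, and description columns (the column names are taken from the commented lines above, not verified against the feed):

def parse_row(self, response, row):
    # 'row' is a dict keyed by the CSV headers; the column names here
    # are assumptions taken from the commented-out lines above.
    return {
        'url': row.get('url'),
        'name': row.get('name'),
        'description': row.get('description'),
    }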



Answer from 孤者何惧 (2025-01-16 04:14:06):


Here's an example of using Scrapy's FEEDS export setting.

import scrapy
from scrapy.crawler import CrawlerProcess


class CsspiderSpider(scrapy.Spider):
    name = 'cs_spider'
    start_urls = ['http://ocw.mit.edu/courses/electrical-engineering-and-computer-science']

    def start_requests(self):
        # Request each start URL and send the response to parse_row
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse_row)

    def parse_row(self, response):
        # Every dict yielded here becomes one row in the exported CSV
        yield {'test': response.text}


# The FEEDS setting tells Scrapy to write all scraped items to data.csv
process = CrawlerProcess(
    settings={
        'FEEDS': {
            'data.csv': {'format': 'csv'},
        },
    },
)
process.crawl(CsspiderSpider)
process.start()
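Running this file directly (for example, python cs_spider.py, assuming that is the filename) starts the crawl and writes data.csv to the working directory.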

This will save the scraped output in .csv format. Furthermore, to specify which columns to export and their order, use the FEED_EXPORT_FIELDS setting. You can read more about this in the Scrapy feed exports docs (https://docs.scrapy.org/en/latest/topics/feed-exports.html).
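For example, a minimal sketch of the settings (the field names here are assumptions borrowed from the headers commented out in the question, and must match the keys the spider actually yields):

process = CrawlerProcess(
    settings={
        'FEEDS': {'data.csv': {'format': 'csv'}},
        # Only these fields are exported, in this order (illustrative names)
        'FEED_EXPORT_FIELDS': ['id', 'name', 'description', 'image_link'],
    },
)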

From the command line, you can run:

scrapy crawl cs_spider -o output.csv

However, when running the above from the command line, make sure to comment out everything from process = CrawlerProcess(...) onward: scrapy crawl imports the spider module, and that module-level code would otherwise run again at import time.
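Alternatively, if the spider lives in a standalone file rather than inside a Scrapy project, you can drop the CrawlerProcess block entirely and use scrapy runspider (the filename cs_spider.py is an assumption; in recent Scrapy versions -o appends to the output file, while -O overwrites it):

scrapy runspider cs_spider.py -o output.csv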
