Trying to use an ItemExporter in Scrapy

Posted 2024-11-17 00:18:50

I'm trying to implement some sort of Item Exporter in my code. Right now my basic code scrapes si.com for batting averages, just as an example. The results come out as one long row, and I'd like to change how the output is stored in the .csv file so that each value goes into its own column instead. Below is the spider; the item exporter I'm using is just the basic one found here. What I really want is for each item's fields to be stored in columns next to each other, instead of one long row with all three results listed consecutively.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import XmlItemExporter

from mlb1.items import MlbItem

class MLBSpider(BaseSpider):
    name = "si.com"
    allowed_domains = ["si.com"]
    start_urls = [
        "http://sportsillustrated.cnn.com/baseball/mlb/stats/2011/batting/ml_0_byBATTING_AVG.html"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="cnnSASD_sport-mlb"]/div[@class="cnnSASD_page-leadersPlayersExpandedStats"]/div[@class="cnnStatsContent"]')
        items = []
        for site in sites:
            item = MlbItem()
            item['name'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol1"]//text()').extract()
            item['team'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol2"]//text()').extract()
            item['batave'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnColHighlight"]//text()').extract()
            items.append(item)
        return items

I'm still very new to Python coding, so the Scrapy documentation isn't much help. When I try running the code, I get the error "ImportError: Error loading object 'mlb1.pipelines.XmlExportPipeline': cannot import name signals". Any help anyone can provide would be greatly appreciated.
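
That ImportError usually means the example pipeline pasted into mlb1/pipelines.py imports signals from a location that doesn't match the installed Scrapy version (the signals module moved between early releases). For reference, here is a minimal, untested sketch of a CSV exporter pipeline written against the same old scrapy.contrib API the spider above uses; the class name CsvExportPipeline, the output filename pattern and the field order are illustrative assumptions, not something from the original post:

from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher
from scrapy.contrib.exporter import CsvItemExporter

class CsvExportPipeline(object):
    """Write scraped items to <spidername>_stats.csv, one item per row (sketch)."""

    def __init__(self):
        # Open and close the output file together with the spider.
        dispatcher.connect(self.spider_opened, signals.spider_opened)
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        self.files = {}

    def spider_opened(self, spider):
        # Filename pattern is an illustrative assumption.
        file = open('%s_stats.csv' % spider.name, 'w+b')
        self.files[spider] = file
        # fields_to_export fixes the CSV column order: name, team, batave.
        self.exporter = CsvItemExporter(file, fields_to_export=['name', 'team', 'batave'])
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

To enable it, point settings.py at it with ITEM_PIPELINES = ['mlb1.pipelines.CsvExportPipeline'] (old Scrapy versions take a plain list here). Each MlbItem then becomes one CSV row with the three fields in adjacent columns.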

Comments (1)

薔薇婲 2024-11-24 00:18:50

See this example for extracting the player names:

# Assumes a module-level import: from scrapy.contrib.loader import XPathItemLoader
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # One selector per player-name link in the stats table.
    player_names = hxs.select('//table[@class="cnnSASD_first"]//td[@class="cnnCol1"]/a')
    for p_name in player_names:
        l = XPathItemLoader(item=MlbItem(), selector=p_name)
        l.add_xpath('name', 'text()')
        yield l.load_item()

On the scrapy command line, use --set FEED_URI=items.csv --set FEED_FORMAT=csv. This will dump the names to an items.csv file; there is no need to write your own feed exporter. You can model the XPath for the team names along similar lines, as in the sketch below.
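
Building on that answer, here is an untested sketch of how the same item-loader approach could cover all three fields at once. It assumes that each player's name, team and batting average sit in the same <tr> of the cnnSASD_first table; the row XPath below is a guess based on the class names in the question, not something taken from the answer. Loading all fields from one row keeps a player's values in one item, and therefore in adjacent CSV columns:

# Module-level imports assumed:
#   from scrapy.contrib.loader import XPathItemLoader
#   from scrapy.selector import HtmlXPathSelector
#   from mlb1.items import MlbItem

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # One selector per table row that contains a player column (assumed structure).
    rows = hxs.select('//table[@class="cnnSASD_first"]//tr[td[@class="cnnCol1"]]')
    for row in rows:
        l = XPathItemLoader(item=MlbItem(), selector=row)
        # XPaths are relative to the current row, so the three fields stay aligned.
        l.add_xpath('name', 'td[@class="cnnCol1"]//text()')
        l.add_xpath('team', 'td[@class="cnnCol2"]//text()')
        l.add_xpath('batave', 'td[@class="cnnColHighlight"]//text()')
        yield l.load_item()

Running it with the feed-export flags from the answer, e.g. scrapy crawl si.com --set FEED_URI=items.csv --set FEED_FORMAT=csv, would then turn each table row into one row of items.csv.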
