Trying to use ItemExporter in Scrapy
I'm trying to implement an Item Exporter in my code. As an example, my spider currently scrapes batting averages from si.com. The results come out as one long row, and I'd like the .csv output to be arranged in columns instead. The spider is below; the item exporter I'm using is just the basic one found here. What I really want is for each item's fields (name, team, batting average) to be stored in adjacent columns, rather than in one long row with all three results back to back.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import XmlItemExporter
from mlb1.items import MlbItem

class MLBSpider(BaseSpider):
    name = "si.com"
    allowed_domains = ["si.com"]
    start_urls = [
        "http://sportsillustrated.cnn.com/baseball/mlb/stats/2011/batting/ml_0_byBATTING_AVG.html"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="cnnSASD_sport-mlb"]/div[@class="cnnSASD_page-leadersPlayersExpandedStats"]/div[@class="cnnStatsContent"]')
        items = []
        for site in sites:
            item = MlbItem()
            # Each field receives a list holding every matching cell in the table
            item['name'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol1"]//text()').extract()
            item['team'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol2"]//text()').extract()
            item['batave'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnColHighlight"]//text()').extract()
            items.append(item)
        return items
I'm still very new to Python, so the Scrapy documentation hasn't been much help. When I run the code, I get this error: "ImportError: Error loading object 'mlb1.pipelines.XmlExportPipeline': cannot import name signals". Any help would be greatly appreciated.
See this example for extracting player names.
On the scrapy command line, use
--set FEED_URI=items.csv --set FEED_FORMAT=csv
This will dump your names to the items.csv file, so there is no need to write your own feed exporter. You can model your XPath for team names along similar lines.
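To illustrate the "columns instead of one long row" part of the question, here is a minimal sketch using only Python's standard csv module rather than Scrapy itself. It assumes an item shaped like the one the spider builds, where 'name', 'team' and 'batave' each hold a parallel list of strings; the helper names item_to_rows and write_rows and the sample values are hypothetical, not part of the original code or of Scrapy.

```python
import csv
import io


def item_to_rows(item):
    """Pair up the parallel lists in one scraped item into per-player rows."""
    # zip stops at the shortest list, so a missing cell truncates the output
    return [
        (name.strip(), team.strip(), batave.strip())
        for name, team, batave in zip(item["name"], item["team"], item["batave"])
    ]


def write_rows(rows, out):
    """Write a header line followed by one CSV line per player."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "team", "batave"])
    writer.writerows(rows)


# Hypothetical sample data shaped like the item the spider builds
sample_item = {
    "name": ["Player A", "Player B"],
    "team": ["AAA", "BBB"],
    "batave": [".344", ".338"],
}

buffer = io.StringIO()
write_rows(item_to_rows(sample_item), buffer)
print(buffer.getvalue())
```

The same idea carries over to the spider: producing one item per table row, instead of one item containing three long lists, should let the CSV feed export write one player per line with the fields in adjacent columns.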