Why am I getting "IndexError: list index out of range"? (Beautiful Soup)
I am trying to scrape a table here that is very similar in structure to the one in my previous question. I only changed the attribute names, but I am getting an index out of range error. This is the TR:
<TR VALIGN="bottom">
<TD BGCOLOR=#cc6600 ALIGN="center" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">1</FONT></TD>
<TD BGCOLOR=#CC6600 ALIGN="left" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">Wachtell, Lipton</FONT></TD>
<TD BGCOLOR=#CC6600 ALIGN="center" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">1 </FONT></TD>
<TD BGCOLOR=#CC6600 ALIGN="center" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">9.1%</FONT></TD>
<TD BGCOLOR=#FF9933 ALIGN="center" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">$3,385,000 </FONT></TD>
</TR>
I am trying to get the first ALIGN="left" and the last ALIGN="center". But the index on the last line gives the error. Here is the code I am using:
soup = BeautifulSoup(urllib.urlopen("http://www.law.com/special/professionals/amlaw/amlaw200/amlaw200_ppp.html"))
rows = soup.findAll(name='tr',attrs={'valign':'bottom'}, limit=13)
for row in rows:
    tds_left = row.findAll(name='td',attrs={'align':'left'}, limit=13)
    tds_center = row.findAll(name='td',attrs={'align':'center'}, limit=13)
    if tds_left:
        firm_name = tds_left[0].text
    if tds_center:
        # the following line gives an error if the index is different than 0
        ppp = tds_center[0].text
Thanks!
Update
Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\webapp\_webapp25.py", line 701, in __call__
    handler.get(*groups)
  File "C:\U\A\D\\toplawfirms.py", line 384, in get
    ppp = tds_center[2].text
IndexError: list index out of range
Update
In response to agf's comment, here is the output of print tds_center and for item in tds_center: print item:
tds_center: []
tds_center: []
tds_center: []
tds_center: [ ]
item:
tds_center: []
item:
tds_center: [Rank By
Profits Per
Partner, Rank By
Revenue
Per Lawyer, Change In
Profits per
Partner
from 1998, Profits Per
Partner]
item: Rank By
Profits Per
Partner
item: Rank By
Revenue
Per Lawyer
item: Change In
Profits per
Partner
from 1998
item: Profits Per
Partner
tds_center: [1, 1 , 9.1%, $3,385,000 ]
item: 1
item: 1
item: 9.1%
item: $3,385,000
tds_center: [2, 2 , 5.0%, $3,055,000 ]
item: 2
item: 2
item: 5.0%
item: $3,055,000
tds_center: [3, 4 , 2.9%, $2,110,000 ]
item: 3
item: 4
item: 2.9%
item: $2,110,000
tds_center: [4, 3 , 8.7%, $1,790,000 ]
item: 4
item: 3
item: 8.7%
item: $1,790,000
tds_center: [5, 9 , 6.9%, $1,710,000 ]
item: 5
item: 9
item: 6.9%
item: $1,710,000
tds_center: [6, 6 , 10.8%, $1,655,000 ]
item: 6
item: 6
item: 10.8%
item: $1,655,000
tds_center: [7, 5 , 5.1%, $1,610,000 ]
item: 7
item: 5
item: 5.1%
item: $1,610,000
Comments (2)
I modified how you are getting the last "center" td in the following code:
and got the following result:
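As a rough illustration of the idea (not the original snippet from this answer), plain lists can stand in for the findAll results. The values below are copied from the printed tds_center output in the question, and last_cell is a hypothetical helper, not part of Beautiful Soup. Some rows yield no center cells or only one, so a fixed index such as [2] fails on them, while [-1] behind an emptiness check does not.

```python
# Stand-ins for the per-row tds_center lists printed in the question.
rows = [
    [],                                  # a row with no center cells
    [" "],                               # a spacer row with one blank cell
    ["1", "1 ", "9.1%", "$3,385,000 "],  # a real data row
]


def last_cell(cells):
    """Return the stripped text of the last cell, or None for an empty row."""
    return cells[-1].strip() if cells else None


# [-1] always points at the last element, however many cells the row has.
results = [last_cell(cells) for cells in rows]

# A fixed index raises IndexError on the one-cell spacer row,
# even though that row passes an `if tds_center:` check.
try:
    rows[1][2]
    fixed_index_failed = False
except IndexError:
    fixed_index_failed = True
```

With real Beautiful Soup tags the same guard would apply to tds_center before reading .text.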
The traceback doesn't correspond to the code.
Traceback:
Your code:
Your code runs and produces output, but the result isn't very interesting. John Keyes's version gives more interesting output by using the [-1] index instead.
It depends on your needs.
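To see both lookups end to end on the row from the question, here is a minimal sketch. It assumes Python 3 and uses the standard library's html.parser in place of Beautiful Soup, purely so it runs without extra installs. RowCells, ROW_HTML, and the variable names are made up for this illustration.

```python
from html.parser import HTMLParser

# The sample row from the question, unchanged.
ROW_HTML = """
<TR VALIGN="bottom">
<TD BGCOLOR=#cc6600 ALIGN="center" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">1</FONT></TD>
<TD BGCOLOR=#CC6600 ALIGN="left" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">Wachtell, Lipton</FONT></TD>
<TD BGCOLOR=#CC6600 ALIGN="center" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">1 </FONT></TD>
<TD BGCOLOR=#CC6600 ALIGN="center" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">9.1%</FONT></TD>
<TD BGCOLOR=#FF9933 ALIGN="center" ><FONT FACE="Verdana, Arial, Helvetica, sans-serif">$3,385,000 </FONT></TD>
</TR>
"""


class RowCells(HTMLParser):
    """Collect [align, text] for every <td> in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "td":
            align = (dict(attrs).get("align") or "").lower()
            self.cells.append([align, ""])
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags such as <font> still belongs to the open cell.
        if self._in_td:
            self.cells[-1][1] += data


parser = RowCells()
parser.feed(ROW_HTML)

left_cells = [text.strip() for align, text in parser.cells if align == "left"]
center_cells = [text.strip() for align, text in parser.cells if align == "center"]

# First left-aligned cell and last center-aligned cell, guarded against empty rows.
firm_name = left_cells[0] if left_cells else None
ppp = center_cells[-1] if center_cells else None
```

The guards matter more than the parser choice: header and spacer rows also match valign="bottom", so any fixed index needs a length check first.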