IndexError when running a basic web scrape in Python
I'm using Python 2.7. When I try to run this code, I get a problem when it hits print findPatTitle[i], and Python returns "IndexError: list index out of range". I'm taking this code from the 13th Python tutorial on YouTube, and I'm pretty sure the code is identical, so I don't understand why I would get a range problem. Any ideas?
from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re

webpage = urlopen('http://feeds.huffingtonpost.com/huffingtonpost/LatestNews').read()

patFinderTitle = re.compile('<title>(.*)<title>')
patFinderLink = re.compile('<link rel.*href="(.*)" />')

findPatTitle = re.findall(patFinderTitle,webpage)
findPatLink = re.findall(patFinderLink,webpage)

listIterator = []
listIterator[:] = range(2,16)

for i in listIterator:
    print findPatTitle[i]
    print findPatLink[i]
    print "\n"
1 Answer
If your regex managed to find the title and link tags, you would get a list of matched strings when using findall. In that case, you can just iterate through them and print them.
Like:
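A minimal sketch of that, reusing the names from the question (Python 2 print syntax):

# Sketch only: loop over however many matches findall actually returned,
# rather than assuming a fixed number of them.
for title in findPatTitle:
    print title
for link in findPatLink:
    print link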
The IndexError you are getting is because you are trying to access list elements at indices 2 through 15, and there are not 16 elements in either the titles or the links.
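One quick way to confirm this, assuming the same findPatTitle and findPatLink variables from the question, is to print how many matches each regex actually produced before indexing:

# If either count is below 16, indexing up to 15 will raise IndexError.
print len(findPatTitle), len(findPatLink)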
Note,
listIterator[:] = range(2,16)
is not a good way to write code for this purpose. You could just use range(2, 16) directly in the for loop.
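A minimal sketch along those lines, assuming the question's findPatTitle and findPatLink lists, that also guards against short result lists:

# Use range directly, capped at the length of the shorter list,
# so the loop never indexes past the matches that actually exist.
for i in range(2, min(16, len(findPatTitle), len(findPatLink))):
    print findPatTitle[i]
    print findPatLink[i]
    print "\n"

With the capped range, fewer than 16 matches simply means fewer iterations instead of an IndexError.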