IndexError when running a basic web scrape in Python

Posted on 2024-12-03 05:43:41

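I'm using Python 2.7. When I run this code, it fails when it reaches print findPatTitle[i], and Python raises "IndexError: list index out of range". I took this code from the 13th Python tutorial on YouTube, and I'm fairly sure my code is identical, so I don't understand why I get a range error. Any ideas?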

from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re

webpage = urlopen('http://feeds.huffingtonpost.com/huffingtonpost/LatestNews').read()

patFinderTitle = re.compile('<title>(.*)<title>')

patFinderLink = re.compile('<link rel.*href="(.*)" />')

findPatTitle = re.findall(patFinderTitle,webpage)
findPatLink = re.findall(patFinderLink,webpage)

listIterator = []
listIterator[:] = range(2,16)

for i in listIterator:
    print findPatTitle[i]
    print findPatLink[i]
    print "\n"


不爱素颜 2024-12-10 05:43:41

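If your regex manages to find the title and link tags, findall returns a list of the matched strings. In that case, you can simply iterate over the list and print each item.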

For example:

for title in findPatTitle:
    print title

for link in findPatLink:
    print link
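Whether findall finds anything at all depends on the pattern. Note that the question's pattern '<title>(.*)<title>' ends with <title> rather than the closing tag </title>. The sketch below compares it with a corrected pattern on a small hand-written RSS snippet; the snippet is a hypothetical stand-in, since the live feed's exact layout is not shown here. It uses print(...) with a single argument, so it runs under both Python 2.7 and Python 3.

```python
import re

# Hypothetical RSS snippet standing in for the live feed.
sample = (
    "<rss><channel>\n"
    "<title>Latest News</title>\n"
    "<item>\n"
    "<title>First headline</title>\n"
    "</item>\n"
    "<item>\n"
    "<title>Second headline</title>\n"
    "</item>\n"
    "</channel></rss>\n"
)

# The question's pattern: the second tag is missing the "/" of </title>.
# "." does not match a newline, so on this sample it never finds a second
# <title> on the same line.
broken = re.compile('<title>(.*)<title>')

# Closing tag spelled correctly; the non-greedy ".*?" stops at the first </title>.
fixed = re.compile('<title>(.*?)</title>')

broken_titles = broken.findall(sample)
fixed_titles = fixed.findall(sample)

print(broken_titles)  # []
print(fixed_titles)   # ['Latest News', 'First headline', 'Second headline']
```

With an empty result list, any index, including findPatTitle[2], is out of range.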

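The IndexError occurs because range(2, 16) produces the indices 2 through 15, so each list needs at least 16 elements, but findall returned fewer matches than that (findPatTitle in particular, since that is the line that fails).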
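To see exactly where the loop breaks, here is a small sketch that uses a hypothetical five-element list in place of the real findall result. The first index that does not exist is len(titles), and that is where IndexError is raised.

```python
# Hypothetical stand-in for what findall returned: only 5 titles.
titles = ['t0', 't1', 't2', 't3', 't4']

# range(2, 16) produces the indices 2, 3, ..., 15.
indices = list(range(2, 16))
print(indices[-1])  # 15, so the list needs at least 16 elements

failed_at = None
for i in indices:
    try:
        item = titles[i]
    except IndexError:
        # First index past the end of the list.
        failed_at = i
        break

print(failed_at)  # 5, which equals len(titles)
```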

Also note that listIterator[:] = range(2,16) is not a good way to write this. You can simply loop over the range directly:

for i in range(2, 16):
    # use i, e.g. print findPatTitle[i]
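If the goal is simply "entries 2 through 15, as far as they exist", one option is to avoid indexing entirely: slice both lists and pair them with zip. Slicing never raises IndexError (it just returns whatever elements exist in that range), and zip stops at the shorter list. The sample lists below are hypothetical stand-ins for findPatTitle and findPatLink.

```python
# Hypothetical stand-ins for the two findall results (lengths may differ).
findPatTitle = ['Feed title', 'Site title', 'Headline A', 'Headline B']
findPatLink = ['http://example.com/feed', 'http://example.com/',
               'http://example.com/a']

# [2:16] returns at most the elements at indices 2..15, never raising.
# zip() pairs them up and stops when the shorter slice runs out.
pairs = list(zip(findPatTitle[2:16], findPatLink[2:16]))

for title, link in pairs:
    print(title)
    print(link)
    print("")  # blank line between entries
```

One trade-off: zip silently drops unpaired items, so if the title and link patterns match different numbers of tags, the mismatch goes unnoticed. Comparing len(findPatTitle) and len(findPatLink) first makes it visible.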
