How to fix "TypeError: list indices must be integers or slices, not str"?
I'm trying to scrape a website. I want to retrieve a URL from this page and use it to reach another page where I can access the information I need.
```python
import requests
from bs4 import BeautifulSoup
headers = {'User-agent': 'Mozilla/5.0 (Windows 10; Win64; x64; rv:101.0.1) Gecko/20100101 Firefox/101.0.1'}
baseUrl = 'https://elitejobstoday.com/'
url = "https://elitejobstoday.com/"
r = requests.get(url, headers = headers)
c = r.content
soup = BeautifulSoup(c, "lxml")
table = soup.find_all("a", attrs = {"class": "job-details-link"})
```
This part works fine; however, the next part is where I get stuck.
```python
def jobScan(link):
    the_job = {}
    jobUrl = '{}{}'.format(baseUrl, link['href'])
    the_job['urlLink'] = jobUrl
    job = requests.get(jobUrl, headers = headers )
    jobC = job.content
    jobSoup = BeautifulSoup(jobC, "lxml")
    name = jobSoup.find("h3", attrs={"class": "loop-item-title"})
    title = name.a.text
    the_job['title'] = title
    company = jobSoup.find_all("span", {"class": "job-company"})[0]
    company = company.text
    the_job['company'] = company
    print(the_job)
    return the_job

jobScan(table)
```
I'm getting this error:
```
File "C:\Users\MUHUMUZA IVAN\Desktop\JobPortal\absa.py", line 41, in jobScan
    jobUrl = '{}{}'.format(baseUrl, link['href'])
TypeError: list indices must be integers or slices, not str
```
I'm clearly doing something wrong, but I can't see what. I need your help. Thanks.
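There are two main issues:

1. You are not iterating over the `ResultSet` of URLs; you pass `table`, the whole list of links, to your function. A `ResultSet` is a list, so `link['href']` tries to index a list with a string, which is exactly what raises the `TypeError`.
2. Your URLs become invalid when you prepend `baseUrl`. Just use `jobUrl = link['href']`, because the paths are already absolute.

Note: you should also check whether the elements you are looking for actually exist in the responses.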
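Both points can be seen without touching the network. The sketch below uses plain dicts as hypothetical stand-ins for the `<a>` tags in the `ResultSet` (a bs4 `Tag` supports `tag["href"]` the same way), and the example paths are made up. The helper names `naive_job_url` and `job_url` are illustrative only; `job_url` swaps in `urllib.parse.urljoin`, which keeps an absolute `href` unchanged and resolves a relative one against the base, so it works whichever form the site uses.

```python
from urllib.parse import urljoin

BASE_URL = "https://elitejobstoday.com/"

def naive_job_url(href: str) -> str:
    # What the question's jobScan does: blindly prepend the base URL.
    return "{}{}".format(BASE_URL, href)

def job_url(href: str) -> str:
    # urljoin returns an absolute href unchanged and resolves a
    # relative one against BASE_URL, so it is safe either way.
    return urljoin(BASE_URL, href)

# Hypothetical stand-ins for the <a> tags returned by find_all().
links = [
    {"href": "https://elitejobstoday.com/job/example-1/"},
    {"href": "/job/example-2/"},
]

try:
    links["href"]  # what link['href'] does when jobScan receives the whole list
except TypeError as exc:
    print("TypeError:", exc)  # list indices must be integers or slices, not str

for link in links:  # iterate and handle one link at a time instead
    print(naive_job_url(link["href"]), "->", job_url(link["href"]))
```

For the absolute link, the naive version produces `https://elitejobstoday.com/https://elitejobstoday.com/job/example-1/`, which is the invalid URL described in point 2.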
Example: iterating over the first two URLs works. The third one will give you an error because there is no `<h3>` in its response, but that should be asked in a new question focused on exactly that issue.
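A minimal sketch of the "check that the elements exist" advice, assuming the job pages use the same `h3.loop-item-title` and `span.job-company` markup as in the question. The function name `parse_job` is hypothetical; it takes an HTML string so it can be tried offline, and it uses the stdlib `html.parser` backend instead of `lxml`. Unlike `find_all(...)[0]` and `name.a.text`, it returns `None` for missing fields instead of raising `IndexError` or `AttributeError`.

```python
from bs4 import BeautifulSoup

def parse_job(html: str, url: str) -> dict:
    """Extract title and company from one job page, tolerating missing elements."""
    soup = BeautifulSoup(html, "html.parser")
    the_job = {"urlLink": url, "title": None, "company": None}

    # find() returns None when nothing matches, so check before using .a
    name = soup.find("h3", attrs={"class": "loop-item-title"})
    if name is not None and name.a is not None:
        the_job["title"] = name.a.get_text(strip=True)

    # find() instead of find_all(...)[0] avoids an IndexError on empty results
    company = soup.find("span", attrs={"class": "job-company"})
    if company is not None:
        the_job["company"] = company.get_text(strip=True)
    return the_job

page = ('<h3 class="loop-item-title"><a href="#">Accountant</a></h3>'
        '<span class="job-company">ACME</span>')
print(parse_job(page, "https://elitejobstoday.com/job/example/"))
print(parse_job("<p>no job details here</p>", "https://elitejobstoday.com/job/other/"))
```

In the real scraper, this could be called once per link, for example `parse_job(requests.get(link['href'], headers=headers).text, link['href'])` inside `for link in table:`, and pages whose `title` comes back as `None` can be skipped or logged.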