List index out of range when writing to a file with Selenium
I am trying to write university names, department names and ratings to a file from https://www.whatuni.com/university-course-reviews/?pageno=14. It goes well until I reach a post without a department name, at which point it gives me this error:
file.write(user_name[k].text + ";" + uni_names[k].text + ";" + department[k].text + ";" + date_posted[k].text +
IndexError: list index out of range
Here is the code I use. I believe I need to somehow write null or a space when the department doesn't exist. I tried if not and else, but it didn't work for me. I would appreciate any help. Thank you.
for i in range(20):
    try:
        driver.refresh()
        uni_names = driver.find_elements_by_xpath('//div[@class="rlst_wrap"]/h2/a')
        department_names = driver.find_elements_by_xpath('//div[@class="rlst_wrap"]/h3/a')
        user_name = driver.find_elements_by_xpath('//div[@class="rev_name"]')
        date_posted = driver.find_elements_by_xpath('//div[@class="rev_dte"]')
        uni_rev = driver.find_elements_by_xpath('(//div[@class="reviw_rating"]/div[@class="rate_new"]/p)')
        uni_rating = driver.find_elements_by_xpath('(//div[@class="reviw_rating"]/div[@class="rate_new"]/span[starts-with(@class,"ml5")])')
        job_prospects = driver.find_elements_by_xpath('//span[text()="Job Prospects"]/following-sibling::span')
        course_and_lecturers = driver.find_elements_by_xpath('//span[text()="Course and Lecturers"]/following-sibling::span')
        if not course_and_lecturers:
            lecturers = "None"
        else:
            lecturers = course_and_lecturers
        uni_facilities = driver.find_elements_by_xpath('//span[text()= "Facilities" or "Uni Facilities"]/following-sibling::span')
        if not uni_facilities:
            facilities = "None"
        else:
            facilities = uni_facilities
        student_support = driver.find_elements_by_xpath('//span[text()="Student Support"]/following-sibling::span')
        if not student_support:
            support = "None"
        else:
            support = student_support
        with open('uni_scraping.csv', 'a') as file:
            for k in range(len(uni_names)):
                if not department_names:
                    department = "None"
                else:
                    department = department_names
                file.write(user_name[k].text + ";" + uni_names[k].text + ";" + department[k].text + ";" + date_posted[k].text +
                           ";" + uni_rating[k].get_attribute("class") + ";" + job_prospects[k].get_attribute("class") +
                           ";" + lecturers[k].get_attribute("class") + ";" + facilities[k].get_attribute("class") +
                           ";" + support[k].get_attribute("class") + ";" + uni_rev[k].text + "\n")
        next_page = driver.find_element_by_class_name('mr0')
        next_page.click()
        file.close()
    except exceptions.StaleElementReferenceException as e:
        print('e')
        pass
driver.close()
Thank you Vimizen for the answer. I did what you suggested and it worked for me. I wrote something like this.
Best
You had the right instinct when you tried if not department_names, but that check only works if the list is empty. In your case, the issue is that the list is too short. Because of the universities without departments, department_names will be a shorter list than uni_names. As a result, in your loop for k in range(len(uni_names)):, department[k].text will not always be the department of the uni with the same index, and at some point k will be larger than the last index of your department list. That's why department[k] causes an error.

I don't know what the most efficient way around this is, but I think you could get larger elements that hold the full details of every uni (the whole rlst_wrap, for example), then search inside each one for that uni's details (with a regexp, for example). That way you would know when there is no department, and avoid the issue.
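As a rough sketch of that idea (not necessarily what the asker ended up writing), the lookup can be done one rlst_wrap container at a time, using relative XPaths instead of a regexp. The helper names first_text, review_row and scrape_page are made up for illustration, and the sketch assumes only what the question's XPaths show: that the uni name sits in h2/a and the department in h3/a directly under each rlst_wrap div. It keeps the find_elements_by_xpath call from the question; newer Selenium releases spell this find_elements(By.XPATH, ...).

```python
def first_text(container, xpath, default=""):
    """Text of the first element matching `xpath` inside `container`, else `default`."""
    # find_elements_* returns an empty list (it does not raise) when nothing matches.
    matches = container.find_elements_by_xpath(xpath)
    return matches[0].text if matches else default


def review_row(container):
    """Build one row from a single review container (hypothetical helper)."""
    return [
        first_text(container, "./h2/a"),          # university name
        first_text(container, "./h3/a", "None"),  # department; "None" when missing
    ]


def scrape_page(driver):
    """Return one row per rlst_wrap block on the current page."""
    containers = driver.find_elements_by_xpath('//div[@class="rlst_wrap"]')
    return [review_row(container) for container in containers]
```

Each row can then be written with ";".join(row) + "\n", as in the question. Because every lookup is scoped to one container, a missing department yields "None" for that row only and cannot shift the values of later rows. The other fields (user name, date, ratings) would need a container that encloses them too; whether rlst_wrap does is not something the question shows.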