Python: reading URLs from a CSV/Excel column
The last column of my Excel file is filled with URL links. I want to read the text from these URLs so that I can search for keywords in it. The problem is that requests.get cannot read a column of URLs. Can you help me with this? Thank you!
My current code is here:
import pandas as pd
import requests
from bs4 import BeautifulSoup

data = pd.read_excel('/Users/LE/Downloads/url.xlsx')
url = data.URL
res = requests.get(url, headers=headers)  # fails: 'url' is the whole column, not a single URL
html = res.text
soup = BeautifulSoup(html, 'lxml')
It does not work because 'url' is an entire column.
Comments (3)
You did great with opening the file and extracting the column of URLs.
The last step is to loop through them and repeat the request for each URL in the column:
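A minimal sketch of that loop (assuming the same file path as in your question, a placeholder headers dict, and a hypothetical keyword):

import pandas as pd
import requests
from bs4 import BeautifulSoup

data = pd.read_excel('/Users/LE/Downloads/url.xlsx')
headers = {'User-Agent': 'Mozilla/5.0'}        # placeholder; reuse your own headers dict

page_texts = []
for url in data.URL:                           # one URL per iteration, not the whole column
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'lxml')
    page_texts.append(soup.get_text())

keyword = 'example'                            # hypothetical keyword
matching_urls = [u for u, text in zip(data.URL, page_texts) if keyword in text]

Each response is parsed once and its text kept in memory, so the keyword search at the end does not trigger any extra requests.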
As you noticed, the line url = data.URL gives you the entire column. However, you can iterate over the column and access each URL individually, like so:
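For illustration, a short sketch continuing from the question's code (data and headers are assumed to be defined exactly as in your snippet):

url = data.URL                                 # the entire column: a pandas Series of URL strings
print(type(url))                               # <class 'pandas.core.series.Series'>

for single_url in url:                         # each element is one URL string
    res = requests.get(single_url, headers=headers)
    html = res.text
    soup = BeautifulSoup(html, 'lxml')
    # search soup.get_text() for your keyword here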
This line assigns the URL column of the DataFrame to 'url':
url = data.URL
'url' is now a Pandas Series object and can be iterated through with a for loop. See the Pandas documentation on Series for more info: https://pandas.pydata.org/docs/reference/series.html
Note that it might be easier to save the content of the text files located at the URLs locally and then search those saved files afterwards, to avoid executing multiple requests for the same files.
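A rough sketch of that idea, assuming data and headers come from the question's code; the folder name and keyword below are only illustrative:

import os
import requests
from bs4 import BeautifulSoup

save_dir = 'saved_pages'                       # illustrative local folder
os.makedirs(save_dir, exist_ok=True)

for i, single_url in enumerate(data.URL):
    path = os.path.join(save_dir, f'page_{i}.txt')
    if not os.path.exists(path):               # only request each page once
        res = requests.get(single_url, headers=headers)
        text = BeautifulSoup(res.text, 'lxml').get_text()
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)

# later: search the saved files instead of re-requesting the pages
keyword = 'example'                            # hypothetical keyword
for name in os.listdir(save_dir):
    with open(os.path.join(save_dir, name), encoding='utf-8') as f:
        if keyword in f.read():
            print(name, 'contains the keyword')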