How do I clean up data in a pandas DataFrame?
I'm trying to clean up some data I've scraped into an Excel sheet, but I'm getting extra information that I'd like to remove. Can someone tell me how to determine which level I need to drop using pandas?
My code so far:
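from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup
soup1 = BeautifulSoup(driver.page_source, 'html.parser')  # driver: presumably a Selenium WebDriver
df1 = pd.read_html(StringIO(str(soup1)))[0]  # recent pandas versions expect a file-like object here
print(df1)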
This pulls out the data below.
The information I need is highlighted in red; everything else is useless data I don't need.
I'm not sure if it's needed, but the data is being pulled from this table.
Comments (2)
You could try:
df = df.loc[df['Case Number'].notna() & (df['Case Number'] != 'Case Number')]
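To see what that filter does, here is a minimal sketch on a made-up frame. The column names and values are hypothetical stand-ins for the scraped table, which isn't shown in the post; the assumption is that the unwanted rows either have no case number or repeat the header text as data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scraped table: the header text is repeated
# as a data row and some rows have no case number at all.
df = pd.DataFrame({
    "Case Number": ["A-001", np.nan, "Case Number", "A-002", np.nan],
    "Status": ["Open", "filler", "Status", "Closed", "filler"],
})

# Keep rows that have a case number ...
has_value = df["Case Number"].notna()
# ... and whose case number is not just the repeated header text.
not_header = df["Case Number"] != "Case Number"

clean = df.loc[has_value & not_header].reset_index(drop=True)
print(clean)
```

The two conditions are combined with `&`, so a row survives only if it passes both checks. If your frame is called df1, as in the question, apply the same mask to df1 instead.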
First, you need to understand how a standard html table structure works. Then you can use the find_all method to find everything related to the table, but I think it is best to consult the BeautifulSoup documentation and look up the correct way to find the data in your table.
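As a rough illustration of that idea, the sketch below parses a small, made-up table with BeautifulSoup and uses find_all to collect the header cells and data rows. The markup and its values are hypothetical and only show the usual table / tr / th / td nesting; the real page will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for illustration; the real page will differ.
html = """
<table>
  <thead>
    <tr><th>Case Number</th><th>Status</th></tr>
  </thead>
  <tbody>
    <tr><td>A-001</td><td>Open</td></tr>
    <tr><td>A-002</td><td>Closed</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Header cells live in <th> tags.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

# Data cells live in <td> tags, grouped by <tr> rows.
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has only <th> cells, so it yields an empty list
        rows.append(cells)

print(headers)
print(rows)
```

Picking out only the rows and cells you need at this stage can mean there is less to clean up once the data is in pandas.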