How do I drop a sub-header from a Wikipedia table?
I am trying to scrape a Wikipedia table into a DataFrame. From that table I want to drop Population density, Land area, and specifically the Rank column under Population. In the end I want to keep only State or territory and Population (People).
Here is my code:
import pandas as pd
import requests
from bs4 import BeautifulSoup

wiki = "https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_population_density"
table_class = "wikitable sortable jquery-tablesorter"
response = requests.get(wiki)
soup = BeautifulSoup(response.text, 'html.parser')
indiatable = soup.find('table', {'class': "wikitable"})
df = pd.read_html(str(indiatable))
df = pd.DataFrame(df[0])
# Problem line: "Population"["Rank"] is my attempt to address the Rank sub-header
data = df.drop(["Population density", "Population"["Rank"], "Land area"], axis=1)
wikidata = data.rename(columns={"State or territory": "State", "Population": "Population"})
print(wikidata.head())
How do I reference that specific sub-header so that I can drop Rank from under Population?
Comments (1)
Note: Your question does not show an expected result, so you may need to adjust the column headers. I assumed you want to rename People to Population, rather than renaming Population to itself, and changed that accordingly. To reach your goal, simply set the header parameter when reading the HTML so that only the second header row is used; then you do not need to drop the first one separately.
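The example and output that originally accompanied this answer are not preserved here, so the following is only a minimal sketch of the header approach, not the answerer's exact code. To stay self-contained it parses a small inline table instead of the live page; the table layout, the column name People, and the figures in it are hypothetical placeholders, and header=1 is assumed to be the value meant by "only the second".

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the Wikipedia table: a two-row header in which
# "State or territory" spans both rows and "Population" has two sub-columns.
# The values are made-up placeholders, not real figures.
html = """
<table class="wikitable">
  <tr>
    <th rowspan="2">State or territory</th>
    <th colspan="2">Population</th>
  </tr>
  <tr>
    <th>Rank</th>
    <th>People</th>
  </tr>
  <tr><td>Alpha</td><td>1</td><td>1000</td></tr>
  <tr><td>Beta</td><td>2</td><td>500</td></tr>
</table>
"""

# header=1 makes read_html use only the second header row as column names,
# so the columns are flat strings instead of a two-level MultiIndex.
df = pd.read_html(StringIO(html), header=1)[0]

# Keep the two wanted columns and rename them.
wikidata = df[["State or territory", "People"]].rename(
    columns={"State or territory": "State", "People": "Population"}
)
print(wikidata.head())
```

Against the real page, the same call would take the URL (or the downloaded HTML wrapped in StringIO) in place of the inline string. The column names produced by the second header row depend on the page's current layout, so it is worth printing df.columns before selecting.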