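Pandas Drop Duplicates by Subset with Timestamps
I am trying to drop duplicates by subset, but no matter what I do, the result is always the same: nothing changes. Help me understand what I am doing wrong. Code: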
import pandas as pd
test_df = pd.DataFrame(
{
'city': ['Cincinnati', 'San Francisco', 'Chicago', 'Chicago', 'Chicago', 'Chigaco'],
'timestamp': ['2014-03-01 00:01:00', '2014-05-01 09:11:00', '2014-01-01 15:22:00', '2014-01-01 15:59:00', '2014-01-01 23:01:00', '2014-01-01 23:01:00']
}
)
test_df = test_df.astype({'timestamp':'datetime64[ns]'})
test_df = test_df.sort_values(by=['city', 'timestamp'], ascending=False)
test_df = test_df.drop_duplicates(subset=['city', 'timestamp'], keep="first")
print(test_df)
# What I get:
# city timestamp
# 1 San Francisco 2014-05-01 09:11:00
# 0 Cincinnati 2014-03-01 00:01:00
# 5 Chigaco 2014-01-01 23:01:00
# 4 Chicago 2014-01-01 23:01:00
# 3 Chicago 2014-01-01 15:59:00
# 2 Chicago 2014-01-01 15:22:00
# Expected result:
# city timestamp
# 1 San Francisco 2014-05-01 09:11:00
# 0 Cincinnati 2014-03-01 00:01:00
# 5 Chigaco 2014-01-01 23:01:00
# 3 Chicago 2014-01-01 15:59:00
# 2 Chicago 2014-01-01 15:22:00
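A quick way to see why drop_duplicates removes nothing here is to ask pandas which rows it considers duplicates. The sketch below reuses the question's data. It relies on DataFrame.duplicated with keep=False, which flags every member of a duplicate group. It compares the (city, timestamp) check with a timestamp-only check.

```python
import pandas as pd

# Same data as in the question; note the last city is spelled "Chigaco".
test_df = pd.DataFrame(
    {
        'city': ['Cincinnati', 'San Francisco', 'Chicago', 'Chicago', 'Chicago', 'Chigaco'],
        'timestamp': pd.to_datetime([
            '2014-03-01 00:01:00', '2014-05-01 09:11:00', '2014-01-01 15:22:00',
            '2014-01-01 15:59:00', '2014-01-01 23:01:00', '2014-01-01 23:01:00',
        ]),
    }
)

# keep=False marks every row of a duplicate group, so an all-False mask
# means drop_duplicates(subset=...) has nothing to remove.
pair_dups = test_df.duplicated(subset=['city', 'timestamp'], keep=False)
print(pair_dups.any())  # prints False

# Checking the timestamp alone shows that rows 4 and 5 do collide...
time_dups = test_df.duplicated(subset=['timestamp'], keep=False)
print(test_df[time_dups])

# ...and listing the distinct city values reveals why the pair check misses them.
print(sorted(test_df['city'].unique()))
# prints ['Chicago', 'Chigaco', 'Cincinnati', 'San Francisco']
```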
Comments (2)
This will work, as will some of the other answers:
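A minimal sketch of one alternative follows. It is not necessarily the code this answer referred to. It assumes the goal is to keep one row per timestamp regardless of the city label. Under that assumption, restricting subset to the timestamp column after the same descending sort reproduces the "Expected result" listed in the question (indices 1, 0, 5, 3, 2).

```python
import pandas as pd

test_df = pd.DataFrame(
    {
        'city': ['Cincinnati', 'San Francisco', 'Chicago', 'Chicago', 'Chicago', 'Chigaco'],
        'timestamp': pd.to_datetime([
            '2014-03-01 00:01:00', '2014-05-01 09:11:00', '2014-01-01 15:22:00',
            '2014-01-01 15:59:00', '2014-01-01 23:01:00', '2014-01-01 23:01:00',
        ]),
    }
)
test_df = test_df.sort_values(by=['city', 'timestamp'], ascending=False)

# Assumption: only the timestamp defines a duplicate here. After the
# descending sort, keep='first' retains the row that sorts first in each
# timestamp group ("Chigaco" sorts after "Chicago", so row 5 comes first).
result = test_df.drop_duplicates(subset=['timestamp'], keep='first')
print(result)
```

The trade-off is that this would also collapse rows for genuinely different cities that happen to share a timestamp. Only use it if that matches the intent.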
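You made a mistake in your data: "Chicago" and "Chigaco" are spelled differently. drop_duplicates therefore treats the two 2014-01-01 23:01:00 rows as distinct (city, timestamp) pairs and keeps both.
Here is the result: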
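As a sketch, the fix below corrects the misspelling before deduplicating and otherwise follows the question's steps. The replacement mapping {'Chigaco': 'Chicago'} is written for this example only. A real data set may need a fuller lookup of known misspellings.

```python
import pandas as pd

test_df = pd.DataFrame(
    {
        'city': ['Cincinnati', 'San Francisco', 'Chicago', 'Chicago', 'Chicago', 'Chigaco'],
        'timestamp': pd.to_datetime([
            '2014-03-01 00:01:00', '2014-05-01 09:11:00', '2014-01-01 15:22:00',
            '2014-01-01 15:59:00', '2014-01-01 23:01:00', '2014-01-01 23:01:00',
        ]),
    }
)

# Illustrative fix: normalize the misspelled city so both 23:01 rows
# share the same (city, timestamp) key.
test_df['city'] = test_df['city'].replace({'Chigaco': 'Chicago'})

test_df = test_df.sort_values(by=['city', 'timestamp'], ascending=False)
test_df = test_df.drop_duplicates(subset=['city', 'timestamp'], keep='first')
print(test_df)
```

After the fix, the two 23:01 rows are identical in both columns. It therefore makes no difference to the data which of them keep='first' retains.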