在Pandas Groupby中查找重叠的开始日期和结束日期
我正在尝试找到重叠何时开始以及何时以以下DF结束。
我能够使用以下代码来确定重叠群集,现在我想找出重叠何时开始,何时结束时,
d = [
{'G1': 'A', 'G2': 'A1','Start_Date': '6/1/2020', 'End_Date': '5/31/2022'},
{'G1': 'B', 'G2': 'A1','Start_Date': '12/1/2020', 'End_Date': '11/30/2021'},
{'G1': 'B', 'G2': 'B1','Start_Date': '6/1/2020', 'End_Date': '5/31/2021'},
{'G1': 'Y', 'G2': 'B1','Start_Date': '6/1/2021', 'End_Date': '6/1/2022'},
{'G1': 'C', 'G2': 'C1','Start_Date': '1/1/2020', 'End_Date': '3/31/2020'},
{'G1': 'C', 'G2': 'C1','Start_Date': '4/1/2020', 'End_Date': '5/31/2020'},
{'G1': 'C', 'G2': 'C2','Start_Date': '6/1/2020', 'End_Date': '7/31/2020'},
{'G1': 'I', 'G2': 'C3','Start_Date': '8/1/2020', 'End_Date': '10/31/2020'},
{'G1': 'O', 'G2': 'C3','Start_Date': '11/1/2020', 'End_Date': '12/31/2021'},
{'G1': 'D', 'G2': 'D1','Start_Date': '1/1/2020', 'End_Date': '2/28/2020'},
{'G1': 'R', 'G2': 'D2','Start_Date': '3/1/2020', 'End_Date': '3/31/2020'},
{'G1': 'F', 'G2': 'D4','Start_Date': '4/1/2020', 'End_Date': '8/31/2020'},
{'G1': 'Y', 'G2': 'D4','Start_Date': '8/1/2020', 'End_Date': '10/31/2020'},
{'G1': 'D', 'G2': 'D4','Start_Date': '11/1/2020', 'End_Date': '12/31/2021'},
]
df = pd.DataFrame(d)
df['Start_Date'] = pd.to_datetime(df['Start_Date'],format='%m/%d/%Y', errors='coerce')
df['End_Date'] = pd.to_datetime(df['End_Date'],format='%m/%d/%Y', errors='coerce')
df['Range'] = df.apply(lambda x: pd.date_range(start=x['Start_Date'], end=x['End_Date']), axis=1)
def determine_cluster(group):
bucket = pd.DatetimeIndex(['1/1/1900'])
for x in group:
if x.isin(bucket).any():
return [True]*len(group)
bucket = bucket.append(x)
return [False]*len(group)
df['Cluster'] = df.groupby(['G2'])['Range'].transform(determine_cluster)
df.drop('Range', axis=1)
它给出以下结果:
但是,我所需的输出是:
I am trying to find when the overlap start and when it ended in following DF.
I am able to determine the overlap cluster using below code, and now I want to find out when the overlap begins and when it ends
d = [
{'G1': 'A', 'G2': 'A1','Start_Date': '6/1/2020', 'End_Date': '5/31/2022'},
{'G1': 'B', 'G2': 'A1','Start_Date': '12/1/2020', 'End_Date': '11/30/2021'},
{'G1': 'B', 'G2': 'B1','Start_Date': '6/1/2020', 'End_Date': '5/31/2021'},
{'G1': 'Y', 'G2': 'B1','Start_Date': '6/1/2021', 'End_Date': '6/1/2022'},
{'G1': 'C', 'G2': 'C1','Start_Date': '1/1/2020', 'End_Date': '3/31/2020'},
{'G1': 'C', 'G2': 'C1','Start_Date': '4/1/2020', 'End_Date': '5/31/2020'},
{'G1': 'C', 'G2': 'C2','Start_Date': '6/1/2020', 'End_Date': '7/31/2020'},
{'G1': 'I', 'G2': 'C3','Start_Date': '8/1/2020', 'End_Date': '10/31/2020'},
{'G1': 'O', 'G2': 'C3','Start_Date': '11/1/2020', 'End_Date': '12/31/2021'},
{'G1': 'D', 'G2': 'D1','Start_Date': '1/1/2020', 'End_Date': '2/28/2020'},
{'G1': 'R', 'G2': 'D2','Start_Date': '3/1/2020', 'End_Date': '3/31/2020'},
{'G1': 'F', 'G2': 'D4','Start_Date': '4/1/2020', 'End_Date': '8/31/2020'},
{'G1': 'Y', 'G2': 'D4','Start_Date': '8/1/2020', 'End_Date': '10/31/2020'},
{'G1': 'D', 'G2': 'D4','Start_Date': '11/1/2020', 'End_Date': '12/31/2021'},
]
df = pd.DataFrame(d)
df['Start_Date'] = pd.to_datetime(df['Start_Date'],format='%m/%d/%Y', errors='coerce')
df['End_Date'] = pd.to_datetime(df['End_Date'],format='%m/%d/%Y', errors='coerce')
df['Range'] = df.apply(lambda x: pd.date_range(start=x['Start_Date'], end=x['End_Date']), axis=1)
def determine_cluster(group):
bucket = pd.DatetimeIndex(['1/1/1900'])
for x in group:
if x.isin(bucket).any():
return [True]*len(group)
bucket = bucket.append(x)
return [False]*len(group)
df['Cluster'] = df.groupby(['G2'])['Range'].transform(determine_cluster)
df.drop('Range', axis=1)
It Give below result:
However my Desired output is:
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
尝试:
打印:
Try:
Prints:
您可以使用:
输出:
You can use:
Output: