Python：查找一列中的最新日期，而另一列中没有匹配的日期

发布于 2025-01-19 06:39:33 字数 785 浏览 2 评论 0原文

我有两个日期列，代表了客户设施的条目和退出。

ID	ENTRY_DATE	EXIT_DATE	ORICANT_ENTRYDATE
003246	2022-03-22	NAN	2012-10-01
003246	2015-07-24	2022-03-22	2012-10-10-01
003246			2012-10-01
003246	2001-02-02	2010-04-05	2001-02-02

对于桌子中的所有ID实例输入日期代表该ID在设施之间移动但不离开护理的时间跨度的开始，并将其返回，并将其返回原始_entrydate。

在示例中，前三行的原始_entrydate值是2012-10-01，因为该entry_date与exit_date不匹配，这表明与护理的分离，日期显示了这持续了两年零几个月。如果还有该ID的其他记录，则该过程将重置并找到与该分离之前的任何记录，直到下一个分隔的任何记录。

原文

I have two date columns representing Entries and Exits from facilities for clients.

ID	entry_date	exit_date	original_entrydate
003246	2022-03-22	NaN	2012-10-01
003246	2015-07-24	2022-03-22	2012-10-01
003246	2012-10-01	2015-07-24	2012-10-01
003246	2001-02-02	2010-04-05	2001-02-02

For all instances of an ID in the table, I need to match entry_date to exit_date to find the most recent entry date that represents the beginning of an uninterrupted span of time in which that ID was moving between facilities but not leaving care, and return it in a column, original_entrydate.

In the example, the value for original_entrydate for the first three rows would be 2012-10-01, because that entry_date does not match an exit_date, indicating a separation from care, which the dates show lasted for two years and some months. If there were additional records for that ID, that process would reset and find the original_entrydate for any records preceding that separation from care, up to the next separation.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

§对你不离不弃 2025-01-26 06:39:33

I solved my problem in the most clunky way imaginable--by creating nested if-else statements:

res_phys_levels['ORIGINAL_AdmDt']=''

for i in range(0, len(res_phys_levels)):
    start_ID = res_phys_levels.iloc[i]['Individual_ID']
    aDate = res_phys_levels.iloc[i]['Admit_Date']
    id_count = res_phys_levels.Individual_ID.value_counts()[start_ID]
    if id_count == 1:     #if there's only one instance of Individual ID in table, then ORIGINAL_AdmDt = Admit_Date
        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
    else:                 #if there's more than one instance of Individual_ID, then--
        j = i+1        
        next_ID = res_phys_levels.iloc[j]['Individual_ID']
        if start_ID != next_ID:
            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
        else: 
            sDate = res_phys_levels.iloc[j]['SEPARATION_DATE']
            if aDate != sDate: 
                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
            else: 
                aDate = res_phys_levels.iloc[j]['Admit_Date']
                k = j+1
                next_ID = res_phys_levels.iloc[k]['Individual_ID']
                if start_ID != next_ID:
                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                else: 
                    sDate = res_phys_levels.iloc[k]['SEPARATION_DATE']
                    if aDate != sDate: 
                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                    else: 
                        aDate = res_phys_levels.iloc[k]['Admit_Date']
                        m = k+1
                        next_ID = res_phys_levels.iloc[m]['Individual_ID']
                        if start_ID != next_ID:
                            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                        else:
                            sDate = res_phys_levels.iloc[m]['SEPARATION_DATE']
                            if aDate != sDate:
                                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                            else:
                                aDate = res_phys_levels.iloc[k]['Admit_Date']
                                n = m+1
                                next_ID = res_phys_levels.iloc[n]['Individual_ID']
                                if start_ID != next_ID:
                                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate

This worked for two reasons: the dataframe was sorted by 'Individual_ID' ascending and 'Admit_Date' descending--and the nested if-else statements allowed为了比较索引[i]的“私有_id”与随后的行进行比较，直到所有可能性都用尽。

我也知道每个ID最多有4行。

但是 - 请向我展示一种更好，更多的Pythonic方法！

I solved my problem in the most clunky way imaginable--by creating nested if-else statements:

res_phys_levels['ORIGINAL_AdmDt']=''

for i in range(0, len(res_phys_levels)):
    start_ID = res_phys_levels.iloc[i]['Individual_ID']
    aDate = res_phys_levels.iloc[i]['Admit_Date']
    id_count = res_phys_levels.Individual_ID.value_counts()[start_ID]
    if id_count == 1:     #if there's only one instance of Individual ID in table, then ORIGINAL_AdmDt = Admit_Date
        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
    else:                 #if there's more than one instance of Individual_ID, then--
        j = i+1        
        next_ID = res_phys_levels.iloc[j]['Individual_ID']
        if start_ID != next_ID:
            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
        else: 
            sDate = res_phys_levels.iloc[j]['SEPARATION_DATE']
            if aDate != sDate: 
                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
            else: 
                aDate = res_phys_levels.iloc[j]['Admit_Date']
                k = j+1
                next_ID = res_phys_levels.iloc[k]['Individual_ID']
                if start_ID != next_ID:
                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                else: 
                    sDate = res_phys_levels.iloc[k]['SEPARATION_DATE']
                    if aDate != sDate: 
                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                    else: 
                        aDate = res_phys_levels.iloc[k]['Admit_Date']
                        m = k+1
                        next_ID = res_phys_levels.iloc[m]['Individual_ID']
                        if start_ID != next_ID:
                            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                        else:
                            sDate = res_phys_levels.iloc[m]['SEPARATION_DATE']
                            if aDate != sDate:
                                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                            else:
                                aDate = res_phys_levels.iloc[k]['Admit_Date']
                                n = m+1
                                next_ID = res_phys_levels.iloc[n]['Individual_ID']
                                if start_ID != next_ID:
                                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate

This worked for two reasons: the dataframe was sorted by 'Individual_ID' ascending and 'Admit_Date' descending--and the nested if-else statements allowed for comparison of the 'Individual_ID' at index[i] with subsequent rows, until all possibilities were exhausted.

I also knew there was a maximum of 4 rows per ID.

BUT--please show me a better, more pythonic way of doing this!

回复收藏 0 原文

~没有更多了~