如何根据 python 中多个其他数据帧的条件替换文本?
对于一个冗长的问题,我提前表示歉意。我是 python 新手,不太确定从哪里开始解决这个问题。
我在 df 中有一个列“car_number”,有时具有未知的值。当存在未知时:
- 使用该行中的“交付”日期,并将其与 df2 中同一“car_part”编号的最旧“交付”日期进行检查
- 如果 df 和 df2 向前或向后的日期在 3 个月内匹配
- 转到 df3 并使用相同的 3 个月条件向前或向后匹配“交货”日期
- 如果匹配,则取最小的 car_number 并将其替换为 UNKNOWN
- 转到 df3 并使用相同的 3 个月条件向前或向后匹配“交货”日期
UNKNOWN 后,在 df 中创建一个名为“公式”的新列,并应用公式来检查日期是否会返回负数。公式:
- 检查新的“car_number”是否与下一行的“car_number”匹配。
- 如果匹配,则取“交货”日期并减去“测试”日期,然后返回两个日期之间的天数。
- 如果数字为正数或空白,则移至下一个 UNKNOWN
- 如果数字为负数
- 然后返回 df3 并选择下一个 car_number 并替换
- 再次应用公式,直到返回正数
- 如果新的 car_number 与下一行的 car_number 不匹配,则不返回任何内容。
最终结果应显示第一个 UNKNOWN 被 X123-00027C 替换,第二个 UNKNOWN 被 X123-00148C 替换。再次,抱歉问了这么长的问题。
data = {'car_part': ['100009','100093','100071','100033','100033','100043'],
'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004'],
'test': ['12/17/2009', '7/27/2010', '7/10/2020', '12/22/2006', '3/26/2007', '12/1/2007']}
data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'],
'car_part': ['100009','100009','100009','100033','100033','100033']}
data3 = {'car_part': ['100009','100009','100009','100033','100033','100033'],
'car_number': ['X123-00027C', 'X123-00031C', 'X123-00041C', 'X123-00106C', 'X123-00143C', 'X123-00148C'],
'delivery': ['8/27/2004', '8/27/2004', '8/27/2004', '3/27/2006', '3/27/2006', '3/27/2006']}
df = pd.DataFrame(data)
print(df)
df2 = pd.DataFrame(data2)
print(df2)
df3 = pd.DataFrame(data3)
print(df3)
df['delivery'] = df['delivery'].astype('datetime64[ns]')
df.sort_values(by = ['car_part', 'delivery', 'test'], ascending=[True, True, True])
df2['delivery'] = df2['delivery'].astype('datetime64[ns]')
df2.sort_values(by = ['car_part', 'delivery'], ascending=[True, True])
df3['delivery'] = df3['delivery'].astype('datetime64[ns]')
df3.sort_values(by = ['car_part', 'car_number', 'delivery'], ascending=[True, True, True])
I apologize in advance for a long winded question. I am new to python and not really sure where to begin with this problem.
I have a column 'car_number' in df that sometimes has UNKNOWN values. When there is an UNKNOWN:
- Use the 'delivery' date in that row and check it against the oldest 'delivery' date of the same 'car_part' number in df2
- If the dates matches within a 3 months period for df and df2 forward or backward
- Go to df3 and match 'delivery' date using same 3 months condition forward or backward
- If it matches, then take the lowest car_number and replace it with UNKNOWN
- Go to df3 and match 'delivery' date using same 3 months condition forward or backward
Once UNKNOWN is replaced, create a new column called formula in df and applies a formula to check if the date will return a negative. The formula:
- Checks if the new 'car_number' matches the 'car_number' on the next row.
- If it matches then take 'delivery' date and subtract 'test' date and returns number of days between the two dates.
- If the number is positive or blank, then moves onto the next UNKNOWN
- If the number is negative
- Then go back to df3 and pick the next car_number and replace
- Applies the formula again until a positive number is returned
- If the new car_number doesn't match the car_number on the next row, returns nothing.
The end result should show that the first UNKNOWN is replaced by X123-00027C and second UNKNOWN by X123-00148C. Again, sorry for the long question.
data = {'car_part': ['100009','100093','100071','100033','100033','100043'],
'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004'],
'test': ['12/17/2009', '7/27/2010', '7/10/2020', '12/22/2006', '3/26/2007', '12/1/2007']}
data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'],
'car_part': ['100009','100009','100009','100033','100033','100033']}
data3 = {'car_part': ['100009','100009','100009','100033','100033','100033'],
'car_number': ['X123-00027C', 'X123-00031C', 'X123-00041C', 'X123-00106C', 'X123-00143C', 'X123-00148C'],
'delivery': ['8/27/2004', '8/27/2004', '8/27/2004', '3/27/2006', '3/27/2006', '3/27/2006']}
df = pd.DataFrame(data)
print(df)
df2 = pd.DataFrame(data2)
print(df2)
df3 = pd.DataFrame(data3)
print(df3)
df['delivery'] = df['delivery'].astype('datetime64[ns]')
df.sort_values(by = ['car_part', 'delivery', 'test'], ascending=[True, True, True])
df2['delivery'] = df2['delivery'].astype('datetime64[ns]')
df2.sort_values(by = ['car_part', 'delivery'], ascending=[True, True])
df3['delivery'] = df3['delivery'].astype('datetime64[ns]')
df3.sort_values(by = ['car_part', 'car_number', 'delivery'], ascending=[True, True, True])
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论