如何根据 python 中多个其他数据帧的条件替换文本?

发布于 2025-01-18 13:55:46 字数 2179 浏览 0 评论 0原文

对于一个冗长的问题,我提前表示歉意。我是 python 新手,不太确定从哪里开始解决这个问题。

我在 df 中有一个列“car_number”,有时具有未知的值。当存在未知时:

  • 使用该行中的“交付”日期,并将其与 df2 中同一“car_part”编号的最旧“交付”日期进行检查
  • 如果 df 和 df2 向前或向后的日期在 3 个月内匹配
    • 转到 df3 并使用相同的 3 个月条件向前或向后匹配“交货”日期
      • 如果匹配,则取最小的 car_number 并将其替换为 UNKNOWN

UNKNOWN 后,在 df 中创建一个名为“公式”的新列,并应用公式来检查日期是否会返回负数。公式:

  • 检查新的“car_number”是否与下一行的“car_number”匹配。
  • 如果匹配,则取“交货”日期并减去“测试”日期,然后返回两个日期之间的天数。
  • 如果数字为正数或空白,则移至下一个 UNKNOWN
  • 如果数字为负数
    • 然后返回 df3 并选择下一个 car_number 并替换
    • 再次应用公式,直到返回正数
  • 如果新的 car_number 与下一行的 car_number 不匹配,则不返回任何内容。

最终结果应显示第一个 UNKNOWN 被 X123-00027C 替换,第二个 UNKNOWN 被 X123-00148C 替换。再次,抱歉问了这么长的问题。

data = {'car_part': ['100009','100093','100071','100033','100033','100043'],
        'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
        'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004'],
        'test': ['12/17/2009', '7/27/2010', '7/10/2020', '12/22/2006', '3/26/2007', '12/1/2007']}  

data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'], 
         'car_part': ['100009','100009','100009','100033','100033','100033']}  

data3 = {'car_part': ['100009','100009','100009','100033','100033','100033'],
         'car_number': ['X123-00027C', 'X123-00031C', 'X123-00041C', 'X123-00106C', 'X123-00143C', 'X123-00148C'],
         'delivery': ['8/27/2004', '8/27/2004', '8/27/2004', '3/27/2006', '3/27/2006', '3/27/2006']}  

df = pd.DataFrame(data)
print(df)
df2 = pd.DataFrame(data2)
print(df2)
df3 = pd.DataFrame(data3)
print(df3)

df['delivery'] = df['delivery'].astype('datetime64[ns]')
df.sort_values(by = ['car_part', 'delivery', 'test'], ascending=[True, True, True])

df2['delivery'] = df2['delivery'].astype('datetime64[ns]')
df2.sort_values(by = ['car_part', 'delivery'], ascending=[True, True])

df3['delivery'] = df3['delivery'].astype('datetime64[ns]')
df3.sort_values(by = ['car_part', 'car_number', 'delivery'], ascending=[True, True, True])

I apologize in advance for a long winded question. I am new to python and not really sure where to begin with this problem.

I have a column 'car_number' in df that sometimes has UNKNOWN values. When there is an UNKNOWN:

  • Use the 'delivery' date in that row and check it against the oldest 'delivery' date of the same 'car_part' number in df2
  • If the dates matches within a 3 months period for df and df2 forward or backward
    • Go to df3 and match 'delivery' date using same 3 months condition forward or backward
      • If it matches, then take the lowest car_number and replace it with UNKNOWN

Once UNKNOWN is replaced, create a new column called formula in df and applies a formula to check if the date will return a negative. The formula:

  • Checks if the new 'car_number' matches the 'car_number' on the next row.
  • If it matches then take 'delivery' date and subtract 'test' date and returns number of days between the two dates.
  • If the number is positive or blank, then moves onto the next UNKNOWN
  • If the number is negative
    • Then go back to df3 and pick the next car_number and replace
    • Applies the formula again until a positive number is returned
  • If the new car_number doesn't match the car_number on the next row, returns nothing.

The end result should show that the first UNKNOWN is replaced by X123-00027C and second UNKNOWN by X123-00148C. Again, sorry for the long question.

data = {'car_part': ['100009','100093','100071','100033','100033','100043'],
        'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
        'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004'],
        'test': ['12/17/2009', '7/27/2010', '7/10/2020', '12/22/2006', '3/26/2007', '12/1/2007']}  

data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'], 
         'car_part': ['100009','100009','100009','100033','100033','100033']}  

data3 = {'car_part': ['100009','100009','100009','100033','100033','100033'],
         'car_number': ['X123-00027C', 'X123-00031C', 'X123-00041C', 'X123-00106C', 'X123-00143C', 'X123-00148C'],
         'delivery': ['8/27/2004', '8/27/2004', '8/27/2004', '3/27/2006', '3/27/2006', '3/27/2006']}  

df = pd.DataFrame(data)
print(df)
df2 = pd.DataFrame(data2)
print(df2)
df3 = pd.DataFrame(data3)
print(df3)

df['delivery'] = df['delivery'].astype('datetime64[ns]')
df.sort_values(by = ['car_part', 'delivery', 'test'], ascending=[True, True, True])

df2['delivery'] = df2['delivery'].astype('datetime64[ns]')
df2.sort_values(by = ['car_part', 'delivery'], ascending=[True, True])

df3['delivery'] = df3['delivery'].astype('datetime64[ns]')
df3.sort_values(by = ['car_part', 'car_number', 'delivery'], ascending=[True, True, True])

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文