How can I create year dummy variables from two date variables used as a range in Python?

Posted on 2025-02-09 03:05:22

I have the following data format:

name     role       startdate      enddate
abby     associate  2/15/2010      6/13/2012
bobby    intern     6/21/2013      1/10/2014
james    manager    2/12/2012      5/13/2015

I want to create year dummy variables for the startdate-enddate range (e.g. 2010 = 1 if the period from startdate to enddate overlaps the 1/1/2010 - 12/31/2010 range), so that it generates the following output:

name     role       startdate      enddate     2010    2011    2012
abby     associate  2/15/2010      6/13/2012    1       1       1
bobby    intern     6/21/2013      1/10/2014    0       0       0
james    manager    2/12/2012      5/13/2015    0       0       1

Thank you in advance

Comments (2)

长梦不多时 2025-02-16 03:05:22

You have to implement the function from scratch:

import pandas as pd

# Convert both date columns to the pandas datetime type
df[['startdate', 'enddate']] = df[['startdate', 'enddate']].apply(pd.to_datetime, format='%m/%d/%Y')
# Extract the year from each date column
df['startdate_year'] = df['startdate'].dt.year
df['enddate_year'] = df['enddate'].dt.year

# Earliest start year and latest end year
# (named min_year/max_year so the built-ins min and max are not shadowed)
min_year = df['startdate_year'].min()
max_year = df['enddate_year'].max()

# One output column per year in the overall range
years = list(range(min_year, max_year + 1))
# [2010, 2011, 2012, 2013, 2014, 2015]

def getdummi(start, end):
    # 1 for every year from start to end (inclusive), 0 otherwise
    return [1 if start <= year <= end else 0 for year in years]
# getdummi(2012, 2014)
# [0, 0, 1, 1, 1, 0]

# Build one row of dummies per row of df
data = [getdummi(s, e) for s, e in zip(df['startdate_year'], df['enddate_year'])]

# Reuse df's index so the later concat aligns row by row
dfdummi = pd.DataFrame(data, columns=years, index=df.index)
dfdummi

Then concatenate the result with the original DataFrame:

df = pd.concat([df,dfdummi],axis=1)
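For larger frames, the row-by-row loop above can be replaced with a NumPy broadcast comparison. The following is only a sketch, not part of the original answer: it rebuilds the question's sample data, and the `years` range (2010 to 2012) is an assumption taken from the columns shown in the question's expected output. The names `start_year`, `end_year`, `flags`, `dummies`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["abby", "bobby", "james"],
    "role": ["associate", "intern", "manager"],
    "startdate": ["2/15/2010", "6/21/2013", "2/12/2012"],
    "enddate": ["6/13/2012", "1/10/2014", "5/13/2015"],
})

# Year of each start and end date as plain NumPy arrays
start_year = pd.to_datetime(df["startdate"], format="%m/%d/%Y").dt.year.to_numpy()
end_year = pd.to_datetime(df["enddate"], format="%m/%d/%Y").dt.year.to_numpy()

# Assumption: only flag the years shown in the question's expected output
years = np.arange(2010, 2013)

# Broadcasting gives a (rows x years) boolean matrix:
# True where start_year <= year <= end_year
flags = (start_year[:, None] <= years) & (years <= end_year[:, None])

dummies = pd.DataFrame(flags.astype(int), columns=years, index=df.index)
result = pd.concat([df, dummies], axis=1)
print(result)
```

Changing `years` to `np.arange(start_year.min(), end_year.max() + 1)` would cover every year in the data, as the loop-based version does.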

缪败 2025-02-16 03:05:22

Create a dummy DataFrame with just the start and end years:

import pandas as pd

df_test = pd.DataFrame([[2012, 2014], [2015, 2017], [2010, 2020]], columns=['start_date', 'end_date'])

# Make sure the year columns are numeric, then use .loc to set each new year column to 1
# for the rows whose start/end range contains that year

min_year = df_test['start_date'].min()
max_year = df_test['end_date'].max()
for current_col in range(min_year, max_year + 1):
    df_test.loc[(df_test['start_date'] <= current_col) & (df_test['end_date'] >= current_col), current_col] = 1

Fill the blank values with zero:

df_test = df_test.fillna(0)

You will get the desired output.
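The same comparison can also be applied directly to the question's date columns. This is a minimal sketch under stated assumptions rather than the answerer's exact code: it rebuilds the question's sample data, derives the years with `.dt.year`, and casts the boolean mask to integers so that no `fillna` step is needed. The names `start_year` and `end_year` are illustrative.

```python
import pandas as pd

# Sample data copied from the question, parsed as datetimes
df = pd.DataFrame({
    "name": ["abby", "bobby", "james"],
    "role": ["associate", "intern", "manager"],
    "startdate": pd.to_datetime(["2/15/2010", "6/21/2013", "2/12/2012"], format="%m/%d/%Y"),
    "enddate": pd.to_datetime(["6/13/2012", "1/10/2014", "5/13/2015"], format="%m/%d/%Y"),
})

start_year = df["startdate"].dt.year
end_year = df["enddate"].dt.year

# One integer 0/1 column per year between the earliest start and the latest end
for year in range(start_year.min(), end_year.max() + 1):
    df[year] = ((start_year <= year) & (end_year >= year)).astype(int)

print(df)
```

Because each mask is cast with `astype(int)`, the year columns hold integers rather than the floats produced by assigning 1 and then filling the gaps with 0.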
