Python Pandas panel data - fill missing values using information from other periods

Posted on 2025-02-13 16:44:45

I am working with a panel data set: I have observations of some units over many time periods.

For example:

import numpy as np
import pandas as pd

# Three month-end dates, repeated once per unit ("ME" = month end; pandas < 2.2 uses "M")
dates = 3 * list(pd.date_range(start='1/31/2018', end='3/31/2018', freq="ME"))
unit_id = ["id_1", "id_1", "id_1", "id_2", "id_2", "id_2", "id_3", "id_3", "id_3"]
locations = ["loc_1", "loc_1", np.nan, "loc_2", "loc_2", np.nan, "loc_3", "loc_3", np.nan]
var_1 = ["x1_t1", "x1_t2", "x1_t3", "x2_t1", "x2_t2", "x2_t3", "x3_t1", "x3_t2", "x3_t3"]
var_2 = ["z1_t1", "z1_t2", "z1_t3", "z2_t1", "z2_t2", "z2_t3", "z3_t1", "z3_t2", "z3_t3"]

df = pd.DataFrame({"date": dates, "id": unit_id, "location": locations, "var_1": var_1, "var_2": var_2})

This gives me something like this:

         date    id  location  var_1  var_2
0  2018-01-31  id_1     loc_1  x1_t1  z1_t1
1  2018-02-28  id_1     loc_1  x1_t2  z1_t2
2  2018-03-31  id_1       NaN  x1_t3  z1_t3
3  2018-01-31  id_2     loc_2  x2_t1  z2_t1
4  2018-02-28  id_2     loc_2  x2_t2  z2_t2
5  2018-03-31  id_2       NaN  x2_t3  z2_t3
6  2018-01-31  id_3     loc_3  x3_t1  z3_t1
7  2018-02-28  id_3     loc_3  x3_t2  z3_t2
8  2018-03-31  id_3       NaN  x3_t3  z3_t3

My DataFrame is not ordered like the example; it is sorted by date.

Also, the panel is unbalanced, meaning not all units show up in every period.

What I want to do is fill the NaN location values with the location of the same unit (i.e. matching id) from other periods, whenever that unit appears in another period with location information, without messing up the other variables.

Any tips?
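A different technique from the merge used in the answer, offered as a minimal sketch: broadcast each unit's first known location to all of its rows with groupby("id")["location"].transform("first"), then use it only where location is NaN. The toy panel is hypothetical (made-up dates, ids and var_1 values); it is sorted by date, unbalanced (id_2 has no February row) and includes a unit, id_3, that never reports a location. Because the lookup is done per group, the row order of the frame does not matter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: sorted by date, id_2 skips February, id_3 never has a location
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-01-31", "2018-01-31", "2018-01-31", "2018-02-28",
             "2018-03-31", "2018-03-31", "2018-03-31"]
        ),
        "id": ["id_1", "id_2", "id_3", "id_1", "id_1", "id_2", "id_3"],
        "location": ["loc_1", np.nan, np.nan, np.nan, "loc_1", "loc_2", np.nan],
        "var_1": list("abcdefg"),
    }
)

# For each unit, the first non-missing location (in current row order),
# broadcast back to every row of that unit; all-NaN units stay NaN
known = df.groupby("id")["location"].transform("first")

# Replace only the missing locations; other columns and row order are untouched
df["location"] = df["location"].fillna(known)
```

If a unit's location changes over time, "first" takes the value from that unit's earliest row in the current order, so with date-sorted data it is the earliest known location.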

Answer by 梦中楼上月下, 2025-02-20 16:44:45

This continues your code after the DataFrame is created (it is called df in the code below):

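import numpy as np
import pandas as pd

# One known location per unit (first non-missing value), renamed to avoid a clash
id_loc_df = df[['id', 'location']].dropna().drop_duplicates(subset='id').rename(columns={'location': 'location_fill'})

# Left join keeps every row of df, including units that never report a location
df_updated = pd.merge(df, id_loc_df, on='id', how='left')

# Fill only the missing locations; existing values are left untouched
df_updated['location'] = np.where(df_updated['location'].isna(), df_updated['location_fill'], df_updated['location'])

# drop() returns a new DataFrame, so keep the result
df_updated = df_updated.drop(columns='location_fill')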
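A self-contained check of the merge approach on a hypothetical unbalanced panel (all values are made up): id_3 never has a location and id_1 reports two different locations. It uses how="left" so units without any known location keep their rows, and drop_duplicates(subset="id") so each unit contributes a single lookup row; with drop_duplicates() over both columns, id_1 would get two lookup rows and the merge would duplicate its panel rows. The fillna line does the same job as the np.where line in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: id_2 skips February, id_3 never has a location,
# and id_1 reports two different locations over time
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-01-31", "2018-01-31", "2018-01-31", "2018-02-28",
             "2018-03-31", "2018-03-31", "2018-03-31"]
        ),
        "id": ["id_1", "id_2", "id_3", "id_1", "id_1", "id_2", "id_3"],
        "location": ["loc_1", np.nan, np.nan, np.nan, "loc_1_new", "loc_2", np.nan],
        "var_1": list("abcdefg"),
    }
)

# One lookup row per unit: its first non-missing location
id_loc_df = (
    df[["id", "location"]]
    .dropna()
    .drop_duplicates(subset="id")
    .rename(columns={"location": "location_fill"})
)

# how="left" keeps id_3 even though it has no lookup row; left row order is preserved
df_updated = pd.merge(df, id_loc_df, on="id", how="left")
df_updated["location"] = df_updated["location"].fillna(df_updated["location_fill"])
df_updated = df_updated.drop(columns="location_fill")
```

The row count of df_updated should match df; if it grows, some unit had more than one lookup row.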
