Impute missing values for each group in a column using the MissForest algorithm in Python

Posted on 2025-01-09 18:49:23

I have time series data for about 4000 patients with missing values, and I want to impute the NaN values separately for each patient using the MissForest algorithm in Python.

The data looks like this:

HR	Resp	P_ID
72.0	18.0	1
NaN	15.0	1
80.0	NaN	1
NaN	16.0	1
79.5	NaN	1
NaN	19.0	2
79.5	22.5	2
NaN	NaN	2
NaN	16.0	2
85.0	NaN	3
NaN	14.5	3
76.4	NaN	3
NaN	NaN	4
80.5	19.5	4
75.3	18.0	4
NaN	21.5	4

I want to impute the NaN values within each patient's data, grouped by P_ID: first the rows with P_ID = 1, then those with P_ID = 2, and so on, rather than imputing over the whole column. The code I am using imputes the NaN values across the whole column for all patients at once, not one patient at a time.

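import pandas as pd
from missingpy import MissForest  # assumption: MissForest comes from the missingpy package

# Fits a single model on all rows, so every patient is imputed together
imputer = MissForest(max_iter=12, n_jobs=-1)
X_imputed = imputer.fit_transform(df)
df1 = pd.DataFrame(X_imputed, columns=df.columns)
df1.head()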

I did mean imputation within each patient using the following code, but I can't figure out how to do the same with MissForest.

for i in ['HR','Resp']:
    df[i] = df[i].fillna(df.groupby('P_ID')[i].transform('mean'))

One solution would be to create 4000 separate data frames, one per patient, impute each with MissForest, and then combine them. That would be tedious, so I am looking for a solution that loops over the entire data frame. Kindly help. Thanks.

嗼ふ静 2025-01-16 18:49:23

You can loop over the unique patient IDs and apply MissForest only to the rows that belong to each one:

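# "ID" is the patient column here (named P_ID in the question)
for idx in df["ID"].unique():
    mask = df["ID"] == idx
    # If "Resp" is entirely NaN for this patient, fill it with 0 first
    if df.loc[mask, "Resp"].isna().all():
        df.loc[mask, "Resp"] = df.loc[mask, "Resp"].fillna(0)
    # Fit a fresh imputer on this patient's rows only
    imputer = MissForest(max_iter=12, n_jobs=-1)
    x_imp = imputer.fit_transform(df[mask])
    # Write the imputed values back into the same rows
    df.loc[mask, :] = x_imp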

This gives you:

|    |      HR |    Resp |   ID |
|---:|--------:|--------:|-----:|
|  0 | 72      | 18      |    1 |
|  1 | 79.5942 | 15      |    1 |
|  2 | 80      | 15.4617 |    1 |
|  3 | 79.5942 | 16      |    1 |
|  4 | 79.5    | 15.4617 |    1 |
|  5 | 79.5    | 19      |    2 |
|  6 | 79.5    | 22.5    |    2 |
|  7 | 79.5    | 18.9217 |    2 |
|  8 | 79.5    | 16      |    2 |
|  9 | 85      | 14.5    |    3 |
| 10 | 80.786  | 14.5    |    3 |
| 11 | 76.4    | 14.5    |    3 |
| 12 | 79.148  | 20.885  |    4 |
| 13 | 80.5    | 19.5    |    4 |
| 14 | 75.3    | 18      |    4 |
| 15 | 79.148  | 21.5    |    4 |
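
As a variation on the loop above, here is a minimal sketch of the same per-patient idea wrapped in a reusable function that leaves the original data frame untouched. The names `impute_per_group`, `ColumnMeanImputer` and `all_nan_fill` are hypothetical and not part of any library. `ColumnMeanImputer` is only a stand-in so the example runs without MissForest installed; it is not the MissForest algorithm. The sketch assumes the imputer exposes `fit_transform`, that the data frame has a unique index, and that filling a fully missing column with 0 (as the loop above does for `Resp`) is acceptable.

```python
import numpy as np
import pandas as pd


class ColumnMeanImputer:
    """Stand-in imputer (hypothetical): fills NaN with the column mean."""

    def fit_transform(self, X):
        X = np.asarray(X, dtype=float).copy()
        for j in range(X.shape[1]):
            missing = np.isnan(X[:, j])
            if missing.any():
                X[missing, j] = X[~missing, j].mean()
        return X


def impute_per_group(df, group_col, value_cols, make_imputer, all_nan_fill=0.0):
    """Fit a fresh imputer on each group's rows and return an imputed copy."""
    out = df.copy()
    for _, rows in df.groupby(group_col):
        idx = rows.index
        block = out.loc[idx, value_cols].copy()
        if not block.isna().any().any():
            continue  # nothing to impute for this group
        # A column with no observed values in this group gets a constant first
        for col in value_cols:
            if block[col].isna().all():
                block[col] = all_nan_fill
        out.loc[idx, value_cols] = make_imputer().fit_transform(block.to_numpy())
    return out


# The sample data from the question
nan = np.nan
df = pd.DataFrame({
    "HR": [72.0, nan, 80.0, nan, 79.5, nan, 79.5, nan, nan,
           85.0, nan, 76.4, nan, 80.5, 75.3, nan],
    "Resp": [18.0, 15.0, nan, 16.0, nan, 19.0, 22.5, nan, 16.0,
             nan, 14.5, nan, nan, 19.5, 18.0, 21.5],
    "P_ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4],
})

imputed = impute_per_group(df, "P_ID", ["HR", "Resp"], ColumnMeanImputer)
print(imputed)
```

To use MissForest instead of the stand-in, pass a factory such as `lambda: MissForest(max_iter=12, n_jobs=-1)` as `make_imputer`. Creating a new imputer per group keeps one patient's fitted model from leaking into the next.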
