Impute missing values for each group in a column using the MissForest algorithm in Python

Posted on 2025-01-09 18:49:23

I have time series data for about 4000 patients with missing values, and I want to impute the NaN values with the MissForest algorithm in Python, separately for each patient.

The data looks like this:

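|   HR | Resp | P_ID |
|-----:|-----:|-----:|
| 72.0 | 18.0 |    1 |
|  NaN | 15.0 |    1 |
| 80.0 |  NaN |    1 |
|  NaN | 16.0 |    1 |
| 79.5 |  NaN |    1 |
|  NaN | 19.0 |    2 |
| 79.5 | 22.5 |    2 |
|  NaN |  NaN |    2 |
|  NaN | 16.0 |    2 |
| 85.0 |  NaN |    3 |
|  NaN | 14.5 |    3 |
| 76.4 |  NaN |    3 |
|  NaN |  NaN |    4 |
| 80.5 | 19.5 |    4 |
| 75.3 | 18.0 |    4 |
|  NaN | 21.5 |    4 |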

Now I want to impute the NaN values within each patient's data, grouped by P_ID: first P_ID = 1, then P_ID = 2, and so on, rather than imputing over the whole column at once. The code I am using imputes across the entire column for all patients together, not patient by patient.

imputer = MissForest(max_iter=12, n_jobs=-1)
X_imputed = imputer.fit_transform(df)
df1 = pd.DataFrame(X_imputed)
df1.head()

I did mean imputation within each patient using the following code, but I can't figure out how to do the same with MissForest.

for i in ['HR','Resp']:
    df[i] = df[i].fillna(df.groupby('P_ID')[i].transform('mean'))

One solution is to split the data into 4000 data frames, one per patient, impute each with MissForest, and then combine them again. That would be tedious, so I would like a solution that loops over the entire dataframe. Kindly help. Thanks.

Comments (1)

嗼ふ静 2025-01-16 18:49:23

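You can loop over the patient IDs and apply MissForest only to the rows of each patient. The code below calls the patient column ID, as in the output table; use P_ID if that is the column name in your data: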

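for idx in df["ID"].unique():
    mask = df["ID"] == idx  # rows belonging to this patient
    # A "Resp" column that is entirely NaN for this patient gives the
    # imputer nothing to learn from, so fill it with 0 first
    if df.loc[mask, "Resp"].isna().all():
        df.loc[mask, "Resp"] = 0
    # Fit a fresh imputer on this patient's rows only
    imputer = MissForest(max_iter=12, n_jobs=-1)
    x_imp = imputer.fit_transform(df[mask])
    # Write the imputed values back into the original frame
    df.loc[mask, :] = x_imp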

This gives you:

|    |      HR |    Resp |   ID |
|---:|--------:|--------:|-----:|
|  0 | 72      | 18      |    1 |
|  1 | 79.5942 | 15      |    1 |
|  2 | 80      | 15.4617 |    1 |
|  3 | 79.5942 | 16      |    1 |
|  4 | 79.5    | 15.4617 |    1 |
|  5 | 79.5    | 19      |    2 |
|  6 | 79.5    | 22.5    |    2 |
|  7 | 79.5    | 18.9217 |    2 |
|  8 | 79.5    | 16      |    2 |
|  9 | 85      | 14.5    |    3 |
| 10 | 80.786  | 14.5    |    3 |
| 11 | 76.4    | 14.5    |    3 |
| 12 | 79.148  | 20.885  |    4 |
| 13 | 80.5    | 19.5    |    4 |
| 14 | 75.3    | 18      |    4 |
| 15 | 79.148  | 21.5    |    4 |
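
The same per-patient loop can be wrapped in a small reusable helper. What follows is only a sketch under a few assumptions: `impute_per_group` and `ColumnMeanImputer` are hypothetical names, not part of any library, and the mean imputer is a stand-in so the example runs with just pandas and NumPy. Any object with a `fit_transform` method, such as a MissForest instance, could be passed in its place. Unlike the loop above, the sketch leaves the ID column out of the features, since it is constant within a patient, and seeds a column that is entirely NaN for a patient with the overall column mean instead of 0.

```python
import numpy as np
import pandas as pd


class ColumnMeanImputer:
    """Hypothetical stand-in that exposes fit_transform like MissForest."""

    def fit_transform(self, X):
        X = np.array(X, dtype=float)          # work on a float copy
        means = np.nanmean(X, axis=0)         # per-column mean, ignoring NaN
        rows, cols = np.where(np.isnan(X))    # positions of missing entries
        X[rows, cols] = means[cols]
        return X


def impute_per_group(df, group_col, value_cols, make_imputer):
    """Impute value_cols separately within each group of group_col."""
    out = df.copy()
    overall_means = df[value_cols].mean()     # fallback for all-NaN columns
    for _, idx in df.groupby(group_col).groups.items():
        block = out.loc[idx, value_cols].copy()
        if not block.isna().any().any():
            continue                          # nothing to impute here
        # Seed columns with no observed value in this group
        for col in block.columns[block.isna().all()]:
            block[col] = overall_means[col]
        # A fresh imputer per group, fitted on that group's rows only
        out.loc[idx, value_cols] = make_imputer().fit_transform(block.to_numpy())
    return out


nan = np.nan
df = pd.DataFrame({
    "HR":   [72.0, nan, 80.0, nan, 79.5, nan, 79.5, nan,
             nan, 85.0, nan, 76.4, nan, 80.5, 75.3, nan],
    "Resp": [18.0, 15.0, nan, 16.0, nan, 19.0, 22.5, nan,
             16.0, nan, 14.5, nan, nan, 19.5, 18.0, 21.5],
    "P_ID": [1] * 5 + [2] * 4 + [3] * 3 + [4] * 4,
})

imputed = impute_per_group(df, "P_ID", ["HR", "Resp"], ColumnMeanImputer)
print(imputed)
```

To use MissForest instead, assuming it is installed and imported, the last argument could be something like `lambda: MissForest(max_iter=12, n_jobs=-1)`. Passing a factory rather than a single instance keeps each patient's fit independent of the others.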