Fill missing values from another column

Posted on 2025-02-10 06:05:02

The LotFrontage column is related to LotArea: its values fall between 0.5% and 1% of LotArea (that is, between LotArea * 0.005 and LotArea * 0.01).

I am trying to fill each missing LotFrontage with a random value in that range, computed from the LotArea of the same row.

Example: in the picture, LotFrontage is missing at index 1019. I want to fill it with a random value between 8978 * 0.005 and 8978 * 0.01, where 8978 is that row's LotArea.

Code (my attempt to solve this):

np.where(df_train[df_train["LotFrontage"].isnull()], np.random.rand(df_train['LotArea']*0.005, df_train["LotArea"]*0.01),df_train["LotFrontage"])
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-46-49a940deebcd> in <module>()
----> 1 np.random.rand(df_train['LotArea'] *0.005,df_train["LotArea"] * 0.01)
mtrand.pyx in numpy.random.mtrand.RandomState.rand()
mtrand.pyx in numpy.random.mtrand.RandomState.random_sample()
_common.pyx in numpy.random._common.double_fill()
TypeError: 'Series' object cannot be interpreted as an integer

Comments (1)

潇烟暮雨 2025-02-17 06:05:02

How about this approach?

import numpy as np
import pandas as pd

LotArea = np.arange(100, 200, 10)
LotFrontage = np.array([np.nan, *[11, 13, 14, 5], np.nan, *[18, 19, 26, 12]])
df_train = pd.DataFrame({"LotArea": LotArea, "LotFrontage": LotFrontage})
df_train.LotFrontage = df_train.LotFrontage.apply(
    lambda x: df_train.LotArea.sample(n=1).to_numpy()[0]
    * np.random.randint(5, 10)
    / 1000
    if pd.isna(x)
    else x
)
print(df_train)

Then we transform this:

     LotArea  LotFrontage
0      100          NaN
1      110         11.0
2      120         13.0
3      130         14.0
4      140          5.0
5      150          NaN
6      160         18.0
7      170         19.0
8      180         26.0
9      190         12.0

To something like this (the filled values are random and change between runs):

     LotArea  LotFrontage
0      100         0.48
1      110        11.00
2      120        13.00
3      130        14.00
4      140         5.00
5      150         0.76
6      160        18.00
7      170        19.00
8      180        26.00
9      190        12.00
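
For reference, the original TypeError comes from np.random.rand itself: it takes integer dimension sizes (d0, d1, ...), not lower and upper bounds, so passing a Series fails. Also note that the apply-based snippet above takes LotArea from a randomly sampled row (sample(n=1)) rather than from the row being filled, and np.random.randint(5, 10) / 1000 only yields the factors 0.005 through 0.009 in steps of 0.001. A minimal vectorized sketch that draws from each row's own LotArea could look like the following. The helper name fill_frontage, its parameters, and the sample rows (other than the LotArea value 8978 from the question) are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_frontage(df, low=0.005, high=0.01, seed=None):
    """Return a copy of df where each missing LotFrontage is drawn
    uniformly between low * LotArea and high * LotArea of the same row.

    Hypothetical helper, not part of pandas or NumPy.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    missing = out["LotFrontage"].isna()  # boolean mask of rows to fill
    area = out.loc[missing, "LotArea"].to_numpy(dtype=float)
    # Generator.uniform accepts array-valued bounds, so every missing
    # row gets its own interval derived from its own LotArea.
    out.loc[missing, "LotFrontage"] = rng.uniform(low * area, high * area)
    return out


# Made-up sample rows; only 8978 appears in the question.
df_train = pd.DataFrame({
    "LotArea": [8450, 9600, 8978, 11250],
    "LotFrontage": [65.0, np.nan, np.nan, 68.0],
})
filled = fill_frontage(df_train, seed=0)
print(filled)
```

Passing a seed makes the fill reproducible; leave it as None to get different values on every run. The function works on a copy, so df_train keeps its NaN values until you assign the result back.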