Fill missing values from another column

Posted on 2025-02-10 06:05:02

The LotFrontage column is related to LotArea: LotFrontage lies between 0.005 and 0.01 times LotArea (that is, 0.5% to 1% of LotArea).
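Written as an inequality, with $A_i$ the LotArea and $F_i$ the LotFrontage of row $i$, the relationship and the desired fill rule are below. Drawing from a uniform distribution is an assumption; the question only says "random".

$$
0.005\,A_i \le F_i \le 0.01\,A_i, \qquad \hat F_i \sim \mathcal{U}\left(0.005\,A_i,\ 0.01\,A_i\right) \text{ when } F_i \text{ is missing}
$$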

I am trying to fill each missing LotFrontage with a random value between 0.005 and 0.01 times the LotArea of the same row.

Example: in the screenshot, LotFrontage is missing at index 1019, where LotArea is 8978. I want to fill it with a value between 8978 * 0.005 and 8978 * 0.01.
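For the example row, the bounds work out as shown below. The draw uses NumPy's `Generator.uniform`, which is an assumption, since the question only asks for a "random" value; the seed is there only to make a run reproducible.

```python
import numpy as np

lot_area = 8978          # LotArea of the row at index 1019 in the question
low = lot_area * 0.005   # lower bound for LotFrontage
high = lot_area * 0.01   # upper bound for LotFrontage
print(round(low, 2), round(high, 2))  # 44.89 89.78

rng = np.random.default_rng(seed=0)   # seeded only for reproducibility
fill_value = rng.uniform(low, high)   # one value in [low, high)
print(fill_value)
```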

Code (attempting to solve this):

np.where(df_train[df_train["LotFrontage"].isnull()], np.random.rand(df_train['LotArea']*0.005, df_train["LotArea"]*0.01),df_train["LotFrontage"])

Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-46-49a940deebcd> in <module>()
----> 1 np.random.rand(df_train['LotArea'] *0.005,df_train["LotArea"] * 0.01)
mtrand.pyx in numpy.random.mtrand.RandomState.rand()
mtrand.pyx in numpy.random.mtrand.RandomState.random_sample()
_common.pyx in numpy.random._common.double_fill()
TypeError: 'Series' object cannot be interpreted as an integer
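The traceback comes from `np.random.rand`, whose arguments are array dimensions (integers), not lower and upper bounds, so passing a Series fails. There is a second problem: the first argument of `np.where` must be a boolean mask, whereas `df_train[df_train["LotFrontage"].isnull()]` is a filtered DataFrame. A minimal sketch of a fix, assuming uniformly distributed draws are acceptable, uses `Generator.uniform`, which accepts array-valued bounds and returns one draw per element. The four-row `df_train` is a hypothetical stand-in; only the LotArea of 8978 comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df_train
df_train = pd.DataFrame({
    "LotArea": [8978, 9600, 11250, 9550],
    "LotFrontage": [np.nan, 80.0, np.nan, 60.0],
})

rng = np.random.default_rng(seed=42)
area = df_train["LotArea"].to_numpy(dtype=float)
# Array-valued bounds: one draw per row in [0.005 * area, 0.01 * area)
candidates = rng.uniform(area * 0.005, area * 0.01)
# np.where needs a boolean mask, not a filtered DataFrame
missing = df_train["LotFrontage"].isna().to_numpy()
df_train["LotFrontage"] = np.where(missing, candidates, df_train["LotFrontage"])
print(df_train)
```

Rows that already have a LotFrontage keep it; only the masked rows take the random candidate.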

潇烟暮雨 2025-02-17 06:05:02

How about this approach?

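import numpy as np
import pandas as pd

# Toy data: LotArea 100..190, LotFrontage missing in rows 0 and 5
LotArea = np.arange(100, 200, 10)
LotFrontage = np.array([np.nan, 11, 13, 14, 5, np.nan, 18, 19, 26, 12])
df_train = pd.DataFrame({"LotArea": LotArea, "LotFrontage": LotFrontage})

# For each row with a missing LotFrontage, draw a random value between
# 0.005 * LotArea and 0.01 * LotArea of that same row; keep existing values.
df_train["LotFrontage"] = df_train.apply(
    lambda row: np.random.uniform(0.005, 0.01) * row["LotArea"]
    if pd.isna(row["LotFrontage"])
    else row["LotFrontage"],
    axis=1,
)
print(df_train)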

Then we transform this:

     LotArea  LotFrontage
0      100          NaN
1      110         11.0
2      120         13.0
3      130         14.0
4      140          5.0
5      150          NaN
6      160         18.0
7      170         19.0
8      180         26.0
9      190         12.0

To something like this (the filled values in rows 0 and 5 are random, so they differ on every run):

     LotArea  LotFrontage
0      100         0.48
1      110        11.00
2      120        13.00
3      130        14.00
4      140         5.00
5      150         0.76
6      160        18.00
7      170        19.00
8      180        26.00
9      190        12.00
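The row-wise `apply` above calls a Python function once per row. A vectorized alternative, sketched here under the same uniform-draw assumption, builds one candidate per row and lets `Series.fillna` take values from an index-aligned Series; `Series.between` then checks that every filled value respects the bounds. It reuses the answer's toy data.

```python
import numpy as np
import pandas as pd

# Same toy data as the answer above
df_train = pd.DataFrame({
    "LotArea": np.arange(100, 200, 10),
    "LotFrontage": [np.nan, 11, 13, 14, 5, np.nan, 18, 19, 26, 12],
})

rng = np.random.default_rng(seed=0)  # seeded so a run can be reproduced
area = df_train["LotArea"].to_numpy(dtype=float)
# One uniform draw per row, index-aligned with df_train
draws = pd.Series(rng.uniform(area * 0.005, area * 0.01), index=df_train.index)

was_missing = df_train["LotFrontage"].isna()
# fillna with a Series fills each NaN from the draw with the same index label
df_train["LotFrontage"] = df_train["LotFrontage"].fillna(draws)

# Every filled value should lie inside [0.005 * LotArea, 0.01 * LotArea]
filled = df_train.loc[was_missing]
assert filled["LotFrontage"].between(
    filled["LotArea"] * 0.005, filled["LotArea"] * 0.01
).all()
print(df_train)
```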
