Classifying a pandas column based on another column

Posted 2025-02-11 00:52:58

My data looks like this:
[screenshot of data]

I'm using the following script to populate RP_8Recruise with either "Y" (NEAR_DIST <= 100 meters) or "N" (NEAR_DIST > 100 meters), and RP_8RecrType with "PD" or NaN accordingly:

nrows = plots_dist_joined.shape[0]

for i in range(0, nrows):
    
    # for plots that are within wanted distance from disturbance harvest 
    if (plots_dist_joined.iloc[i,9] < 100) | (plots_dist_joined.iloc[i,9] == 100):
        plots_dist_joined["RP_"+reporting_period+"Recruise"] = "Y"
        plots_dist_joined["RP_"+reporting_period+"RecrType"] = "PD"
    
    # for plots that are NOT within wanted distance from disturbance harvest 
    else:
        plots_dist_joined["RP_"+reporting_period+"Recruise"] = "N"
        plots_dist_joined["RP_"+reporting_period+"RecrType"] = np.nan

This fills the entire RP_8Recruise column with "N", even though some distances are under 100 meters (IDs 59197, 40, 84, 92, and 132). I'm not sure what is wrong with the code.


Answer by 纵情客, 2025-02-18 00:52:58

The problem with your code is that each iteration assigns a single scalar ("Y"/"N", "PD"/NaN) to the entire RP_8Recruise and RP_8RecrType columns, overwriting whatever the previous iteration wrote. The final values of these columns are therefore decided by the NEAR_DIST value in the last row.
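To make the overwrite visible, here is a minimal sketch on a hypothetical three-row frame (the IDs and distances are made up, not taken from the question's data). The first loop mimics the question's pattern; the second writes only the current row's cell with `.loc[row_label, column]`, which is one way to fix a row-wise loop if you really want to keep it.

```python
import pandas as pd

# Hypothetical toy data standing in for plots_dist_joined (values are made up)
plots = pd.DataFrame({"ID": [1, 2, 3], "NEAR_DIST": [40.0, 100.0, 250.0]})
col = "RP_8Recruise"

# Buggy pattern from the question: each assignment replaces the WHOLE column,
# so only the test on the last row survives.
for i in range(plots.shape[0]):
    plots[col] = "Y" if plots["NEAR_DIST"].iloc[i] <= 100 else "N"
buggy = plots[col].tolist()

# Row-wise fix: assign to a single cell (row label, column name) per iteration
for idx in plots.index:
    plots.loc[idx, col] = "Y" if plots.loc[idx, "NEAR_DIST"] <= 100 else "N"
fixed = plots[col].tolist()

print(buggy)
print(fixed)
```

Because the last hypothetical row is 250 m away, the buggy loop leaves every row as "N", exactly the symptom described in the question. The per-cell loop gives the expected result but is slow on large frames, which is why a vectorized approach is usually preferred.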

Instead of a for-loop, use the vectorized numpy.where() function to fill in the values:

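import numpy as np

# boolean mask: True where the plot is within 100 m (inclusive) of the disturbance
is_near = plots_dist_joined["NEAR_DIST"] <= 100
# if near, "Y", else "N"
plots_dist_joined["RP_8Recruise"] = np.where(is_near, "Y", "N")
# if near, "PD", else missing; use None rather than np.nan, because mixing
# np.nan with a string makes np.where turn NaN into the literal string "nan"
plots_dist_joined["RP_8RecrType"] = np.where(is_near, "PD", None)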
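For a self-contained version that keeps the question's dynamic column names, here is a sketch. It assumes `reporting_period` is a string (the question concatenates it into column names) and uses a small hypothetical frame in place of the real data. For the NaN column it swaps in `Series.where`, which keeps "PD" where the mask is True and inserts a real NaN elsewhere.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real plots_dist_joined comes from the question's data
plots_dist_joined = pd.DataFrame(
    {"ID": [1, 2, 3, 4], "NEAR_DIST": [12.5, 100.0, 340.2, 99.9]}
)

reporting_period = "8"  # assumed to be a string, since the question concatenates it
recruise_col = "RP_" + reporting_period + "Recruise"
recr_type_col = "RP_" + reporting_period + "RecrType"

# Inclusive threshold: NEAR_DIST <= 100 counts as "near"
is_near = plots_dist_joined["NEAR_DIST"] <= 100

# Y/N for every row in one vectorized call
plots_dist_joined[recruise_col] = np.where(is_near, "Y", "N")

# "PD" where near, real NaN otherwise (Series.where fills False rows with NaN)
plots_dist_joined[recr_type_col] = pd.Series(
    "PD", index=plots_dist_joined.index
).where(is_near)

print(plots_dist_joined)
```

The mask is computed once for the whole column, so there is no Python-level loop, and `isna()` on the RecrType column correctly reports the far plots as missing.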
