How to use multiple conditions, including a quantile selection, in Python

Posted on 2025-01-31 02:03:19

Imagine the following dataset df:

Row  Population_density  Distance
1    400                 50
2    500                 30
3    300                 40
4    200                 120
5    500                 60
6    1000                50
7    3300                30
8    500                 90
9    700                 100
10   1000                110
11   900                 200
12   850                 30

How can I make a new dummy column that contains a 1 when the value of df['Population_density'] is above the third quartile (the 75th percentile) AND df['Distance'] is < 100, and a 0 for the remainder of the data? Consequently, rows 6 and 7 should have a 1 while the other rows should have a 0.

Creating a dummy variable with only one criterion is fairly easy. For instance, the following works for creating a new dummy variable that contains a 1 when the Distance is < 100 and a 0 otherwise: df['Distance_Below_100'] = np.where(df['Distance'] < 100, 1, 0). However, I do not know how to combine conditions when one of them involves a quantile selection (in this case, the upper 25% of the variable Population_density).

import pandas as pd

# Build the data as a dict of lists
data = {'Row': range(1, 13),
        'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}

# Create the DataFrame
df = pd.DataFrame(data)

Comments (2)

酒绊 2025-02-07 02:03:19

You can use & (and) or | (or) to join the conditions:

import numpy as np

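# Condition 1: density above the 75th percentile; condition 2: distance below 100
above_q3 = df['Population_density'].gt(df['Population_density'].quantile(0.75))
below_100 = df['Distance'].lt(100)
df['Distance_Below_100'] = np.where(above_q3 & below_100, 1, 0)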
print(df)

    Row  Population_density  Distance  Distance_Below_100
0     1                 400        50                   0
1     2                 500        30                   0
2     3                 300        40                   0
3     4                 200       120                   0
4     5                 500        60                   0
5     6                1000        50                   1
6     7                3300        30                   1
7     8                 500        90                   0
8     9                 700       100                   0
9    10                1000       110                   0
10   11                 900       200                   0
11   12                 850        30                   0
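
The same result can be reached with the plain comparison operators instead of .gt() and .lt(). The following is a minimal sketch, not the answer author's code: it rebuilds the DataFrame from the question so it runs on its own, and the column name 'Dense_and_close' is a hypothetical name chosen for this example. It assumes pandas' default linear interpolation for the quantile.

```python
import pandas as pd

df = pd.DataFrame({
    'Row': range(1, 13),
    'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
    'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30],
})

# 75th percentile of Population_density (linear interpolation is the pandas default)
q3 = df['Population_density'].quantile(0.75)

# Each comparison needs its own parentheses because & binds tighter than > and <
mask = (df['Population_density'] > q3) & (df['Distance'] < 100)

# 'Dense_and_close' is a hypothetical column name; astype(int) turns True/False into 1/0
df['Dense_and_close'] = mask.astype(int)

print(q3)                            # 925.0
print(df.loc[mask, 'Row'].tolist())  # [6, 7]
```

Row 10 also exceeds the threshold of 925 but has a Distance of 110, so only rows 6 and 7 satisfy both conditions.
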

悲凉≈ 2025-02-07 02:03:19

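To apply a function to a DataFrame row by row, I recommend using apply with a lambda.

For example, this is your function:

def myFunction(value):
    # Placeholder: return the value you want in the new column
    pass

To create a new column 'new_column', where pick_cell is the name of the column whose value you want to pass to the function:

# axis=1 passes each row to the lambda; without it, apply works column by column
df['new_column'] = df.apply(lambda x: myFunction(x.pick_cell), axis=1)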
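
Applied to the question's data, the row-wise approach could look like the sketch below. The helper name is_dense_and_close and its threshold parameter are hypothetical, introduced only for illustration. The quantile is computed once outside the function so that it is not recalculated for every row.

```python
import pandas as pd

df = pd.DataFrame({
    'Row': range(1, 13),
    'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
    'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30],
})

# Compute the 75th percentile once, before iterating over rows
q3 = df['Population_density'].quantile(0.75)

def is_dense_and_close(row, threshold):
    """Hypothetical helper: 1 if density is above threshold and distance is below 100, else 0."""
    return int(row['Population_density'] > threshold and row['Distance'] < 100)

# axis=1 hands each row (as a Series) to the lambda
df['new_column'] = df.apply(lambda row: is_dense_and_close(row, q3), axis=1)

print(df[['Row', 'new_column']])
```

On large DataFrames a row-wise apply is generally slower than a vectorized boolean mask, so this form is mainly useful when the per-row logic is too involved to express with column operations.
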