How to use multiple conditions, including a quantile selection, in Python
Imagine the following dataset df:
Row | Population_density | Distance |
---|---|---|
1 | 400 | 50 |
2 | 500 | 30 |
3 | 300 | 40 |
4 | 200 | 120 |
5 | 500 | 60 |
6 | 1000 | 50 |
7 | 3300 | 30 |
8 | 500 | 90 |
9 | 700 | 100 |
10 | 1000 | 110 |
11 | 900 | 200 |
12 | 850 | 30 |
How can I create a new dummy column that contains a 1 when the value of df['Population_density'] is above the third quartile (the 75% quantile) AND df['Distance'] is < 100, and a 0 for all other rows? With this data, rows 6 and 7 should get a 1 and every other row a 0.
Creating a dummy variable from a single criterion is fairly easy. For instance, the following creates a new dummy variable that contains a 1 when Distance is < 100 and a 0 otherwise: df['Distance_Below_100'] = np.where(df['Distance'] < 100, 1, 0). However, I do not know how to combine conditions when one of them involves a quantile selection (in this case, the upper 25% of the variable Population_density).
import numpy as np
import pandas as pd
# Build the data from lists.
data = {'Row': range(1,13,1), 'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}
# Create DataFrame
df = pd.DataFrame(data)
Comments (2)
You can use & or | to join the conditions.
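A minimal sketch of joining the two conditions with &, using the data from the question. The threshold variable q3, the mask variable, and the column name Dense_And_Close are illustrative names, not part of the original post. The quantile is computed with pandas' default linear interpolation, which is an assumption about how "third quartile" should be defined here.

```python
import numpy as np
import pandas as pd

data = {
    'Row': range(1, 13),
    'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
    'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30],
}
df = pd.DataFrame(data)

# 75% quantile of Population_density (linear interpolation is the pandas default).
q3 = df['Population_density'].quantile(0.75)

# Each comparison needs its own parentheses: & binds more tightly than > and <.
mask = (df['Population_density'] > q3) & (df['Distance'] < 100)

# 'Dense_And_Close' is a hypothetical column name chosen for this sketch.
df['Dense_And_Close'] = np.where(mask, 1, 0)

print(df[['Row', 'Population_density', 'Distance', 'Dense_And_Close']])
```

Swapping & for | would flag rows where either condition holds rather than both.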
To apply a function to a data frame, I recommend using a lambda.
For example, this is your function:
To create a new column 'new_column', where pick_cell is the cell you want to apply the function to:
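A rough sketch of what a lambda-based version could look like for the question's data. This is not the commenter's original code: the helper dense_and_close and its threshold parameter are hypothetical, and only the column name new_column is taken from the comment.

```python
import pandas as pd

df = pd.DataFrame({
    'Row': range(1, 13),
    'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
    'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30],
})

# 75% quantile of Population_density, computed once outside the row function.
q3 = df['Population_density'].quantile(0.75)


def dense_and_close(row, threshold):
    """Hypothetical helper: 1 if the row is above the threshold and closer than 100, else 0."""
    return int(row['Population_density'] > threshold and row['Distance'] < 100)


# axis=1 passes each row to the lambda as a Series.
df['new_column'] = df.apply(lambda row: dense_and_close(row, q3), axis=1)

print(df)
```

Row-wise apply calls the Python function once per row, so on large frames it is usually slower than a vectorized boolean expression. It can still be convenient when the per-row logic is hard to express with column operations.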