Plotting column value counts as a bar chart from a Python DataFrame
I have time-series data and want to see the total number of septic (1) and non-septic (0) patients in the SepsisLabel column. Non-septic patients never have an entry of 1. Septic patients first have zeros (0), and then the label changes to 1, meaning the patient has become septic. The data looks like this:
HR | SBP | DBP | SepsisLabel | Gender | P_ID |
---|---|---|---|---|---|
92 | 120 | 80 | 0 | 0 | 0 |
98 | 115 | 85 | 0 | 0 | 0 |
93 | 125 | 75 | 0 | 1 | 1 |
95 | 130 | 90 | 0 | 1 | 1 |
93 | 125 | 75 | 1 | 1 | 1 |
95 | 130 | 90 | 1 | 1 | 1 |
93 | 125 | 75 | 1 | 1 | 1 |
95 | 130 | 90 | 1 | 1 | 1 |
102 | 120 | 80 | 0 | 0 | 2 |
109 | 115 | 75 | 0 | 0 | 2 |
94 | 135 | 100 | 0 | 0 | 2 |
97 | 100 | 70 | 0 | 0 | 3 |
85 | 120 | 80 | 0 | 0 | 3 |
88 | 115 | 75 | 1 | 0 | 3 |
93 | 125 | 85 | 1 | 0 | 3 |
78 | 130 | 90 | 1 | 1 | 4 |
115 | 140 | 110 | 1 | 1 | 4 |
Here there are 3 septic patients (P_ID = 1, 3, 4) and 2 non-septic patients (P_ID = 0, 2). I want to plot these numbers as a bar plot, so I did it manually with the following code:
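One possible way to get these counts from the data instead of by hand: collapse the rows to one value per patient with `groupby("P_ID")` and take the maximum SepsisLabel, which is 1 exactly when the patient is ever septic. This is a minimal sketch, assuming the DataFrame is called `df` and has the column names shown in the table; HR, SBP and DBP are left out because counting does not need them.

```python
import pandas as pd

# Dummy data from the table above (HR, SBP and DBP omitted; not needed for counting)
df = pd.DataFrame({
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Gender":      [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "P_ID":        [0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
})

# One value per patient: 1 if SepsisLabel is ever 1, otherwise 0
per_patient = df.groupby("P_ID")["SepsisLabel"].max()

# Count patients (not rows) per class; reindex keeps a class even if it is empty
counts = per_patient.value_counts().reindex([0, 1], fill_value=0)

non_septic, septic = counts.loc[0], counts.loc[1]
print(non_septic, septic)
```

The pair `[non_septic, septic]` can then take the place of the hard-coded `count = [2, 3]` in the plotting code.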
```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7, 6))
ax = fig.add_axes([0, 0, 1, 1])
sepsis = ['Non-Septic patients', 'Septic patients']
count = [2, 3]  # counted by hand from the dummy data
ax.bar(sepsis, count)
ax.set_title("Septic and Non-septic patient count in the dataset", y=1, fontsize=15)
ax.set_xlabel('Patients', fontsize=12)
ax.set_ylabel('Count', fontsize=12)
# Write each bar's value on top of it
for bars in ax.containers:
    ax.bar_label(bars)
ax.margins(y=0.1)
plt.show()
```
- However, I don't want to count the septic and non-septic patients manually, because my real data is very large; this is just dummy data. I know I must use the P_ID column, but I am not sure how.
- The second thing I want to plot is how many of these septic and non-septic patients are male (1) and female (0), based on the Gender column. I want something like this graph (image not available).
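For the gender breakdown, one approach is to reduce the data to one row per patient and cross-tabulate septic status against gender with `pd.crosstab`, then draw it as a grouped bar chart. This sketch assumes that Gender is constant for each patient (so the first row's value is used) and that 1 = male and 0 = female, as stated above; the titles and labels are only illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Dummy data from the table above (HR, SBP and DBP omitted)
df = pd.DataFrame({
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Gender":      [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "P_ID":        [0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
})

# One row per patient: septic if SepsisLabel is ever 1; gender from the first row
patients = df.groupby("P_ID").agg(
    SepsisLabel=("SepsisLabel", "max"),
    Gender=("Gender", "first"),
)

# Readable labels using the coding from the question (Gender: 1 = male, 0 = female)
status = patients["SepsisLabel"].map({0: "Non-septic", 1: "Septic"})
sex = patients["Gender"].map({0: "Female", 1: "Male"})

# Rows: septic status, columns: sex, cells: number of patients
table = pd.crosstab(status, sex)
print(table)

# Grouped bar chart: one group per status, one bar per sex
ax = table.plot.bar(rot=0, figsize=(7, 6))
ax.set_xlabel("Patients", fontsize=12)
ax.set_ylabel("Count", fontsize=12)
ax.set_title("Septic and Non-septic patients by gender", fontsize=15)
for bars in ax.containers:
    ax.bar_label(bars)
ax.margins(y=0.1)
plt.show()
```

`pd.crosstab` fills combinations that never occur (here, non-septic males) with 0, so every group still gets one bar per gender.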
**Update**
By default, `drop_duplicates` keeps only the first row. This causes a problem for septic patients, whose SepsisLabel is 0 at first and then changes to 1: the code keeps only that first row (with 0), even though the patient is septic. As a result, the number of septic patients drops and the number of non-septic patients increases, which shouldn't happen. Is it possible to keep, for septic patients, only the rows after the label changes from 0 to 1? Then every septic patient would have 1 in SepsisLabel in their first row instead of 0, which would give the correct number of septic patients.
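A minimal sketch of the filtering asked for in the update, assuming each patient's rows are in time order: first compute each patient's overall status with `groupby(...).transform("max")`, drop the leading 0 rows of septic patients, and only then call `drop_duplicates`. The names `ever_septic`, `kept` and `first_rows` are illustrative.

```python
import pandas as pd

# Dummy data from the table above (HR, SBP and DBP omitted)
df = pd.DataFrame({
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Gender":      [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "P_ID":        [0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
})

# Each patient's overall status, repeated on every row (1 = ever septic)
ever_septic = df.groupby("P_ID")["SepsisLabel"].transform("max")

# Septic patients: drop their leading 0 rows and keep only rows labelled 1.
# Non-septic patients: keep all their rows (they are all 0 anyway).
kept = df[(ever_septic == 0) | (df["SepsisLabel"] == 1)]

# drop_duplicates now keeps a correctly labelled row for every patient;
# for septic patients this is the first row after 0 changes to 1
first_rows = kept.drop_duplicates(subset="P_ID", keep="first")
print(first_rows["SepsisLabel"].value_counts())
```

Because the label stays 1 once it has changed, `df.drop_duplicates(subset="P_ID", keep="last")` would also give the right counts, but it keeps each patient's last row rather than the row where the change happens.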
**Answer**
For 1), use `np.where`. For 2), you can use `seaborn`.
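The answer's own code was not preserved, so the following is only a guess at how `np.where` might be used for 1): label every row by its patient's overall status, then keep one row per patient before counting. It assumes the same column names as the question; the `Status` column is a hypothetical addition.

```python
import numpy as np
import pandas as pd

# Dummy data from the question (HR, SBP and DBP omitted)
df = pd.DataFrame({
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Gender":      [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "P_ID":        [0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
})

# Each patient's maximum SepsisLabel, repeated on every row
ever_septic = df.groupby("P_ID")["SepsisLabel"].transform("max")

# np.where(condition, value_if_true, value_if_false) labels every row
df["Status"] = np.where(ever_septic == 1, "Septic", "Non-septic")

# Keep one row per patient so each patient is counted once
patients = df.drop_duplicates(subset="P_ID")
print(patients["Status"].value_counts())
```

For the seaborn part, `sns.countplot(data=patients, x="Status", hue="Gender")` (with `import seaborn as sns`) is one way to draw the per-status gender counts from this one-row-per-patient frame; again, this is an assumption about what the answer intended.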