How do I filter a pandas DataFrame based on whether a group contains a specific column value?
I have the following data:
```python
import pandas as pd

df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18', '2019-01-18', '2020-01-18', '2020-01-18'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22'],
})
```
Each `encounter` is unique (and has a corresponding unique `project_id` and `datetime`) and denotes a clinician diagnosing a patient with one or more diagnoses. I'm trying to find all the groups that contain a particular diagnosis, e.g. `F12`.
I don't want to just filter for `F12`; I want to group by `encounter` (+/- `project_id` and `datetime`?) and filter for the groups containing `F12`, so I can also see which other diagnoses commonly occur with `F12`.
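On the "+/- `project_id` and `datetime`" question: if each encounter really maps to a single `project_id` and `datetime`, grouping by `encounter` alone should produce the same groups as grouping by all three columns. A minimal sketch, using the question's `df`, that checks this assumption and compares the resulting masks (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18', '2019-01-18', '2020-01-18', '2020-01-18'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22'],
})

# Check the assumption: every encounter has exactly one project_id and one datetime.
per_encounter = df.groupby('encounter')[['project_id', 'datetime']].nunique()
one_to_one = bool((per_encounter == 1).all().all())

# Row-level flag: is this row's diagnosis F12?
is_f12 = df['diagnosis'].eq('F12')

# Broadcast "does the group contain F12?" back to every row, grouping by encounter only...
mask_by_encounter = is_f12.groupby(df['encounter']).transform('any')

# ...and grouping by all three key columns.
keys = [df['encounter'], df['project_id'], df['datetime']]
mask_by_all_keys = is_f12.groupby(keys).transform('any')

print(one_to_one)
print(mask_by_encounter.equals(mask_by_all_keys))
```

When the one-to-one check holds, the extra keys add nothing, so grouping by `encounter` is enough.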
I'm unsure how to go about this. I've tried setting multi-indexes, different groupby approaches, etc., but I'm not getting anywhere. For the above data, my desired output would be the same `df` excluding row 3, as below:
Index | encounter | project_id | datetime | diagnosis |
---|---|---|---|---|
0 | 1 | A | 2017-01-18 | F12 |
1 | 1 | A | 2017-01-18 | A11 |
2 | 1 | A | 2017-01-18 | B11 |
4 | 3 | C | 2020-01-18 | F12 |
5 | 3 | C | 2020-01-18 | B22 |
You can use `GroupBy.transform` to check whether any value in the group is `F12` and then perform boolean indexing. Alternatively, use `filter`. Either way, the output is the original DataFrame without row 3.
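A sketch of both approaches named above, using the question's `df` (not necessarily the answerer's exact code). `transform('any')` returns a boolean mask aligned to the original rows, while `DataFrameGroupBy.filter` keeps or drops whole groups based on a function that returns a single boolean:

```python
import pandas as pd

df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18', '2019-01-18', '2020-01-18', '2020-01-18'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22'],
})

# Option 1: GroupBy.transform broadcasts the per-group "any F12?" result
# back to every row, giving a mask the same length as df.
mask = df['diagnosis'].eq('F12').groupby(df['encounter']).transform('any')
out_transform = df[mask]

# Option 2: filter calls the lambda once per group and keeps the whole
# group when it returns True; rows come back in their original order.
out_filter = df.groupby('encounter').filter(lambda g: g['diagnosis'].eq('F12').any())

print(out_transform)
print(out_filter.equals(out_transform))
```

`transform` is usually the faster of the two on large frames, since `filter` calls a Python function once per group.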
Filter `diagnosis` for `F12`, get the matching `encounter` values, and then filter the original `encounter` column again with `Series.isin` in boolean indexing.
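A minimal sketch of that two-step `isin` approach on the question's `df` (the intermediate variable name is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18', '2019-01-18', '2020-01-18', '2020-01-18'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22'],
})

# Step 1: encounter ids of the rows whose diagnosis is F12.
f12_encounters = df.loc[df['diagnosis'].eq('F12'), 'encounter']

# Step 2: keep every row whose encounter is one of those ids.
out = df[df['encounter'].isin(f12_encounters)]
print(out)
```

This keeps the original index (0, 1, 2, 4, 5), matching the desired output in the question.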
I interpreted the question as needing all encounters that are associated in any way with diagnosis 'F12'. See if this works.
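One way to express "all encounters associated with F12" is an inner merge against the distinct F12 encounter ids. This is a sketch under that reading, not necessarily the answerer's code. The final `value_counts` step is an addition aimed at the question's goal of seeing which diagnoses appear alongside F12:

```python
import pandas as pd

df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18', '2019-01-18', '2020-01-18', '2020-01-18'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22'],
})

# Distinct encounter ids that include an F12 diagnosis, as a one-column frame.
f12_ids = df.loc[df['diagnosis'].eq('F12'), ['encounter']].drop_duplicates()

# Inner merge keeps only rows from those encounters (the index is reset to 0..n-1).
out = df.merge(f12_ids, on='encounter', how='inner')

# How often each other diagnosis co-occurs with F12 across those encounters.
co_occurring = out.loc[out['diagnosis'].ne('F12'), 'diagnosis'].value_counts()
print(out)
print(co_occurring)
```

Unlike boolean indexing, `merge` does not keep the original row labels, so use one of the mask-based approaches if the original index matters.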