How do I filter a pandas DataFrame based on whether a group contains a specific column value?

Posted on 2025-01-21 02:13:47

I have the following data:

import pandas as pd

df = pd.DataFrame({
    'encounter' : [1, 1, 1, 2, 3, 3],
    'project_id' : ['A','A','A','B','C','C'],
    'datetime' : ['2017-01-18','2017-01-18','2017-01-18','2019-01-18','2020-01-18','2020-01-18'],
    'diagnosis' : ['F12','A11','B11', 'C11', 'F12', 'B22']
})

Each encounter is unique (and has a corresponding unique project_id and datetime) and represents a clinician diagnosing a patient with one or more diagnoses. I'm trying to find all the groups that contain a particular diagnosis, e.g. F12.

I don't want to just filter for F12; I want to group by encounter (+/- project_id and datetime?) and keep only the groups containing F12, so I can also see which other diagnoses commonly occur with F12.

I'm unsure how to go about this. I've tried setting multi-indexes, different groupby approaches, etc., but I'm not getting anywhere. For the above data, my desired output is the same df without row 3 (the only row of encounter 2):

   encounter project_id    datetime diagnosis
0          1          A  2017-01-18       F12
1          1          A  2017-01-18       A11
2          1          A  2017-01-18       B11
4          3          C  2020-01-18       F12
5          3          C  2020-01-18       B22
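On the "+/- project_id and datetime?" part: if each encounter really maps to exactly one project_id and one datetime, grouping by encounter alone is enough, and adding the other two keys changes nothing. A minimal sketch for checking that assumption on the sample data (the names per_encounter and encounter_is_sufficient are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18',
                 '2019-01-18', '2020-01-18', '2020-01-18'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22'],
})

# Count the distinct project_id and datetime values inside each encounter.
per_encounter = df.groupby('encounter')[['project_id', 'datetime']].nunique()

# If every count is 1, encounter alone identifies the group, so the extra
# keys can be left out of the groupby.
encounter_is_sufficient = bool((per_encounter == 1).all().all())
print(encounter_is_sufficient)  # True for this sample data
```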


Answers (3)

吻泪 2025-01-28 02:13:48

You can use GroupBy.transform to check whether any value in each group is F12, then use the resulting mask for boolean indexing:

df[df['diagnosis'].eq('F12').groupby(df['encounter']).transform('any')]

Alternatively, with GroupBy.filter:

df.groupby('encounter').filter(lambda d: d['diagnosis'].eq('F12').any())

Output:

   encounter project_id    datetime diagnosis
0          1          A  2017-01-18       F12
1          1          A  2017-01-18       A11
2          1          A  2017-01-18       B11
4          3          C  2020-01-18       F12
5          3          C  2020-01-18       B22
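A self-contained sketch that runs both snippets from this answer on the question's sample data and checks that they return the same rows (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18',
                 '2019-01-18', '2020-01-18', '2020-01-18'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22'],
})

# Approach 1: a per-row boolean mask that is True when the row's encounter
# contains at least one F12 diagnosis; transform broadcasts the group result
# back to every row of the group.
has_f12 = df['diagnosis'].eq('F12').groupby(df['encounter']).transform('any')
via_transform = df[has_f12]

# Approach 2: GroupBy.filter keeps whole groups for which the lambda returns True.
via_filter = df.groupby('encounter').filter(lambda d: d['diagnosis'].eq('F12').any())

# Both keep the original index labels; encounter 2 (row 3) is dropped.
print(via_transform.index.tolist())      # [0, 1, 2, 4, 5]
print(via_transform.equals(via_filter))  # True
```

The transform version builds the mask with vectorized operations, while filter calls the Python lambda once per group, so with many encounters the transform version is usually the faster of the two.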

心房的律动 2025-01-28 02:13:48

Filter diagnosis for F12 to get the matching encounter values, then filter the original encounter column against them with Series.isin in boolean indexing:

df = df[df['encounter'].isin(df.loc[df['diagnosis'].eq('F12'), 'encounter'])]
print(df)
   encounter project_id    datetime diagnosis
0          1          A  2017-01-18       F12
1          1          A  2017-01-18       A11
2          1          A  2017-01-18       B11
4          3          C  2020-01-18       F12
5          3          C  2020-01-18       B22
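A sketch extending this answer toward the asker's stated goal of seeing which diagnoses co-occur with F12. The target variable and the final value_counts step are additions for illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18',
                 '2019-01-18', '2020-01-18', '2020-01-18'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22'],
})

target = 'F12'  # diagnosis of interest (illustrative variable)

# Encounters that include the target diagnosis at least once.
encounters_with_target = df.loc[df['diagnosis'].eq(target), 'encounter']

# Keep every row that belongs to one of those encounters.
subset = df[df['encounter'].isin(encounters_with_target)]

# Count how often each other diagnosis appears alongside the target.
co_occurring = subset.loc[subset['diagnosis'].ne(target), 'diagnosis'].value_counts()

# In this sample, A11, B11 and B22 each appear once together with F12.
print(co_occurring)
```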

腻橙味 2025-01-28 02:13:48

I interpreted the question as: you need all encounters that are associated in any way with diagnosis 'F12'. See if this works:

df[df['project_id'].isin(df.loc[df['diagnosis'].eq('F12'), 'project_id']) | df['datetime'].isin(df.loc[df['diagnosis'].eq('F12'), 'datetime'])] 
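On the question's sample data this returns the same rows as the encounter-based answers. The two approaches differ only if a project_id or datetime can be shared by more than one encounter, which the question appears to rule out but does not state outright. A sketch with one hypothetical extra row (encounter 4, project 'A', diagnosis 'D33', invented purely for illustration) shows the difference:

```python
import pandas as pd

# Sample data from the question plus one HYPOTHETICAL row (index 6):
# encounter 4 shares project 'A' with encounter 1 but has no F12 diagnosis.
df = pd.DataFrame({
    'encounter': [1, 1, 1, 2, 3, 3, 4],
    'project_id': ['A', 'A', 'A', 'B', 'C', 'C', 'A'],
    'datetime': ['2017-01-18', '2017-01-18', '2017-01-18',
                 '2019-01-18', '2020-01-18', '2020-01-18', '2021-05-01'],
    'diagnosis': ['F12', 'A11', 'B11', 'C11', 'F12', 'B22', 'D33'],
})

is_f12 = df['diagnosis'].eq('F12')

# This answer: keep rows whose project_id OR datetime matches any F12 row.
by_project_or_date = df[
    df['project_id'].isin(df.loc[is_f12, 'project_id'])
    | df['datetime'].isin(df.loc[is_f12, 'datetime'])
]

# Encounter-based approach from the other answers.
by_encounter = df[df['encounter'].isin(df.loc[is_f12, 'encounter'])]

print(by_project_or_date.index.tolist())  # [0, 1, 2, 4, 5, 6]
print(by_encounter.index.tolist())        # [0, 1, 2, 4, 5]
```

If project_id and datetime are truly one-to-one with encounter, both give identical results; otherwise matching on encounter is the stricter reading of "groups containing F12".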