Complex partial string matching in pandas
Given a dataframe with the following structure and values:
json_path | Reporting Group | Entity/Grouping |
---|---|---|
data.attributes.total.children.[0] | Christian Family | Abraham Family |
data.attributes.total.children.[0].children.[0] | Christian Family | In Estate |
data.attributes.total.children.[0].children.[0].children.[0].children.[0] | Christian Family | Cash |
data.attributes.total.children.[0].children.[0].children.[1].children.[0] | Christian Family | Investment Grade Fixed Income |
How would I filter for the rows whose `json_path` contains `children` four times? That is, I want to keep only the rows at index positions 2-3:
json_path | Reporting Group | Entity/Grouping |
---|---|---|
data.attributes.total.children.[0].children.[0].children.[0].children.[0] | Christian Family | Cash |
data.attributes.total.children.[0].children.[0].children.[1].children.[0] | Christian Family | Investment Grade Fixed Income |
I know how to obtain a partial match; however, the integers in the square brackets will be inconsistent, so my instinct is to write logic that counts the occurrences of `children` (i.e., `children` appearing 4 times) and use that count as the basis for the filter.
Any suggestions or resources on how I can achieve this?
Comments (1)
As you said, a naive approach would be to count the occurrences of `.children` and compare the count with 4 to create a boolean mask that can be used to filter the rows.

A more robust approach would be to check for four consecutive occurrences of `children`.
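Both approaches can be sketched roughly as follows. This is a minimal illustration rather than a definitive solution: the dataframe is rebuilt from the table in the question, the variable names (`count_mask`, `consecutive_mask`, `by_count`, `by_run`) are made up for the example, and the regular expression assumes every `children` segment is followed by a bracketed non-negative integer such as `.[0]`.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "json_path": [
        "data.attributes.total.children.[0]",
        "data.attributes.total.children.[0].children.[0]",
        "data.attributes.total.children.[0].children.[0].children.[0].children.[0]",
        "data.attributes.total.children.[0].children.[0].children.[1].children.[0]",
    ],
    "Reporting Group": ["Christian Family"] * 4,
    "Entity/Grouping": [
        "Abraham Family",
        "In Estate",
        "Cash",
        "Investment Grade Fixed Income",
    ],
})

# Naive approach: count every ".children" in the path and keep rows
# where the count is exactly 4.
count_mask = df["json_path"].str.count(r"\.children").eq(4)
by_count = df[count_mask]

# Stricter approach: require four ".children.[<int>]" segments in a row.
# The non-capturing group (?:...) keeps pandas from warning about match
# groups inside str.contains.
consecutive_mask = df["json_path"].str.contains(
    r"(?:\.children\.\[\d+\]){4}", regex=True
)
by_run = df[consecutive_mask]

print(by_run)
```

Note that `str.contains` with `{4}` also matches paths with more than four consecutive segments; if deeper paths can occur and must be excluded, the two masks can be combined with `count_mask & consecutive_mask`.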