Group by date and divide columns by an ID variable in Python
I have a data frame like this:
import pandas as pd

data = {'Result1': [10, 11, 12, 44],
        'Result2': [6, 7, 55, 114],
        'ID': ['A', 'B', 'A', 'B'],
        'anoter_id': ["one", "one", "two", "two"],
        'date': ['2022-06-06', '2022-06-06', '2022-06-01', '2022-06-01']
        }
df = pd.DataFrame(data)
I want to group by date and anoter_id and keep the rows where Result1 and Result2 of ID == B are 10% bigger or lower than Result1 and Result2 of ID == A.
I also want to have a column with this difference, like "Result1_B/Result1_A" and "Result2_B/Result2_A".
My initial idea was to do this:
df.query("ID == 'A'").sort_values('date').Result1/df.query("ID == 'B'").sort_values('date').Result1
However, this is super verbose, and I have to write one line for each column in my dataset.
Any ideas?
EDIT:
An example of the output is below.
In Result1_B/Result1_A_perDate, the results come from 11/10 and 7/6, which correspond to Result1 of ID B divided by Result1 of ID A for anoter_id 'one'. The column inside_10%_limit is True if both Result1_B/Result1_A_perDate and Result2_B/Result2_A_perDate deviate, up or down, by at most 10%.
output = {'Result1_B/Result1_A_perDate':[1.1, 1.16],
'Result2_B/Result2_A_perDate':[3.6, 2.0],
'anoter_id':["one", "two"],
'date' : ['2022-06-06', '2022-06-01'],
'inside_10%_limit': ['True', 'False']
}
pd.DataFrame(output)
Comments (1)
The logic is still not fully clear, but here is my assumption:
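One way such an assumption could be sketched (not necessarily the code originally posted with this answer): index the frame by date, anoter_id and ID, split it into the A rows and the B rows, and divide them so pandas aligns on (date, anoter_id). The names keys, value_cols, ratios and result are made up for illustration, and the 10% rule is read here as "every ratio lies within 10% of 1", which is only a guess at what the question intends and may not match the sample output shown in the question.

```python
import pandas as pd

data = {'Result1': [10, 11, 12, 44],
        'Result2': [6, 7, 55, 114],
        'ID': ['A', 'B', 'A', 'B'],
        'anoter_id': ["one", "one", "two", "two"],
        'date': ['2022-06-06', '2022-06-06', '2022-06-01', '2022-06-01']}
df = pd.DataFrame(data)

keys = ['date', 'anoter_id']          # grouping columns
value_cols = ['Result1', 'Result2']   # extend this list for more columns

# Index by group keys plus ID, then take the A and B cross-sections.
indexed = df.set_index(keys + ['ID'])[value_cols]
a = indexed.xs('A', level='ID')
b = indexed.xs('B', level='ID')

# Division aligns on (date, anoter_id), so every column is handled at once.
ratios = b / a
ratios.columns = [f'{c}_B/{c}_A_perDate' for c in value_cols]

# Assumption: "inside the limit" means every ratio is within 10% of 1.
ratios['inside_10%_limit'] = ((ratios - 1).abs() <= 0.10).all(axis=1)

result = ratios.reset_index()
```

Because the division is done on whole frames, adding another result column only requires adding its name to value_cols. This sketch also assumes exactly one A row and one B row per (date, anoter_id) pair.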
output: