Group by date and divide columns by an ID variable in Python

Posted on 2025-02-10 03:22:01

I have a data frame like this:

import pandas as pd

data = {'Result1': [10, 11, 12, 44],
        'Result2': [6, 7, 55, 114],
        'ID': ['A', 'B', 'A', 'B'],
        'anoter_id': ["one", "one", "two", "two"],
        'date': ['2022-06-06', '2022-06-06', '2022-06-01', '2022-06-01']
        }

df = pd.DataFrame(data)

I want to group by date and anoter_id and keep the rows where Result1 and Result2 of ID == 'B' are 10% higher or lower than Result1 and Result2 of ID == 'A'.

I also want a column with this ratio, such as "Result1_B/Result1_A" and "Result2_B/Result2_A".


My initial idea was to do this:

df.query("ID == 'A'").sort_values('date').Result1/df.query("ID == 'B'").sort_values('date').Result1

However, this is very verbose, and I would have to write one line for each column in my dataset.

Any ideas?

EDIT:
An example of the output is below.
In Result1_B/Result1_A_perDate, the results come from 11/10 and 7/6, which correspond to the result of ID B divided by the result of ID A for anoter_id 'one'. The column inside_10%_limit is True if both columns, Result1_B/Result1_A_perDate and Result2_B/Result2_A_perDate, are at most 10% higher or lower.

output = {'Result1_B/Result1_A_perDate': [1.1, 1.16],
          'Result2_B/Result2_A_perDate': [3.6, 2.0],
          'anoter_id': ["one", "two"],
          'date': ['2022-06-06', '2022-06-01'],
          'inside_10%_limit': ['True', 'False']
          }

pd.DataFrame(output)

您的好友蓝忘机已上羡 2025-02-17 03:22:01

The logic is still not fully clear, but here is my assumption:

# pivot DataFrame
df2 = df.pivot(index=['anoter_id', 'date'], columns='ID')

# Use MultiIndex selection to divide B by A
out = df2.xs('B', level=1, axis=1).div(df2.xs('A', level=1, axis=1)).add_suffix('_B/A')

# 90th percentile of all ratios, taken as the nearest observed value
thresh = out.stack().quantile(0.9, interpolation='nearest')
# 3.6666

# flag rows where both ratios are above, or both are below, the threshold
out['10%lim'] = out.gt(thresh).all(axis=1) | out.lt(thresh).all(axis=1)

# index as columns
out = out.reset_index()

Output:

  anoter_id        date  Result1_B/A  Result2_B/A  10%lim
0       one  2022-06-06     1.100000     1.166667    True
1       two  2022-06-01     3.666667     2.072727   False
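
Since the 10% rule in the question is ambiguous, here is a minimal sketch under a different assumption: a row counts as "inside the limit" only if every B/A ratio lies within 1 ± tol (that is, B is at most 10% higher or lower than A). The helper name ratio_b_over_a, the tol parameter, and the inside_limit column are hypothetical names chosen for illustration and do not come from the question.

```python
import pandas as pd

data = {
    'Result1': [10, 11, 12, 44],
    'Result2': [6, 7, 55, 114],
    'ID': ['A', 'B', 'A', 'B'],
    'anoter_id': ['one', 'one', 'two', 'two'],
    'date': ['2022-06-06', '2022-06-06', '2022-06-01', '2022-06-01'],
}
df = pd.DataFrame(data)


def ratio_b_over_a(frame, tol=0.10):
    """Hypothetical helper: B/A ratios per (anoter_id, date) plus a tolerance flag."""
    # One row per (anoter_id, date); columns become a (value column, ID) MultiIndex
    wide = frame.pivot(index=['anoter_id', 'date'], columns='ID')
    # Divide every B column by the matching A column in one step
    ratios = wide.xs('B', level=1, axis=1).div(wide.xs('A', level=1, axis=1))
    ratios = ratios.add_suffix('_B/A')
    # Assumption: "inside the limit" means every ratio lies within 1 +/- tol
    ratios['inside_limit'] = ratios.sub(1).abs().le(tol).all(axis=1)
    return ratios.reset_index()


result = ratio_b_over_a(df)
print(result)
```

Because the division runs over the whole pivoted frame, any additional result columns are handled without writing one line per column. Under this reading, neither row of the sample data passes at tol=0.10, which differs from the expected output in the question, so the tolerance or the rule itself may need adjusting once the intended logic is confirmed.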