Group by date and divide columns by an ID variable in Python

Posted on 2025-02-10 03:22:01

I have a data frame like this:

import pandas as pd

data = {
    'Result1': [10, 11, 12, 44],
    'Result2': [6, 7, 55, 114],
    'ID': ['A', 'B', 'A', 'B'],
    'anoter_id': ["one", "one", "two", "two"],
    'date': ['2022-06-06', '2022-06-06', '2022-06-01', '2022-06-01'],
}

df = pd.DataFrame(data)

I want to group by date and anoter_id and keep the rows where Result1 and Result2 of ID == 'B' are 10% higher or lower than Result1 and Result2 of ID == 'A'.

I also want columns holding these ratios, such as "Result1_B/Result1_A" and "Result2_B/Result2_A".

My initial idea was to do this:

df.query("ID == 'A'").sort_values('date').Result1/df.query("ID == 'B'").sort_values('date').Result1

However, this is very verbose, and I would have to write one line for each column in my dataset.

Any ideas?

EDIT:
An example of output is below.
The values 11/10 and 7/6 are Result1 and Result2 of ID B divided by Result1 and Result2 of ID A for anoter_id 'one'. The column inside_10%_limit is True if both columns, Result1_B/Result1_A_perDate and Result2_B/Result2_A_perDate, are at most 10% higher or lower.

output = {
    'Result1_B/Result1_A_perDate': [1.1, 1.16],
    'Result2_B/Result2_A_perDate': [3.6, 2.0],
    'anoter_id': ["one", "two"],
    'date': ['2022-06-06', '2022-06-01'],
    'inside_10%_limit': ['True', 'False'],
}

pd.DataFrame(output)


您的好友蓝忘机已上羡 2025-02-17 03:22:01

The logic is still not fully clear, but here is my assumption:

# Pivot so each (anoter_id, date) pair is one row, with A and B side by side
df2 = df.pivot(index=['anoter_id', 'date'], columns='ID')

# Use MultiIndex selection to divide B by A
out = df2.xs('B', level=1, axis=1).div(df2.xs('A', level=1, axis=1)).add_suffix('_B/A')

# 90th percentile of all ratios (the top 10% lie above it)
thresh = out.stack().quantile(0.9, interpolation='nearest')
# 3.6666

# Flag rows where both ratios are above, or both below, the threshold
out['10%lim'] = out.gt(thresh).all(axis=1) | out.lt(thresh).all(axis=1)

# Move the index levels back into columns
out = out.reset_index()

output:

  anoter_id        date  Result1_B/A  Result2_B/A  10%lim
0       one  2022-06-06     1.100000     1.166667    True
1       two  2022-06-01     3.666667     2.072727   False
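
Since the intended rule is ambiguous, here is a sketch of a different reading, in case "inside the 10% limit" means that B deviates from A by at most 10% rather than a percentile cut. This is an assumption, not something the question confirms; the names `TOLERANCE`, `wide`, `ratios` and `inside_limit` are illustrative only. It uses the same pivot-and-divide step as above, which handles every result column at once, and needs a pandas version whose `pivot` accepts a list for `index`.

```python
import pandas as pd

data = {
    'Result1': [10, 11, 12, 44],
    'Result2': [6, 7, 55, 114],
    'ID': ['A', 'B', 'A', 'B'],
    'anoter_id': ['one', 'one', 'two', 'two'],
    'date': ['2022-06-06', '2022-06-06', '2022-06-01', '2022-06-01'],
}
df = pd.DataFrame(data)

# Assumption: "inside the 10% limit" means abs(B/A - 1) <= 0.10.
# TOLERANCE is a hypothetical name chosen for this sketch.
TOLERANCE = 0.10

# One row per (anoter_id, date), with A and B side by side for every result column
wide = df.pivot(index=['anoter_id', 'date'], columns='ID')

# Divide every B column by the matching A column in one step
ratios = wide.xs('B', level='ID', axis=1) / wide.xs('A', level='ID', axis=1)
ratios = ratios.add_suffix('_B/A')

# Flag rows where every ratio stays within the tolerance band around 1
within = ratios.sub(1).abs().le(TOLERANCE).all(axis=1)

# Attach the flag and turn the index levels back into columns
result = ratios.assign(inside_limit=within).reset_index()
print(result)
```

With the sample data both rows are flagged False under this reading, because Result2_B/A is about 1.17 and 2.07. A ratio sitting exactly on the boundary, such as 11/10, may land on either side of the comparison because of floating-point rounding, so a small slack on `TOLERANCE` may be worth adding if boundary cases matter.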
