Calculate average YoY percent change - pandas DataFrame

Posted on 2025-01-30 15:31:11

I have a Pandas DataFrame with Monthly observations. I'd like to calculate a couple of metrics - MoM and YoY pct change.

import pandas as pd
import numpy as np

df = pd.DataFrame({
                   'c': ['A','A','A','B','B','B','C','C'],
                   'z': [1, 2, 3, 4, 5, 6, 7, 8],
                   '2018-01': [10, 12, 14, 16, 18, 20, 22, 24],
                   '2018-02': [12, 14, 16, 18, 20, 22, 24, 26],
                   '2019-01': [8, 10, 12, 14, 16, 18, 20, 22],
                   '2019-02': [10, 12, 14, 16, 18, 20, 22, 24]
                 })

For each z in c, I'd like to calculate the MoM and YoY change in percent. That is, the percent difference between observations in consecutive month columns, and the aggregate percent change from one year to the next.

I am looking for a solution that generalizes to more monthly columns and more years.

Expected output:

c  z  2018-01 2018-02 2019-01 2019-02 Avg_YoY_pct

A  1    10                              -18.18
A  2    12
A  3    14
B  4    .............................
B  5
B  6
C  7
C  8

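Avg_YoY_pct is calculated as the percentage difference between the sums of all monthly values in each year. For example, for c = A, z = 1 the 2018 total is 10 + 12 = 22 and the 2019 total is 8 + 10 = 18, so the change is (18 - 22) / 22 = -18.18%.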

Comments (1)

还在原地等你 2025-02-06 15:31:11

Thanks for providing example input so nicely. Here's an approach that first melts the table into long form, then performs a groupby to get the average YoY for each month, and then another groupby to get the average YoY over all months. I think it extends to more month and year columns.

#melt the wide table into a long table
long_df = df.melt(
    id_vars=['c','z'],
    var_name='date',
    value_name='val',
)

#extract the year and month from the date column
long_df[['year','month']] = long_df['date'].str.split('-', expand=True)
long_df['year'] = long_df['year'].astype(int)
long_df['month'] = long_df['month'].astype(int)

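#group by c/z/month and shift to get avg yoy for each month
#(sort by year first so shift(1) always refers to the same month of the previous year)
avg_month_yoy = long_df.sort_values('year').groupby(['c','z','month'])['val'].apply(
    lambda v: v.sub(v.shift(1)).div(v.shift(1)).multiply(100).mean()
).reset_index()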

#group by just c/z to get avg yoy over all months
avg_yoy = avg_month_yoy.groupby(['c','z'])['val'].mean()

#Add the avg_yoy back into the original table
df = df.set_index(['c','z'])
df['Avg_YoY_pct'] = avg_yoy
df = df.reset_index()

print(df)
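
Note that the code above averages the per-month YoY changes, which for c = A, z = 1 gives the mean of -20% and -16.67%, about -18.33, rather than the -18.18 shown in the question. If the goal is the question's definition (percent change between yearly totals), a minimal sketch could look like the following. It assumes the month columns are named 'YYYY-MM' and that 'c' and 'z' are the only id columns; the names `annual`, `prev` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'c': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'z': [1, 2, 3, 4, 5, 6, 7, 8],
    '2018-01': [10, 12, 14, 16, 18, 20, 22, 24],
    '2018-02': [12, 14, 16, 18, 20, 22, 24, 26],
    '2019-01': [8, 10, 12, 14, 16, 18, 20, 22],
    '2019-02': [10, 12, 14, 16, 18, 20, 22, 24],
})

# Melt to long form; assumes every non-id column is named 'YYYY-MM'
long_df = df.melt(id_vars=['c', 'z'], var_name='date', value_name='val')
long_df['year'] = long_df['date'].str[:4].astype(int)

# Total per c/z/year, with one column per year (columns come out sorted by year)
annual = long_df.groupby(['c', 'z', 'year'])['val'].sum().unstack('year')

# Percent change of each year's total versus the previous year's total
prev = annual.shift(axis=1)
yoy = (annual - prev) / prev * 100

# Average over all year-to-year steps (the first year has no predecessor and is skipped as NaN)
avg_yoy = yoy.mean(axis=1).round(2).rename('Avg_YoY_pct')

# Attach the result to the original wide table
result = df.join(avg_yoy, on=['c', 'z'])
print(result)
```

With only two years there is a single year-to-year step, so the average is just that one change; with more years it becomes the mean of the successive changes between yearly totals.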

