Python - Push weekend values to Monday

Posted on 2025-01-16 01:08:41

I have a dataframe (called df) that looks like this:

[image: df]

I'm trying to take all weekend 'Volume' values (the rows where column 'WEEKDAY' is 5 (Saturday) or 6 (Sunday)) and add them to the following Monday (WEEKDAY=0).

I tried a few things, but nothing really worked. Taking the last three rows as an example:

[image: What I have]

What I'm expecting is this:

[image: What I expect]

To reproduce the problem:

!wget https://raw.githubusercontent.com/brunodifranco/TCC/main/volume_por_dia.csv

import pandas as pd

df = pd.read_csv('volume_por_dia.csv').sort_values('Datas', ascending=True)
df['Datas'] = pd.to_datetime(df['Datas'])
df = df.set_index('Datas')
df['WEEKDAY'] = df.index.dayofweek  # 0 = Monday ... 6 = Sunday
df


Answers (7)

银河中√捞星星 2025-01-23 01:08:41

Adding 2 solutions here:

  1. Using pd.shift (pointed out earlier by Lukas Hestermeyer; I've added a simplified version)

  2. Using rolling window (this is literally a one-liner)

Both solutions assume that:

  1. Dates are sorted in ascending order (if not, sort them before proceeding).
  2. Every weekend has both a Saturday and a Sunday record, followed by a Monday record, so the two rows before each Monday are its weekend. Additional checks would be needed in case of missing data.
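A minimal sketch of how these two assumptions could be checked before running either solution. `check_weekend_assumptions` is a hypothetical helper (not part of the answer), and it assumes the dates are the `DatetimeIndex` of `df` and are unique, as in the question's setup:

```python
import pandas as pd

def check_weekend_assumptions(df: pd.DataFrame) -> bool:
    """True if dates are sorted and each Monday directly follows its Sat and Sun rows."""
    idx = df.index
    if not idx.is_monotonic_increasing:
        return False
    # every Saturday/Sunday must have its following Monday in the data
    weekend = idx[idx.dayofweek >= 5]
    next_monday = weekend + pd.to_timedelta(7 - weekend.dayofweek, unit="D")
    if not next_monday.isin(idx).all():
        return False
    # every Monday must be directly preceded by its Sunday and Saturday rows
    mondays = idx[idx.dayofweek == 0]
    pos = idx.get_indexer(mondays)
    if (pos < 2).any():
        return False
    prev_is_sunday = idx[pos - 1] == mondays - pd.Timedelta(days=1)
    prev2_is_saturday = idx[pos - 2] == mondays - pd.Timedelta(days=2)
    return bool((prev_is_sunday & prev2_is_saturday).all())
```

It returns `False` when, for example, the data ends on a weekend, starts on a Monday, or is missing a Saturday or Sunday row.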

Part 1 | Data Prep:

import pandas as pd
import numpy as np

# STEP 1: Create DF
Datas = [
    '2019-07-02',
    '2019-07-03',
    '2019-07-04',
    '2019-07-05',
    '2019-07-06',
    '2019-07-07',
    '2019-07-08',
    '2022-03-10',
    '2022-03-11',
    '2022-03-12',
    '2022-03-13',
    '2022-03-14'
]

Volume = [17, 30, 20, 21, 5, 10, 12, 24, 18, 4, 1, 5]
WEEKDAY = [1, 2, 3, 4, 5, 6, 0, 3, 4, 5, 6, 0]

dic = {'Datas': Datas, 'Volume': Volume, 'WEEKDAY': WEEKDAY}

# STEP 2: build the DataFrame first, then convert and index the dates
df = pd.DataFrame(dic)
df['Datas'] = pd.to_datetime(df['Datas'])
df = df.set_index('Datas')

Part 2 | Solutions:

Solution 1 [pd.shift]:

# STEP 1: add shifts (previous row and the row before it)
df['shift_1'] = df['Volume'].shift(1)
df['shift_2'] = df['shift_1'].shift(1)

# STEP 2: sum Volume with shifts where weekday == 0 (.iloc for positional access)
cols_to_sum = ['Volume', 'shift_1', 'shift_2']
df['Volume'] = df[['WEEKDAY'] + cols_to_sum].apply(
    lambda x: int(x.iloc[1]) if x.iloc[0] else int(x.iloc[1] + x.iloc[2] + x.iloc[3]),
    axis=1,
)
df = df.drop(['shift_1', 'shift_2'], axis=1)
df

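Solution 2 [rolling window] (an alternative to Solution 1; apply it to the DataFrame from Part 1, not to the output of Solution 1):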

# use rolling window of size 3 to sum where weekday == 0
df['Volume'] = np.where(
    df['WEEKDAY'] == 0,  
    df['Volume'].rolling(window=3, center=False).sum(), 
    df['Volume']
)
df
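The row-based `rolling(window=3)`, like the two shifts, only works if the two rows before each Monday really are its Saturday and Sunday. A possible variant, not part of the answer, uses a time-based window instead: on a sorted `DatetimeIndex`, `rolling('3D')` covers the three calendar days ending at each row, so on a Monday it sums whichever Saturday, Sunday and Monday rows exist. The small frame below reuses values from Part 1 with the 2019-07-06 Saturday row dropped on purpose:

```python
import numpy as np
import pandas as pd

# Illustrative data: Saturday 2019-07-06 is deliberately missing
dates = pd.to_datetime(["2019-07-05", "2019-07-07", "2019-07-08",
                        "2022-03-11", "2022-03-12", "2022-03-13", "2022-03-14"])
df = pd.DataFrame({"Volume": [21, 10, 12, 18, 4, 1, 5]}, index=dates)
df["WEEKDAY"] = df.index.dayofweek

# '3D' = the 3 calendar days ending at each row, window closed on the right
three_days = df["Volume"].rolling("3D").sum()
df["Volume"] = np.where(df["WEEKDAY"] == 0, three_days, df["Volume"])
df = df.loc[~df["WEEKDAY"].isin([5, 6])]
```

Here the row-based window would have added Friday's 21 into 2019-07-08, while the time-based window gives 10 + 12 = 22.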

Part 3 | Removing weekend records:

df = df.loc[~df['WEEKDAY'].isin([5, 6])]
df
悲欢浪云 2025-01-23 01:08:41

This solves your question using pd.shift.

import pandas as pd
df['prior_volume'] = df.Volume.shift(1)
df['prior_volume2'] = df.Volume.shift(2)
df.loc[df['WEEKDAY'] == 0, 'Volume'] = df.loc[df['WEEKDAY'] == 0, 'prior_volume'] + \
    df.loc[df['WEEKDAY'] == 0, 'prior_volume2'] + \
    df.loc[df['WEEKDAY'] == 0, 'Volume']
df = df[df['WEEKDAY'].isin(range(5))]
df = df[['Volume', 'WEEKDAY']]
df.head(10)

which yields:
[image: resulting DataFrame]
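One edge case with this approach: if a Monday falls in the first two rows, `shift` produces `NaN` there and that Monday's volume becomes `NaN`. Passing `fill_value=0` to `Series.shift` avoids this. A sketch on a small made-up frame that starts on a Sunday, assuming the same `Volume`/`WEEKDAY` columns:

```python
import pandas as pd

# Made-up daily data starting on a Sunday (2022-03-13)
dates = pd.date_range("2022-03-13", "2022-03-21", freq="D")
df = pd.DataFrame({"Volume": range(1, len(dates) + 1)}, index=dates)
df["WEEKDAY"] = df.index.dayofweek

monday = df["WEEKDAY"] == 0
prior_1 = df["Volume"].shift(1, fill_value=0)  # previous row, 0 instead of NaN
prior_2 = df["Volume"].shift(2, fill_value=0)  # two rows back, 0 instead of NaN
df.loc[monday, "Volume"] += prior_1[monday] + prior_2[monday]
df = df[df["WEEKDAY"].isin(range(5))]  # keep Monday..Friday only
```

The first Monday only receives the Sunday volume, since its Saturday is not in the data.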

月光色 2025-01-23 01:08:41

I used .groupby to solve the problem.

import pandas as pd

df = pd.read_csv('volume_por_dia.csv')
df['Datas'] = pd.to_datetime(df['Datas'])
df['WEEKDAY'] = df['Datas'].dt.dayofweek
df['index'] = df['Datas']

# Group by week, with weeks ending on Monday ('W-MON', i.e. Tuesday to Monday),
# so that Sat and Sun fall in the same group as the following Mon
df_group = df.groupby([pd.Grouper(key='Datas', freq='W-MON'),
                       'WEEKDAY', 'index']).agg({'Volume': 'sum'})

# In each group, add days 5, 6 (Sat and Sun) to day 0 (Mon)
df_group.loc[(slice(None), 0), 'Volume'] += \
df_group.loc[(slice(None), [5, 6]), 'Volume'].groupby(level=0).sum()

# In the grouped data, remove Sat and Sun
df_group = df_group.reset_index()
df_group = df_group[df_group['WEEKDAY'] != 5]
df_group = df_group[df_group['WEEKDAY'] != 6]

# Remove volume data from original df, and merge with volume from df_group 
df = df.drop(['Volume'], axis=1)
df = pd.merge(df,df_group[['index','Volume']],on='index', how='left')
df = df.dropna(subset=['Volume'])
df = df.drop(['index'], axis=1)

# Optional: sort dates in ascending order
df = df.sort_values(by=['Datas'])

print(df)
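A more compact variation on the grouping idea (a different technique from the answer's merge-based code) is to move every Saturday and Sunday date forward to its following Monday and then group by the adjusted date. This is a sketch assuming daily rows on a `DatetimeIndex` and a `Volume` column; `push_weekends_to_monday` is a hypothetical name, and the demo values are illustrative:

```python
import numpy as np
import pandas as pd

def push_weekends_to_monday(df: pd.DataFrame) -> pd.DataFrame:
    """Sum Saturday/Sunday 'Volume' into the following Monday; weekend rows disappear."""
    wd = df.index.dayofweek
    # Saturday (5) -> +2 days, Sunday (6) -> +1 day, Monday..Friday -> +0 days
    shift_days = np.where(wd >= 5, 7 - wd, 0)
    target = (df.index + pd.to_timedelta(shift_days, unit="D")).rename("Datas")
    out = df.groupby(target)[["Volume"]].sum()
    out["WEEKDAY"] = out.index.dayofweek
    return out

dates = pd.DatetimeIndex(
    pd.to_datetime(["2019-07-05", "2019-07-06", "2019-07-07", "2019-07-08", "2019-07-09"]),
    name="Datas",
)
df = pd.DataFrame({"Volume": [21, 5, 10, 12, 7]}, index=dates)
print(push_weekends_to_monday(df))
```

Note that a weekend with no following Monday in the data would produce a new Monday row rather than being dropped.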
三五鸿雁 2025-01-23 01:08:41

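You can simply loop over the rows, accumulating the Saturday and Sunday volumes, and add them to the volume of the following Monday. Then just drop the Saturday and Sunday rows.

# Assumes df has exactly the columns ['Volume', 'WEEKDAY'], in that order, sorted by date
values = df.values
volume_col = df.columns.get_loc("Volume")

volume_accumulated = 0
for idx, row in enumerate(values):
    if row[1] in (5, 6):  # Saturday or Sunday: keep accumulating
        volume_accumulated += row[0]
    elif row[1] == 0:  # Monday: add the accumulated weekend volume
        volume_accumulated += row[0]
        df.iloc[idx, volume_col] = volume_accumulated
        volume_accumulated = 0
    else:  # any other weekday: reset
        volume_accumulated = 0

df = df[~df["WEEKDAY"].isin([5, 6])]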
笑着哭最痛 2025-01-23 01:08:41

Input:

!wget https://raw.githubusercontent.com/brunodifranco/TCC/main/volume_por_dia.csv

import pandas as pd
import numpy as np

df = pd.read_csv('volume_por_dia.csv').sort_values('Datas',ascending=True)
df['Datas'] = pd.to_datetime(df['Datas'])
df.set_index('Datas', inplace=True)
df['WEEKDAY'] = df.index.dayofweek

I assume that the index dates are sorted, that the Datas index is unique, and that there are no missing days. However, I can't assume that:

  • every Monday is preceded by a complete weekend: the dataframe may start on a Sunday, which would give an incomplete weekend volume;
  • every weekend is followed by a Monday: the dataframe may end on a Saturday or Sunday.

For these reasons, before computing weekend volumes, I first extract the dates of the first Saturday and last Monday:

first_saturday = df.index[df.WEEKDAY==5][0]
last_monday = df.index[df.WEEKDAY==0][-1]

Now I can extract the weekend volumes, making sure that I always have complete Saturday-Sunday pairs and that each pair is followed by a Monday in the dataframe:

df_weekend = df.loc[
    (df.WEEKDAY.isin([5,6]))&
    (df.index<=last_monday)&
    (df.index>=first_saturday)
]
df_weekend

Since the volumes now come in Saturday-Sunday pairs, I can compute the sums as follows:

weekend_volumes = pd.Series(
    df_weekend.Volume.values.reshape(-1,2).sum(axis=1), #sum of volume couples
    index = df_weekend.index[1::2]+pd.Timedelta("1d"), #date of the following monday
    name="weekend_volume"
).reindex(df.index).fillna(0) #zero weekend-volume for days that are not mondays 
weekend_volumes
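The pairing step depends on `df_weekend` alternating Saturday, Sunday, Saturday, Sunday, so that reshaping its volumes into two columns puts each weekend on one row. A standalone illustration using the weekend volumes of 2022-03-05/06 and 2022-03-12/13 from the rows listed at the end of this answer:

```python
import numpy as np
import pandas as pd

# Saturday/Sunday pairs in date order
weekend_dates = pd.to_datetime(["2022-03-05", "2022-03-06", "2022-03-12", "2022-03-13"])
weekend_volume = np.array([8.0, 2.0, 4.0, 1.0])

pair_sums = weekend_volume.reshape(-1, 2).sum(axis=1)  # one row per weekend: [Sat, Sun]
mondays = weekend_dates[1::2] + pd.Timedelta("1d")     # each Sunday + 1 day
weekend_volumes = pd.Series(pair_sums, index=mondays, name="weekend_volume")
```

If a weekend were incomplete, `reshape(-1, 2)` would either fail on an odd length or silently pair the wrong days, which is why the answer trims to the first Saturday and last Monday first.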

Finally add the weekend-volumes to the starting volumes:

df["Volume"] = df.Volume+weekend_volumes

Here are the last 25 rows of df (columns: Datas, Volume, WEEKDAY):

# 2022-02-18    16.0    4
# 2022-02-19    2.0     5
# 2022-02-20    1.0     6
# 2022-02-21    10.0    0
# 2022-02-22    43.0    1
# 2022-02-23    36.0    2
# 2022-02-24    38.0    3
# 2022-02-25    28.0    4
# 2022-02-26    5.0     5
# 2022-02-27    3.0     6
# 2022-02-28    14.0    0
# 2022-03-01    10.0    1
# 2022-03-02    16.0    2
# 2022-03-03    18.0    3
# 2022-03-04    11.0    4
# 2022-03-05    8.0     5
# 2022-03-06    2.0     6
# 2022-03-07    32.0    0
# 2022-03-08    18.0    1
# 2022-03-09    32.0    2
# 2022-03-10    24.0    3
# 2022-03-11    18.0    4
# 2022-03-12    4.0     5
# 2022-03-13    1.0     6
# 2022-03-14    10.0    0
旧故 2025-01-23 01:08:41

If you consider weeks to start on Tuesday, the problem becomes simpler: you just need to sum the weekend values into the Monday of the same week (which is the Monday after the weekend). This automatically handles data that starts or ends on a weekend.

import numpy as np
import pandas as pd
np.random.seed(1)

# Sample data
dates = pd.date_range('2018-02-05', '2018-07-22', freq='D')
volume = np.random.randint(1, 50, len(dates))
df = pd.DataFrame(dict(Datas=dates, Volume=volume))
df = df.set_index('Datas')

# Week starting from Tuesday: shift back one day and use the ISO (year, week);
# the year is needed because week numbers repeat across years
iso = (df.index - pd.Timedelta(days=1)).isocalendar()
week = [iso['year'].to_numpy(), iso['week'].to_numpy()]

def add_weekend_to_monday(group):
    group = group.copy()
    monday = group.index.weekday == 0
    weekend = group.index.weekday >= 5
    group[monday] += group[weekend].sum()
    return group

# transform keeps the original index, so the result aligns with df
df['Volume'] = df.groupby(week)['Volume'].transform(add_weekend_to_monday)
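The grouping key works because subtracting one day maps each Tuesday-to-Monday span onto one ISO (Monday-to-Sunday) week. A small check of that idea; it pairs the ISO week with the ISO year, because the week number alone repeats every year and would merge, for example, the weeks ending 2019-03-11 and 2022-03-14. `tue_to_mon_week_key` is a hypothetical helper:

```python
import pandas as pd

def tue_to_mon_week_key(dates: pd.DatetimeIndex) -> list:
    """(ISO year, ISO week) of each date minus one day, so Tue..Mon share a key."""
    iso = (dates - pd.Timedelta(days=1)).isocalendar()
    return list(zip(iso["year"], iso["week"]))

# Mon, Fri, Sat, Sun, Mon, Tue
dates = pd.to_datetime(["2022-03-07", "2022-03-11", "2022-03-12",
                        "2022-03-13", "2022-03-14", "2022-03-15"])
keys = tue_to_mon_week_key(dates)
```

Friday lands in the same group as the following weekend and Monday, which is harmless here because `add_weekend_to_monday` only moves Saturday and Sunday values.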
你没皮卡萌 2025-01-23 01:08:41

Based on the code you provided, you have loaded a CSV file into a DataFrame df, converted the 'Datas' column to a datetime, sorted the DataFrame by date in ascending order, and set the 'Datas' column as the index.

You have also created a new column 'WEEKDAY' that contains the day of the week (0-6, where 0 is Monday and 6 is Sunday) for each date in the index.

import pandas as pd

# Load the data and convert the 'Datas' column to a datetime
df = pd.read_csv('https://raw.githubusercontent.com/brunodifranco/TCC/main/volume_por_dia.csv')
df['Datas'] = pd.to_datetime(df['Datas'])

# Set the 'Datas' column as the index
df.set_index('Datas', inplace=True)

# Compute the sum of weekend days (Saturday and Sunday) and add it to the following Monday
weekend_sum = df.loc[df.index.weekday.isin([5,6])]['Volume'].resample('W-MON').sum()
df.loc[weekend_sum.index, 'Volume'] += weekend_sum.values

Explanation:

  1. df.loc[df.index.weekday.isin([5,6])] selects rows where the index (i.e., the dates) falls on a Saturday or Sunday (weekday 5 or 6, respectively).

  2. ['Volume'].resample('W-MON').sum() sums the weekend 'Volume' values per week, where each 'W-MON' week runs from Tuesday to Monday and is labelled with its closing Monday. The result is a Series indexed by those Mondays, with the weekend sums as values (weeks without weekend rows get 0).

  3. df.loc[weekend_sum.index, 'Volume'] += weekend_sum.values adds each sum to the Monday that labels its week, i.e. the Monday following the weekend. It selects the rows for those Mondays (weekend_sum.index) and adds the corresponding sums (weekend_sum.values) to the 'Volume' column, modifying df in place. This requires every such Monday to be present in df; otherwise .loc raises a KeyError.
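If the data may end on a weekend, one way to avoid that `KeyError` is to align the weekly sums on `df`'s own dates with `reindex` instead of selecting them with `.loc`. A sketch on a small made-up frame that ends on a Sunday (this alignment step is a swapped-in variation, not the answer's code):

```python
import pandas as pd

# Made-up daily data from Friday 2022-03-11 to Sunday 2022-03-20 (no final Monday)
dates = pd.date_range("2022-03-11", "2022-03-20", freq="D")
df = pd.DataFrame({"Volume": range(1, len(dates) + 1)}, index=dates)

weekend = df.index.weekday.isin([5, 6])
# Weeks ending on Monday (Tuesday..Monday), labelled with that Monday
weekend_sum = df.loc[weekend, "Volume"].resample("W-MON").sum()
# Align on dates: Mondays missing from df are dropped, other days get 0
df["Volume"] += weekend_sum.reindex(df.index, fill_value=0)
```

The weekend of 2022-03-19/20 has no Monday in the frame, so its sum is simply discarded instead of raising.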
