Python - Push weekend values to Monday
I have a dataframe (called df) that looks like this:
I'm trying to take all weekend 'Volume' values (the rows where column 'WEEKDAY' is 5 (Saturday) or 6 (Sunday)) and add them to the subsequent Monday (WEEKDAY=0).
I tried a few things but nothing really worked, taking an example from the last three rows:
What I'm expecting is this:
To reproduce the problem:
!wget https://raw.githubusercontent.com/brunodifranco/TCC/main/volume_por_dia.csv

import pandas as pd

df = pd.read_csv('volume_por_dia.csv')
df['Datas'] = pd.to_datetime(df['Datas'])
# Sort by date and use the date as the index
df = df.sort_values('Datas', ascending=True).set_index('Datas')
df['WEEKDAY'] = df.index.dayofweek  # 0 = Monday, 6 = Sunday
df
Adding 2 solutions here:
- Using pd.shift (pointed out earlier by Lukas Hestermeyer; I've added a simplified version)
- Using a rolling window (this is literally a one-liner)

Both solutions assume that Dates are sorted in ascending order (if not, sort them before proceeding).

Part 1 | Data Prep:
Part 2 | Solutions:
Solution 1 [pd.shift] :
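A minimal sketch of the shift idea, not the answerer's original code (which is not preserved here). It uses a small synthetic frame in place of the CSV, and it assumes one row per calendar day with no gaps, because shift moves values by position rather than by date.

```python
import pandas as pd

# Synthetic stand-in for the CSV: 2023-01-06 is a Friday, 2023-01-16 a Monday
idx = pd.date_range("2023-01-06", periods=11, freq="D", name="Datas")
df = pd.DataFrame({"Volume": range(1, 12)}, index=idx)
df["WEEKDAY"] = df.index.dayofweek

# Keep weekend volumes only; every other day contributes 0
weekend = df["Volume"].where(df["WEEKDAY"] >= 5, 0)

# Seen from a Monday, shift(1) is Sunday's volume and shift(2) is Saturday's
carry = weekend.shift(1, fill_value=0) + weekend.shift(2, fill_value=0)

# Add the carried weekend volume to Mondays only
is_monday = df["WEEKDAY"] == 0
df.loc[is_monday, "Volume"] += carry[is_monday]
```

If days can be missing, the positional shift may pick up the wrong rows, so a date-based approach would be safer in that case.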
Solution 2 [rolling window] :
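One way the rolling-window variant could look, again on synthetic data and assuming one row per day with no gaps. A 3-row window ending on a Monday covers Saturday, Sunday and Monday, so the rolled sum is kept on Mondays and the original volume everywhere else.

```python
import pandas as pd

# Synthetic stand-in for the CSV: 2023-01-06 is a Friday, 2023-01-16 a Monday
idx = pd.date_range("2023-01-06", periods=11, freq="D", name="Datas")
df = pd.DataFrame({"Volume": range(1, 12)}, index=idx)
df["WEEKDAY"] = df.index.dayofweek

# Sum of the current row and the two rows before it
rolled = df["Volume"].rolling(3, min_periods=1).sum()

# Use the rolled sum on Mondays, the untouched volume on every other day
df["Volume"] = rolled.where(df["WEEKDAY"] == 0, df["Volume"])
```

Note that the rolling sum is a float, so the column becomes float64 in this sketch.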
Part 3 | Removing weekend records :
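Once the weekend volumes have been added to Monday, dropping the weekend rows is a plain boolean filter. A small sketch on synthetic data (the frame below is an assumption, not the question's CSV):

```python
import pandas as pd

# Synthetic stand-in for the CSV: 2023-01-06 is a Friday, 2023-01-16 a Monday
idx = pd.date_range("2023-01-06", periods=11, freq="D", name="Datas")
df = pd.DataFrame({"Volume": range(1, 12)}, index=idx)
df["WEEKDAY"] = df.index.dayofweek

# Keep Monday (0) through Friday (4); drop Saturday (5) and Sunday (6)
weekdays_only = df[df["WEEKDAY"] < 5]
```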
This solves your question using pd.shift.
which yields:
I used .groupby to solve the problem.
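The original groupby code is not preserved here, so the following is only one possible reading of the idea: map every weekend row to the date of the next Monday, leave other rows on their own date, and sum per target date. The frame and the name target are illustrative assumptions.

```python
import pandas as pd

# Synthetic stand-in for the CSV: 2023-01-06 is a Friday, 2023-01-16 a Monday
idx = pd.date_range("2023-01-06", periods=11, freq="D", name="Datas")
df = pd.DataFrame({"Volume": range(1, 12)}, index=idx)
df["WEEKDAY"] = df.index.dayofweek

# Days until the next Monday: 2 for Saturday, 1 for Sunday, 0 otherwise
days_ahead = (7 - df["WEEKDAY"]).where(df["WEEKDAY"] >= 5, 0)
days_ahead = days_ahead.astype("int64").to_numpy()

# Date each row's volume should be booked on
target = df.index + pd.to_timedelta(days_ahead, unit="D")

# Weekend rows now share a key with their Monday, so the sum merges them
result = df.groupby(target)["Volume"].sum()
```

Because grouping is by date rather than by position, this version tolerates missing days. If the data ends on a weekend, the result gains a row for a Monday that was not in the input.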
You can simply loop over the rows, accumulating the volumes of Saturday and Sunday, and add the running total to the volume of the following Monday. Then just drop the Saturday and Sunday rows.
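A sketch of such a loop, using the question's weekday numbering (5 = Saturday, 6 = Sunday, 0 = Monday) on synthetic data. The variable carry is an illustrative name, and resetting it on any non-weekend day is a design choice made here so that a weekend with no Monday after it is not added to a later week.

```python
import pandas as pd

# Synthetic stand-in for the CSV: 2023-01-06 is a Friday, 2023-01-16 a Monday
idx = pd.date_range("2023-01-06", periods=11, freq="D", name="Datas")
df = pd.DataFrame({"Volume": range(1, 12)}, index=idx)
df["WEEKDAY"] = df.index.dayofweek

carry = 0  # weekend volume waiting for the next Monday
for date, weekday in df["WEEKDAY"].items():
    if weekday >= 5:
        # Saturday or Sunday: accumulate
        carry += df.at[date, "Volume"]
    else:
        if weekday == 0:
            # Monday: receive the accumulated weekend volume
            df.at[date, "Volume"] += carry
        # Any weekday ends the current weekend
        carry = 0

# Drop the weekend rows once their volume has been moved
df = df[df["WEEKDAY"] < 5]
```

A row-by-row loop is slower than the vectorized alternatives, but it is easy to follow.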
Input:
I assume that the index dates are sorted, that the Datas index is unique, and that there are no missing days. There are, however, some assumptions I can't make:
For these reasons, before computing weekend volumes, I first extract the dates of the first Saturday and the last Monday:
Now I can extract the weekend volumes, knowing that every Saturday comes paired with its Sunday and that each pair is followed by a Monday present in the dataframe:
Now, since I have Saturday-Sunday pairs of volumes, I can compute the sums in the following way:
Finally add the weekend-volumes to the starting volumes:
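The steps above can be sketched as follows. This is a reconstruction under the stated assumptions (sorted, unique, gap-free daily index), not the answerer's original code, and the names first_sat, last_mon and window are illustrative. The synthetic data deliberately starts on a Sunday to show that an unpaired leading Sunday is left alone.

```python
import pandas as pd

# Synthetic data: 2023-01-08 is a Sunday, 2023-01-17 a Tuesday
idx = pd.date_range("2023-01-08", periods=10, freq="D", name="Datas")
df = pd.DataFrame({"Volume": range(1, 11)}, index=idx)
df["WEEKDAY"] = df.index.dayofweek

# Trim to the span from the first Saturday to the last Monday
first_sat = df.index[df["WEEKDAY"] == 5][0]
last_mon = df.index[df["WEEKDAY"] == 0][-1]
window = df.loc[first_sat:last_mon]

# Inside the window every Saturday has its Sunday and its Monday
sat = window.loc[window["WEEKDAY"] == 5, "Volume"].to_numpy()
sun = window.loc[window["WEEKDAY"] == 6, "Volume"].to_numpy()
mondays = window.index[window["WEEKDAY"] == 0]

# Add each Saturday-Sunday sum to the Monday that follows it
df.loc[mondays, "Volume"] += sat + sun
```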
I am attaching the last 25 lines of the df below:
If you consider that weeks start on Tuesday, for example, the problem becomes simpler. You just need to take the weekend values and add them to the Monday of that week (which is the Monday after the weekend). This automatically handles cases in which your data starts or ends on a weekend.
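One way to express Tuesday-to-Monday weeks in pandas is the period frequency "W-MON", which denotes weeks that end on Monday. The sketch below is an interpretation of the idea on synthetic data, not the answerer's code.

```python
import pandas as pd

# Synthetic stand-in for the CSV: 2023-01-06 is a Friday, 2023-01-16 a Monday
idx = pd.date_range("2023-01-06", periods=11, freq="D", name="Datas")
df = pd.DataFrame({"Volume": range(1, 12)}, index=idx)
df["WEEKDAY"] = df.index.dayofweek

# Week label per row; each week runs Tuesday through Monday
week = df.index.to_period("W-MON")

# Weekend volume only, then its total broadcast to every row of the same week
weekend = df["Volume"].where(df["WEEKDAY"] >= 5, 0)
weekend_per_week = weekend.groupby(week).transform("sum")

# Only the Monday of each week receives that total
df["Volume"] += weekend_per_week.where(df["WEEKDAY"] == 0, 0)
```

A weekend at the very end of the data has no Monday in its week, so its total is simply not added anywhere.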
Based on the code you provided, you have loaded a CSV file into a DataFrame df, converted the 'Datas' column to datetime, sorted the DataFrame by date in ascending order, and set the 'Datas' column as the index.
You have also created a new column 'WEEKDAY' that contains the day of the week (0-6, where 0 is Monday and 6 is Sunday) for each date in the index.
Explanation:
- df.loc[df.index.weekday.isin([5,6])] selects the rows whose index (i.e., the date) falls on a Saturday or Sunday (weekday 5 or 6, respectively).
- ['Volume'].resample('W-MON').sum() sums the 'Volume' column over weekly bins that end on Monday. Because only weekend rows were selected, each bin holds a single weekend. The result is a Series whose index is the Monday that closes each bin and whose values are the corresponding sums.
- df.loc[weekend_sum.index, 'Volume'] += weekend_sum.values adds the computed sums to the following Monday. It selects the rows for the Mondays in weekend_sum.index and adds weekend_sum.values to their 'Volume' column. Note that the += operator modifies the original DataFrame df.
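Putting the pieces of the explanation together, a runnable sketch on synthetic data might look like this. The final step uses reindex instead of df.loc[weekend_sum.index, ...], which is a substitution made here: the .loc form raises a KeyError when a Monday produced by resample is absent from df (for example when the data ends on a weekend), while reindex just ignores it.

```python
import pandas as pd

# Synthetic stand-in for the CSV: 2023-01-06 is a Friday, 2023-01-16 a Monday
idx = pd.date_range("2023-01-06", periods=11, freq="D", name="Datas")
df = pd.DataFrame({"Volume": range(1, 12)}, index=idx)
df["WEEKDAY"] = df.index.dayofweek

# Weekend rows only, summed over weekly bins that end (and are labelled) on Monday
weekend_rows = df.loc[df.index.weekday.isin([5, 6]), "Volume"]
weekend_sum = weekend_rows.resample("W-MON").sum()

# Align the sums to df's dates (0 where there is none) and add them in place
df["Volume"] += weekend_sum.reindex(df.index, fill_value=0)
```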