Dynamically weight each observation in a dataframe by its timestamp so that more recent observations have greater influence

Posted on 2025-02-07 09:59:07

I'm designing a model to forecast football results and ultimately the final league table. For this purpose, my training set is a set of adjusted scores derived from the underlying metrics and advanced stats of each team in their previous league games. I want to assign more importance to recent games/fixtures because of changes in tactics, managers, players, etc.

The dataframe looks like this:

Date	attack_score	defence_score
2022-03-18	2.3	0.4
2022-03-24	1.6	1.2
2022-04-06	1.9	0.7

Then I calculate the mean of the scores. So far, my crude way of introducing a time-factor has been to manually assign arbitrary weights to separate ranges of observations like this:

df['attack_score'].iloc[:-20].mean()*0.4 + df['attack_score'].iloc[-20:].mean()*0.6
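For reference, the two-block formula above can be rewritten as a single weighted mean with step-shaped weights, which makes the implicit per-observation weighting visible. This is only a sketch: `step_weights` is a hypothetical helper, the sample `scores` series is made-up data, and it assumes the series has more than 20 rows.

```python
import numpy as np
import pandas as pd


def step_weights(n, n_recent=20, w_old=0.4, w_recent=0.6):
    """Per-observation weights implied by the two-block formula (hypothetical helper).

    Assumes n > n_recent. Each older row gets w_old / (n - n_recent) and
    each of the last n_recent rows gets w_recent / n_recent.
    """
    weights = np.empty(n)
    weights[:-n_recent] = w_old / (n - n_recent)
    weights[-n_recent:] = w_recent / n_recent
    return weights


# Made-up scores, only to compare the two formulations.
scores = pd.Series(np.linspace(1.0, 3.0, 50))

manual = scores.iloc[:-20].mean() * 0.4 + scores.iloc[-20:].mean() * 0.6
weighted = np.average(scores.to_numpy(), weights=step_weights(len(scores)))
```

Both expressions give the same number. The weights are flat within each block and jump once at the 20-game boundary.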

However, this is a rather inflexible approach and puts a firm ceiling on how accurate my model can be. Ideally, I'd like to have a function that dynamically and incrementally updates the weights of each observation before the calculation of the mean scores.
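One possible shape for such a function is an exponential time decay, where each observation's weight halves every fixed number of days before a reference date. The sketch below is an assumption, not a recommended model: `time_decayed_mean`, `half_life_days` and `as_of` are hypothetical names, the 30-day half-life is arbitrary, and the three rows are the sample from the table above.

```python
import numpy as np
import pandas as pd


def time_decayed_mean(df, value_col, date_col="Date", half_life_days=90.0, as_of=None):
    """Weighted mean in which a row's weight halves every `half_life_days`.

    `as_of` is the reference date; it defaults to the latest date in the frame.
    """
    dates = pd.to_datetime(df[date_col])
    as_of = dates.max() if as_of is None else pd.Timestamp(as_of)
    # Age of each observation in days relative to the reference date.
    age_days = (as_of - dates).dt.total_seconds() / 86400.0
    # Exponential decay: weight 1.0 at age 0, 0.5 at one half-life, and so on.
    weights = 0.5 ** (age_days / half_life_days)
    return float(np.average(df[value_col].to_numpy(), weights=weights.to_numpy()))


df = pd.DataFrame({
    "Date": ["2022-03-18", "2022-03-24", "2022-04-06"],
    "attack_score": [2.3, 1.6, 1.9],
    "defence_score": [0.4, 1.2, 0.7],
})

attack = time_decayed_mean(df, "attack_score", half_life_days=30)
defence = time_decayed_mean(df, "defence_score", half_life_days=30)
```

Because the weights depend only on each row's age, they update automatically as new fixtures are appended, and `half_life_days` can be tuned like any other hyperparameter. pandas also offers `Series.ewm` with `halflife` and `times` arguments for a running version of a similar idea.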
