Dynamically weighting each observation in a dataframe, based on its timestamp, so that more recent observations have a bigger impact
I'm designing a model to forecast football results and ultimately the final league table. For this purpose, my training set is a set of adjusted scores derived from the underlying metrics and advanced stats of each team in their previous league games. I want to assign more importance to recent games/fixtures because of changes in tactics, managers, players, etc.
The dataframe looks like this:
Date | attack_score | defence_score |
---|---|---|
2022-03-18 | 2.3 | 0.4 |
2022-03-24 | 1.6 | 1.2 |
2022-04-06 | 1.9 | 0.7 |
Then I calculate the mean of the scores. So far, my crude way of introducing a time-factor has been to manually assign arbitrary weights to separate ranges of observations like this:
df['attack_score'].iloc[:-20].mean()*0.4 + df['attack_score'].iloc[-20:].mean()*0.6
However, this is a rather inflexible approach and puts a firm ceiling on how accurate my model can be. Ideally, I'd like to have a function that dynamically and incrementally updates the weights of each observation before the calculation of the mean scores.
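One way to get weights that change smoothly with time is an exponential time-decay average, where a game's weight halves every fixed number of days. The sketch below is only an illustration under stated assumptions: the helper name `time_decay_mean` and the `half_life_days` parameter are hypothetical, the column names follow the table above, and the 30-day half-life is an arbitrary placeholder that would need tuning against held-out results.

```python
import numpy as np
import pandas as pd


def time_decay_mean(df, value_col, date_col="Date", half_life_days=90.0):
    """Weighted mean of value_col; a game's weight halves every half_life_days.

    Hypothetical helper for illustration, not part of pandas.
    """
    dates = pd.to_datetime(df[date_col])
    # Age of each observation in days, measured from the most recent game.
    age_days = (dates.max() - dates).dt.days.to_numpy()
    # Exponential decay: weight 1.0 for the latest game, 0.5 one half-life earlier.
    weights = 0.5 ** (age_days / half_life_days)
    return float(np.average(df[value_col].to_numpy(), weights=weights))


# The three sample rows from the table above.
df = pd.DataFrame({
    "Date": ["2022-03-18", "2022-03-24", "2022-04-06"],
    "attack_score": [2.3, 1.6, 1.9],
    "defence_score": [0.4, 1.2, 0.7],
})

# Assumed half-life of 30 days; purely a placeholder value.
attack = time_decay_mean(df, "attack_score", half_life_days=30.0)
defence = time_decay_mean(df, "defence_score", half_life_days=30.0)
```

Because the weights depend on the date gap rather than on row position, an uneven fixture schedule is handled automatically. A very large half-life reproduces the plain mean, while a very small one collapses onto the latest game, so the single parameter spans the range between "all games equal" and "only the last game counts".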