Predicting the outcome of a game at a given point in time
Predicted outcome of a soccer game (win, tie or loss for the home team) at any point in time.
Hi, I am trying to estimate, from historical data I have in an MSSQL database, the predicted outcome of soccer games (win, tie or loss for the home team) at any point in time, based on the minutes played and the scoreline.
What I had envisaged as output was something like what FanGraphs does for baseball:
http://www.fangraphs.com/scoreboard.aspx?date=2010-11-01
although with two lines, as there are three rather than two possible outcomes.
From the data and the existing tables I can create game records like this
Time  TeamID  Venue  MatchID  Result
   6  TOT     H      5        W
  27  ASV     A      5        W
  58  ASV     A      5        W
  66  TOT     H      5        W
  77  TOT     H      5        W
So on the graph for this game, the home team TOT would start with the win line at around 45% (based on the historical probability of a home win). It would spike when they score their first goal, dip significantly after ASV score twice, probably be above 90% when they score to go 3-2 up, and then rise gently to 100% at the closing 90-minute mark.
So I want to go through the 7500 games I have data on and, based on those results, establish for every minute of a 90-minute game the chances of a win, tie or loss for the home team.
For instance, in the simplest situation, after 1 minute of play, the home team had scored in 44 games; 33 of those teams went on to win, 6 tied and 5 lost. In the corresponding case where the away team scored, the record is 9 wins, 8 ties and 23 losses for the home team. However, I am having trouble getting my head around how to get the scoreline for all 90 minutes and compare it with the final result. (Only one goal can be scored in any specific minute.)
TIA for any help
Comments (1)
There will always be things you can add to the model, but the first thing I would do is, for each game, pull out the score at each minute, and assume that the probability of winning doesn't depend on when the goal was scored, but depends on what the score is now.
So now you would have 90 data points per game.
The next thing I would do is, for each minute slice, add up the number of wins, losses, and draws over all games, for each configuration of scores.
So each entry in that table might correspond to something like this:
You might want to try using the difference in score instead of the actual score values.
Either way, once you have the data formatted that way, getting a reasonable solution is easy.
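To make the two steps above concrete, here is a minimal Python sketch, not a definitive implementation. It assumes each match is available as a list of (minute, venue) goal events like the rows in the question, that at most one goal is scored per minute, and that the goal difference is used as the score state. The function names (scoreline_by_minute, final_result, tally) are made up for illustration.

```python
from collections import defaultdict

def scoreline_by_minute(goals, minutes=90):
    """Return [(minute, home_goals, away_goals)] for minutes 1..minutes.

    goals is a list of (minute, venue) pairs, venue being 'H' or 'A'.
    Assumes at most one goal per minute, as stated in the question.
    """
    goal_at = {minute: venue for minute, venue in goals}
    home = away = 0
    states = []
    for minute in range(1, minutes + 1):
        venue = goal_at.get(minute)
        if venue == "H":
            home += 1
        elif venue == "A":
            away += 1
        states.append((minute, home, away))
    return states

def final_result(goals):
    """Final result from the home team's point of view: 'W', 'D' or 'L'."""
    home = sum(1 for _, venue in goals if venue == "H")
    away = sum(1 for _, venue in goals if venue == "A")
    return "W" if home > away else "L" if home < away else "D"

def tally(matches, minutes=90):
    """Map (minute, goal_difference) -> {'W': n, 'D': n, 'L': n}."""
    counts = defaultdict(lambda: {"W": 0, "D": 0, "L": 0})
    for goals in matches:
        result = final_result(goals)
        for minute, home, away in scoreline_by_minute(goals, minutes):
            counts[(minute, home - away)][result] += 1
    return counts

# Match 5 from the question: TOT (home) beat ASV (away) 3-2.
match_5 = [(6, "H"), (27, "A"), (58, "A"), (66, "H"), (77, "H")]
table = tally([match_5])
print(table[(77, 1)])  # outcome counts for "home leads by one at minute 77"
```

With all 7500 matches passed to tally, each key holds the win/draw/loss counts for one minute and one goal difference. Keying on (minute, home, away) instead would give the actual-score variant.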
If a game is in progress at minute 77 and the score is {home:5, away:2}, the maximum-likelihood (MLE) estimate is 90% wins, 10% draws, 0% losses (according to the example table entry above). You can already see how it helps to include "Laplace smoothing": add 1 to each of the win/draw/loss counters before normalizing. That way, if you've never seen a loss in this exact situation, you don't claim it's impossible; impossible is a very strong word (look up the beta or Dirichlet distributions for background).
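As a rough illustration of the smoothing step, the sketch below adds a pseudo-count to each counter before normalizing. The counts 9/1/0 are hypothetical, chosen only to match the 90%/10%/0% split mentioned above, and the function name and alpha parameter are assumptions for this example.

```python
def smoothed_probabilities(wins, draws, losses, alpha=1.0):
    """Laplace (add-alpha) smoothing over the three outcomes.

    alpha=1.0 is the classic "+1 to every counter" rule.
    Returns (p_win, p_draw, p_loss), which always sums to 1.
    """
    total = wins + draws + losses + 3 * alpha
    return (
        (wins + alpha) / total,
        (draws + alpha) / total,
        (losses + alpha) / total,
    )

# Hypothetical counts: 9 wins, 1 draw, 0 losses seen in this situation.
p_win, p_draw, p_loss = smoothed_probabilities(9, 1, 0)
print(p_win, p_draw, p_loss)  # the loss probability is small but no longer zero
```

Adding the same alpha to all three counters corresponds to a symmetric Dirichlet prior, which is why the beta and Dirichlet distributions are the relevant background.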
The obvious problem with this approach is that if you've never seen a particular score combination before it will predict (33%,33%,33%), which is obviously wrong in some cases.
The simplest fix would be to enforce a rule like "leading by 6 points is at least as good as leading by 5 points". It's ugly, but it's a start.
To avoid that sort of special-case logic you could try averaging that approach with a Monte Carlo approximation.
The simplest of those approaches is to say: over all my data I expect about a 1 in 30 chance of a goal by each team in each minute of play, so simulate the game 10000 times from the current point, count the number of wins/losses/draws, and you're done.
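A minimal sketch of that simulation could look like the following. It takes the 1-in-30 per-minute scoring chance from the text as a default, treats the two teams as independent (so, as a simplification, both can score in the same simulated minute), and ignores stoppage time. The function name and parameters are illustrative assumptions.

```python
import random

def simulate(home, away, minute, p_goal=1 / 30, trials=10_000,
             total_minutes=90, seed=None):
    """Estimate (p_win, p_draw, p_loss) for the home team by simulation.

    Starts from the current score at `minute` and plays out the
    remaining minutes `trials` times.
    """
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(trials):
        h, a = home, away
        for _ in range(minute, total_minutes):
            if rng.random() < p_goal:  # home team scores this minute
                h += 1
            if rng.random() < p_goal:  # away team scores this minute
                a += 1
        if h > a:
            wins += 1
        elif h == a:
            draws += 1
        else:
            losses += 1
    return wins / trials, draws / trials, losses / trials

# Home team leads 3-2 at minute 77, as in the question's example match.
print(simulate(3, 2, 77, seed=1))
```

Passing a seed makes a run repeatable. The estimates still carry sampling noise that shrinks as trials grows.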
If that's too random, or processor intense, switch to Markov Chains.
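One way to read the Markov chain suggestion, offered here only as a sketch under the same assumptions as the simulation (a 1-in-30 scoring chance per team per minute, teams independent, no stoppage time), is to treat the goal difference as the state and push its probability distribution forward one minute at a time. This gives exact probabilities for that simple model without any random sampling. The function name is hypothetical.

```python
from collections import defaultdict

def markov_outcome(home, away, minute, p_goal=1 / 30, total_minutes=90):
    """Exact (p_win, p_draw, p_loss) for the home team under the simple model.

    State = goal difference. In each minute the difference moves +1
    (only home scores), -1 (only away scores) or stays (neither or both).
    """
    up = down = p_goal * (1 - p_goal)
    stay = 1 - up - down
    dist = {home - away: 1.0}  # probability of each goal difference
    for _ in range(minute, total_minutes):
        nxt = defaultdict(float)
        for diff, prob in dist.items():
            nxt[diff + 1] += prob * up
            nxt[diff - 1] += prob * down
            nxt[diff] += prob * stay
        dist = nxt
    p_win = sum(p for d, p in dist.items() if d > 0)
    p_draw = sum(p for d, p in dist.items() if d == 0)
    p_loss = sum(p for d, p in dist.items() if d < 0)
    return p_win, p_draw, p_loss

# Home team leads 3-2 at minute 77.
print(markov_outcome(3, 2, 77))
```

The transition probabilities could also be estimated per minute and per score state from the historical data rather than fixed at a single value.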