Programming techniques for detecting trends in historical sports match data
Summary
I have been looking at historical Australian Rules outcomes in Excel, with an eye on Betfair odds, to see if there is an opportunity to better predict future match outcomes. My progress to date is covered in more detail under Background below.
I’d now like to go a step further and look at data mining / pattern matching / algorithmic techniques that I could implement. I have some experience with dynamic models (Extend) and with using Solver in Excel for optimisation, but I am unfamiliar with data mining beyond the term itself.
Are there viable data mining programming techniques available to me to deploy for this analysis in VBA?
(I realise some may see this question as borderline, but I think Stack Overflow is better suited to it than, say, Math. I am keen to understand potential programming options/algorithms that I can apply in VBA.)
My strong preference is to do this in VBA/VBScript, as that is my coding background, but I am open to other options if they are significantly better.
Background
I have extracted the data for Australian Rules football over the last few years into Excel. This data gives me:
- Quarter-by-quarter results (for example, WWWL means team 1 led for the first three quarters before losing the game; DLLL means teams 1 and 2 were level at the end of the first quarter, then team 2 led for the remainder of the match)
- The same information regrouped into half-by-half results
- Home and Away teams (team 1 is home, team 2 away)
- Match Day Stadium
- Month of the year
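The quarter-by-quarter codes lend themselves to simple string classification before any pivoting. This sketch is in Python rather than VBA, although the logic should port to VBA's `Mid`/`Left`/`Right` string functions without much change. The function names and category labels are hypothetical. It assumes each code is four characters from W/L/D, read from team 1's (the home team's) point of view, with the last character being the final result.

```python
def classify_pattern(quarters: str) -> str:
    """Classify a quarter-by-quarter code such as 'WWWL' (hypothetical categories).

    Each character is the state at the end of a quarter from team 1's view:
    W = team 1 leads, L = team 2 leads, D = level. The 4th character is the result.
    """
    q = quarters.strip().upper()
    if len(q) != 4 or any(c not in "WLD" for c in q):
        raise ValueError(f"expected 4 characters from W/L/D, got {quarters!r}")
    final = q[-1]
    if final == "D":
        return "draw"
    if all(c == final for c in q):
        return "wire-to-wire"  # WWWW (team 1) or LLLL (team 2) led at every break
    if q[2] != "D" and q[2] != final:
        return "three-quarter-time reversal"  # the side leading at 3/4 time lost
    return "other win"


def halves(quarters: str) -> str:
    """Regroup a quarter code into half-by-half states (end of Q2, end of Q4)."""
    q = quarters.strip().upper()
    return q[1] + q[3]


# Hypothetical codes, not real match data.
for code in ["WWWW", "WWWL", "DLLL", "LWWD"]:
    print(code, classify_pattern(code), halves(code))
```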
I then match this up with other data sets, such as:
- League ladder by week (completed)
- Whether the stadium is open air or closed (completed)
- Bookmakers odds pre game (to do)
- What happened with the weather conditions for open air stadiums (to do)
Finally, I slice and dice the combined data with PivotTables (perhaps PowerPivot) to look for betting opportunities, for example:
- Do certain teams tend to lead from start to finish (WWWW) more often than others, and do the odds for a “Four Quarter” win (WWWW) pay disproportionately more than that likelihood would indicate relative to a vanilla win (so lay the vanilla win and back the WWWW)?
- Are there marked differences between home and away performances (i.e. does home-ground knowledge or partisan home-crowd support lead to more reversals of three-quarter-time scores)?
- Comparing results at open-air versus closed-roof stadiums (removing the weather's impact)
- Does long-distance travel in one week affect the outcome in the following week(s)?
- Do certain teams produce particular scoring patterns more often than the league as a whole?
- Is a lower-ranked team more likely to lead throughout an entire match than to come from behind when it beats a higher-ranked team?
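Here is one way to start on the first question in that list (wire-to-wire wins versus their odds). You count, for each team, how many of its wins were led from start to finish. You then compare an estimated probability with the price on offer using expected value. This is a minimal sketch. The match records, with `home`, `away` and a home-perspective `quarters` code, are hypothetical. The function names are illustrative, and Betfair commission is ignored.

```python
from collections import defaultdict


def wire_to_wire_counts(matches):
    """Return {team: (wins, wire_to_wire_wins)} over home and away games.

    Each match is a dict with 'home', 'away' and 'quarters' (home team's view).
    """
    wins = defaultdict(int)
    wire = defaultdict(int)
    for m in matches:
        q = m["quarters"].strip().upper()
        final = q[-1]
        if final == "D":
            continue  # draws have no winner
        team = m["home"] if final == "W" else m["away"]
        wins[team] += 1
        if all(c == final for c in q):
            wire[team] += 1
    return {team: (wins[team], wire[team]) for team in wins}


def expected_value(p_win: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Expected profit of a back bet, given your own probability estimate (no commission)."""
    return p_win * (decimal_odds - 1.0) * stake - (1.0 - p_win) * stake


# Hypothetical sample records, not real results.
matches = [
    {"home": "Team A", "away": "Team B", "quarters": "WWWW"},
    {"home": "Team B", "away": "Team A", "quarters": "LLLL"},
    {"home": "Team A", "away": "Team C", "quarters": "LWWW"},
]
for team, (w, w2w) in wire_to_wire_counts(matches).items():
    print(team, w, w2w, w2w / w)
```

Suppose a team wins 40% of its games and half of those wins are WWWW. A crude estimate of P(WWWW) is then 0.4 × 0.5 = 0.2. Backing at decimal odds of 6.0 would have an expected value of 0.2 × 5 − 0.8 = +0.2 per unit staked.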
Comments (2)
Check out RapidMiner; it has a number of built-in tools for exploring your data. Assuming you are reasonably comfortable with computer tools, also check out Weka, which is a machine learning tool. If you annotate your data, you can train algorithms on it and see which is most accurate at predicting the winner.
For example, if Team A plays Team B, you'd represent the flow of the game in a CSV file, with any additional stats on the same line. The very last column then says which team won. That column is what the algorithms are trained on.
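The following is a minimal sketch of that layout. It writes one row per match with the class label (the winner) as the last column. Both Weka and RapidMiner can import CSV files. The column names (`q1` to `q3` for the quarter states, ladder positions, roof type) and the function name are hypothetical.

```python
import csv
import io

# Hypothetical feature columns; choose whatever attributes your spreadsheet holds.
FEATURES = ["home", "away", "q1", "q2", "q3",
            "home_ladder_pos", "away_ladder_pos", "roof"]


def to_training_csv(matches) -> str:
    """Return CSV text with a header row and one row per match; label is last."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FEATURES + ["winner"])
    for m in matches:
        writer.writerow([m[f] for f in FEATURES] + [m["winner"]])
    return buf.getvalue()


# Hypothetical record, not a real match.
example = {"home": "Team A", "away": "Team B", "q1": "W", "q2": "W", "q3": "L",
           "home_ladder_pos": 3, "away_ladder_pos": 7, "roof": "open",
           "winner": "Team B"}
print(to_training_csv([example]))
```

The fourth-quarter state is deliberately left out. It is the final result, so including it would leak the label into the features.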
This post is a bit late, but just want to add my 2 cents on the process itself.
From my (rather extensive) work running prediction algorithms on markets like these (Betfair included), my conclusion is that predicting the probability of an event happening (back) or not happening (lay) is of no use. The issue is that even if you can clearly identify a trend with a high probability of accuracy, you still won't be able to profit from the prediction because of the market odds. The odds effectively tilt the probability back to 50/50.
In fact, the market odds themselves tell you the implied probability of an event happening.
Implied probability = 1 / odds (using Betfair's decimal odds)
For example, if the back price for the draw is 4.1, the market implies roughly a 24.4% chance of a draw (1 / 4.1 ≈ 0.244).
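Betfair uses decimal odds, so the implied probability is 1/odds. The expression 1/(odds - 1) gives something different: the ratio p/(1 - p), i.e. the odds in favour. This small sketch computes both for a price of 4.1 and shows that the two agree once converted. Commission is ignored, and the function names are illustrative.

```python
def implied_probability(decimal_odds: float) -> float:
    """Implied probability of a decimal price (a 1-unit stake returns decimal_odds in total)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def odds_in_favour(decimal_odds: float) -> float:
    """Ratio p / (1 - p) implied by the price; algebraically equal to 1 / (decimal_odds - 1)."""
    p = implied_probability(decimal_odds)
    return p / (1.0 - p)


p = implied_probability(4.1)
print(f"implied probability: {p:.4f}")          # 0.2439
print(f"1/(odds - 1): {1 / (4.1 - 1):.4f}")     # 0.3226
print(f"p/(1 - p):    {odds_in_favour(4.1):.4f}")  # 0.3226
```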
To profit effectively on Betfair, you need to look for discrepancies in the market odds rather than at the probability of an event happening. For example, because odds move rapidly, there may briefly be scenarios where odds across markets are skewed enough to guarantee a return of more than 100% of the total stake. This won't happen on major matches; it happens mostly on lower-volume matches.
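The cross-market case can be sketched as a check on the best available back price for every mutually exclusive outcome. If the implied probabilities sum to less than 1, staking in proportion to 1/price returns the same amount whichever outcome occurs. This is a simplified sketch, and the function name is hypothetical. It ignores commission, liquidity and lay bets, as well as the risk of prices moving before all bets are matched.

```python
def arbitrage_stakes(prices, total_stake=100.0):
    """Split total_stake across back bets on every mutually exclusive outcome.

    prices: best decimal back price for each outcome (possibly from different markets).
    Returns (stakes, guaranteed_return) if the implied probabilities sum to < 1,
    otherwise None. Commission, liquidity and price movement are ignored.
    """
    inverse = [1.0 / p for p in prices]
    book = sum(inverse)  # sum of implied probabilities
    if book >= 1.0:
        return None  # no arbitrage: the book is at or above 100%
    stakes = [total_stake * x / book for x in inverse]
    # stake_i * price_i == total_stake / book for every outcome
    return stakes, total_stake / book


print(arbitrage_stakes([1.9, 1.9]))
print(arbitrage_stakes([3.0, 2.0]))
```

For example, prices of 3.0 and 2.0 on the two outcomes give 1/3 + 1/2 ≈ 0.833. A 100-unit outlay split 40/60 then returns 120 either way.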