Count the number of instances in dataframe columns, store them in a grid, and predict probabilities

Posted on 2025-02-04 22:29:04

I am trying to take a dataframe of NFL games, and predict the probability of winning where for example the home team represents the rows and the away team represents the columns (each numbered 0-9). I want to iterate through each game in the dataframe and count the number of instances that result in each square so that they are 'bucketized', then do the same thing for the predicted scores so that you can get the probability for all 100 squares.

How would I extract the 'ones' digit from scores in each row and do this counting per square?

Any insight and guidance is greatly appreciated!

Comments (1)

夜唯美灬不弃 2025-02-11 22:29:04

It took me a while to figure out what you meant, but I see now that you are doing the squares.

To get the ones digit, convert the score to a string and take the last character, i.e. '35'[-1] yields '5', then convert that back to an int. Then do a .value_counts() on the digits. To get a proportion, divide by the number of games played.
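As a minimal sketch of the ones-digit step on its own, assuming the scores already sit in two columns (the names home_score and away_score and the values are made up for illustration, not taken from the question's data):

```python
import pandas as pd

# Hypothetical scores for illustration only.
games = pd.DataFrame({"home_score": [35, 17, 20], "away_score": ["28", "10", "3"]})

# String route: take the last character and convert it back to an int.
away_digit = games["away_score"].astype(str).str[-1].astype(int)

# Integer route: the remainder after dividing by 10 is the ones digit.
home_digit = games["home_score"] % 10

print(home_digit.tolist())  # [5, 7, 0]
print(away_digit.tolist())  # [8, 0, 3]
```

The modulo route avoids the round trip through strings when the scores are already numeric; the string route is handy when the score still has to be pulled out of text.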

Since you didn't provide any data, I had to get my own:

import re

import pandas as pd

# Scrape the 2021 schedule pages, one per week (weeks 1-18).
df_list = []
for week in range(1, 19):
    print(week)
    url = f'https://www.espn.com/nfl/schedule/_/week/{week}/year/2021'
    df_list += pd.read_html(url)

df = pd.concat(df_list, axis=0)
# Split the 'result' column at the comma into the away and home parts.
df[['Away', 'Home']] = df['result'].str.split(',', expand=True)
df = df[['Away', 'Home']].dropna()


def get_digits(text):
    """Return the ones digit of the first number found in text."""
    return int(re.search(r'\d+', text)[0][-1])


for col in ['Away', 'Home']:
    df[col] = df[col].apply(get_digits)

# How often each ones digit occurs for away teams and for home teams.
away_counts = df['Away'].value_counts()
home_counts = df['Home'].value_counts()

# Columns are away digits, rows are home digits.
# Each cell averages the away and home digit frequencies;
# it is not a count of games that landed on that exact (home, away) pair.
data = {}
for col in range(10):
    away_count = away_counts.get(col, 0)  # 0 if the digit never occurred
    data[col] = []
    for row in range(10):
        home_count = home_counts.get(row, 0)
        data[col].append(round((away_count + home_count) / (len(df) * 2), 3))


results = pd.DataFrame(data)

Output:

print(results.to_string())
       0      1      2      3      4      5      6      7      8      9
0  0.160  0.162  0.110  0.149  0.160  0.129  0.140  0.171  0.136  0.121
1  0.116  0.118  0.066  0.105  0.116  0.085  0.096  0.127  0.092  0.077
2  0.094  0.096  0.044  0.083  0.094  0.062  0.074  0.105  0.070  0.055
3  0.132  0.134  0.083  0.121  0.132  0.101  0.112  0.143  0.108  0.094
4  0.121  0.123  0.072  0.110  0.121  0.090  0.101  0.132  0.097  0.083
5  0.085  0.086  0.035  0.074  0.085  0.053  0.064  0.096  0.061  0.046
6  0.118  0.119  0.068  0.107  0.118  0.086  0.097  0.129  0.094  0.079
7  0.142  0.143  0.092  0.131  0.142  0.110  0.121  0.153  0.118  0.103
8  0.086  0.088  0.037  0.075  0.086  0.055  0.066  0.097  0.062  0.048
9  0.108  0.110  0.059  0.097  0.108  0.077  0.088  0.119  0.085  0.070
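
The table above is built from the away and home digit counts taken separately. If what you want is a count per (home digit, away digit) square, one possible sketch uses pd.crosstab. It assumes a dataframe shaped like df above, with integer ones digits in 'Home' and 'Away' columns; the four games below are made up for illustration:

```python
import pandas as pd

# Made-up ones digits for four games, for illustration only.
digits = pd.DataFrame({"Home": [7, 0, 7, 3], "Away": [4, 7, 4, 0]})

# Count games per (home digit, away digit) pair.
counts = pd.crosstab(digits["Home"], digits["Away"])

# Make sure all 100 squares exist, filling unseen squares with 0.
all_digits = range(10)
counts = counts.reindex(index=all_digits, columns=all_digits, fill_value=0)

# Divide by the number of games to get the empirical probability per square.
probs = counts / len(digits)

print(probs.loc[7, 4])  # 0.5
```

Built this way, the 100 cells sum to 1, and the same steps could be run on a dataframe of predicted scores to get the predicted grid.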
