Count instances in DataFrame columns, store them in a grid, and predict probabilities

Posted on 2025-02-04 22:29:04

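I am trying to use a DataFrame of NFL games to predict the probability of winning each square of a grid in which the home team represents the rows and the away team represents the columns (each numbered 0-9). I want to iterate over every game in the DataFrame and count the number of instances that land in each square, so that they are "bucketed", and then do the same for predicted scores so that I get a probability for all 100 squares.

How can I extract the ones digit from each score in a row and count the results per square?

Any insight and guidance would be appreciated!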

夜唯美灬不弃 2025-02-11 22:29:04

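It took me a while to figure out what you meant, but I see now that you are doing the squares.

To get the ones digit, convert the score to a string and take the last character: '35'[-1] yields '5', which you then convert to an int. After that, run .value_counts() on each combination. To get a proportion, divide by the number of games played.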
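
As a small illustration of the ones-digit step, the sketch below puts the string approach next to the equivalent modulo approach and then tallies the digits. The function names `ones_digit_str` and `ones_digit_mod` and the sample scores are made up for this example, and `collections.Counter` stands in for `.value_counts()` so the snippet runs without pandas.

```python
from collections import Counter


def ones_digit_str(score):
    """Ones digit via string slicing: '35'[-1] -> '5' -> 5."""
    return int(str(score)[-1])


def ones_digit_mod(score):
    """Ones digit as the remainder after dividing by 10."""
    return int(score) % 10


# Hypothetical scores, for illustration only.
scores = [35, 27, 10, 3, 45]

digits = [ones_digit_str(s) for s in scores]

# Tally how often each ones digit occurs.
counts = Counter(digits)

# Divide by the number of scores to turn counts into proportions.
shares = {digit: n / len(scores) for digit, n in counts.items()}
```

Both helpers should agree for non-negative integer scores; the modulo version skips the round trip through a string.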

Since you didn't provide any data, I had to get my own:

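import pandas as pd
import re


# Scrape the 2021 schedule tables from ESPN, one page per week.
df_list = []
for week in range(1, 19):
    print(week)  # progress indicator
    url = f'https://www.espn.com/nfl/schedule/_/week/{week}/year/2021'
    df_list += pd.read_html(url)

df = pd.concat(df_list, axis=0)
# Split 'result' at the comma into two score strings. This assumes the first
# score belongs to the away team and the second to the home team.
df[['Away','Home']] = df['result'].str.split(',', expand=True)
df = df[['Away','Home']]
df = df.dropna()


def get_digits(row):
    # Take the first run of digits in the string and keep its last digit.
    score = int(re.search(r'\d+', row)[0][-1])
    return score

for col in ['Away','Home']:
    df[col] = df[col].apply(get_digits)

# How often each ones digit (0-9) occurs for each side.
away_counts = df['Away'].value_counts()
home_counts = df['Home'].value_counts()

# Columns are away digits, rows are home digits. Each cell averages the two
# single-digit frequencies; it is not the joint frequency of the pair.
data = {}
for col in range(0,10):
    data[col] = []
    away_count = away_counts.get(col, 0)  # 0 if the digit never occurred
    for row in range(0,10):
        home_count = home_counts.get(row, 0)
        data[col].append(round((away_count + home_count)/(len(df)*2),3))


results = pd.DataFrame(data)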

Output:

print(results.to_string())
       0      1      2      3      4      5      6      7      8      9
0  0.160  0.162  0.110  0.149  0.160  0.129  0.140  0.171  0.136  0.121
1  0.116  0.118  0.066  0.105  0.116  0.085  0.096  0.127  0.092  0.077
2  0.094  0.096  0.044  0.083  0.094  0.062  0.074  0.105  0.070  0.055
3  0.132  0.134  0.083  0.121  0.132  0.101  0.112  0.143  0.108  0.094
4  0.121  0.123  0.072  0.110  0.121  0.090  0.101  0.132  0.097  0.083
5  0.085  0.086  0.035  0.074  0.085  0.053  0.064  0.096  0.061  0.046
6  0.118  0.119  0.068  0.107  0.118  0.086  0.097  0.129  0.094  0.079
7  0.142  0.143  0.092  0.131  0.142  0.110  0.121  0.153  0.118  0.103
8  0.086  0.088  0.037  0.075  0.086  0.055  0.066  0.097  0.062  0.048
9  0.108  0.110  0.059  0.097  0.108  0.077  0.088  0.119  0.085  0.070
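
The grid above combines the home and away digit frequencies separately, so its 100 cells do not sum to 1. If the goal is the observed share of games that landed in each square, one option is to count the (home digit, away digit) pairs jointly. The following is only a sketch under that assumption: the `games` DataFrame and its scores are invented for illustration, and it uses `pd.crosstab` rather than the loop from the answer.

```python
import pandas as pd

# Hypothetical final scores, for illustration only (not real game data).
games = pd.DataFrame({
    'Home': [27, 31, 20, 17, 24],
    'Away': [24, 29, 17, 10, 14],
})

# Ones digit of each score.
home_digit = games['Home'] % 10
away_digit = games['Away'] % 10

# Count each (home digit, away digit) pair; rows are home, columns are away.
# reindex fills in digits that never occurred so the grid is always 10x10.
digits = range(10)
counts = (
    pd.crosstab(home_digit, away_digit)
    .reindex(index=digits, columns=digits, fill_value=0)
)

# Divide by the number of games so the 100 cells sum to 1.
probs = counts / len(games)
print(probs.round(3).to_string())
```

The same steps could be run on a column of predicted scores to get a second grid for comparison. With only one season of games many squares will be empty, so the resulting proportions are rough estimates at best.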
