我如何连接滚动回归结果| Python

发布于 2025-02-10 02:04:24 字数 4513 浏览 1 评论 0原文

我很难创建数据框来存储我的回归结果。对于每个股票，它以其尊敬的窗口计算系数（beta）及其标准误差。

我遇到的新问题是，行正在重复自己以计算每个列的每个值，从而导致NAN值。我该如何纠正con座？

这是

def rolling_regression_stats():
    tickers = df[['FDX', 'BRK', 'MSFT', 'NVDA', 'INTC', 'AMD', 'JPM', 'T', 'AAPL', 'AMZN', 'GS']]
    rolling_window = df
    iterable = zip(range(1110), range(52,1162))

    total_df = pd.DataFrame()
    for y, x in iterable:
        for t in tickers:
            model = smf.ols(f'{t} ~ SP50', data= rolling_window.iloc[y:x]).fit()
            beta_coef = model.params['SP50']
            std_error = model.bse['SP50']

            window_range = (f'{y}-{x}')
            
            results = pd.DataFrame({
                "Window":window_range,
                f"{t} Beta":beta_coef,
                f"{t}Beta STD": std_error,
                
            },index=[0])
            total_df = pd.concat([total_df,results], axis=0)
    print(total_df)    
rolling_regression_stats()

我尝试创建

这是当前输出。但是看来计算会跳过每列的行，从而导致NAN值。

          Window  FDX Beta  FDXBeta STD  BRK Beta  BRKBeta STD  MSFT Beta  MSFTBeta STD  NVDA Beta  ...    T Beta  TBeta STD   AAPL Beta  AAPLBeta STD  AMZN Beta  AMZNBeta STD   GS Beta  GSBeta STD
0           0-52 -0.288299     0.346499       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
1           0-52       NaN          NaN -0.396694     0.366258        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
2           0-52       NaN          NaN       NaN          NaN   1.214212      0.527404        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
3           0-52       NaN          NaN       NaN          NaN        NaN           NaN   7.324437  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
4           0-52       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
...          ...       ...          ...       ...          ...        ...           ...        ...  ...       ...        ...         ...           ...        ...           ...       ...         ...
12205  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
12206  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ... -0.294726   0.043549         NaN           NaN        NaN           NaN       NaN         NaN
12207  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN  108.959035     12.653105        NaN           NaN       NaN         NaN
12208  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN   5.257065      2.785473       NaN         NaN
12209  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN  1.325418    0.244893

这是修复它的代码！

def rolling_regression_stats():
    tickers = df[['FDX', 'BRK', 'MSFT', 'NVDA', 'INTC', 'AMD', 'JPM', 'T', 'AAPL', 'AMZN', 'GS']]
    rolling_window = df
    iterable = zip(range(1110), range(52,1162))

    total_df = pd.DataFrame()
    for y, x in iterable:
        yx_df = pd.DataFrame({'window': [f'{y}-{x}']})

        for t in tickers:
            model = smf.ols(f'{t} ~ SP50', data= rolling_window.iloc[y:x]).fit()
            beta_coef = model.params['SP50']
            std_error = model.bse['SP50']

            # window_range = (f'{y}-{x}')
            
            res = pd.DataFrame({f'{t} Beta': beta_coef, f'{t} STDERR': std_error},index=[0])
           
            yx_df = pd.concat([yx_df, res], axis=1)

        total_df = pd.concat([total_df, yx_df], axis=0, ignore_index=True)
    print(total_df)
rolling_regression_stats()

原文

I'm having trouble creating a data frame to store my regression results. For each ticker, it calculates the coefficient(Beta) and its standard error with its respected window.

The new problem that I'm having is that the rows are repeating themselves to calculate each value per column resulting in NaN values. How can I correct the concat?

Here is the code

def rolling_regression_stats():
    tickers = df[['FDX', 'BRK', 'MSFT', 'NVDA', 'INTC', 'AMD', 'JPM', 'T', 'AAPL', 'AMZN', 'GS']]
    rolling_window = df
    iterable = zip(range(1110), range(52,1162))

    total_df = pd.DataFrame()
    for y, x in iterable:
        for t in tickers:
            model = smf.ols(f'{t} ~ SP50', data= rolling_window.iloc[y:x]).fit()
            beta_coef = model.params['SP50']
            std_error = model.bse['SP50']

            window_range = (f'{y}-{x}')
            
            results = pd.DataFrame({
                "Window":window_range,
                f"{t} Beta":beta_coef,
                f"{t}Beta STD": std_error,
                
            },index=[0])
            total_df = pd.concat([total_df,results], axis=0)
    print(total_df)    
rolling_regression_stats()

Here is the example of the dataframe I'm trying to create

Here is the current output. But it seems that the calculations are skipping rows for each column resulting in Nan Values.

          Window  FDX Beta  FDXBeta STD  BRK Beta  BRKBeta STD  MSFT Beta  MSFTBeta STD  NVDA Beta  ...    T Beta  TBeta STD   AAPL Beta  AAPLBeta STD  AMZN Beta  AMZNBeta STD   GS Beta  GSBeta STD
0           0-52 -0.288299     0.346499       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
1           0-52       NaN          NaN -0.396694     0.366258        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
2           0-52       NaN          NaN       NaN          NaN   1.214212      0.527404        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
3           0-52       NaN          NaN       NaN          NaN        NaN           NaN   7.324437  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
4           0-52       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
...          ...       ...          ...       ...          ...        ...           ...        ...  ...       ...        ...         ...           ...        ...           ...       ...         ...
12205  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN       NaN         NaN
12206  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ... -0.294726   0.043549         NaN           NaN        NaN           NaN       NaN         NaN
12207  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN  108.959035     12.653105        NaN           NaN       NaN         NaN
12208  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN   5.257065      2.785473       NaN         NaN
12209  1109-1161       NaN          NaN       NaN          NaN        NaN           NaN        NaN  ...       NaN        NaN         NaN           NaN        NaN           NaN  1.325418    0.244893

Here is the code that fixed it!

def rolling_regression_stats():
    tickers = df[['FDX', 'BRK', 'MSFT', 'NVDA', 'INTC', 'AMD', 'JPM', 'T', 'AAPL', 'AMZN', 'GS']]
    rolling_window = df
    iterable = zip(range(1110), range(52,1162))

    total_df = pd.DataFrame()
    for y, x in iterable:
        yx_df = pd.DataFrame({'window': [f'{y}-{x}']})

        for t in tickers:
            model = smf.ols(f'{t} ~ SP50', data= rolling_window.iloc[y:x]).fit()
            beta_coef = model.params['SP50']
            std_error = model.bse['SP50']

            # window_range = (f'{y}-{x}')
            
            res = pd.DataFrame({f'{t} Beta': beta_coef, f'{t} STDERR': std_error},index=[0])
           
            yx_df = pd.concat([yx_df, res], axis=1)

        total_df = pd.concat([total_df, yx_df], axis=0, ignore_index=True)
    print(total_df)
rolling_regression_stats()

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

黑凤梨 2025-02-17 02:04:24

这是使用上面评论中概述的逻辑的示例。您可以看到一个dataframe（yx_df）初始化了每个新的y，x值，然后将新列与yx_df = pd串联到它的不同股票值中。。。

编辑：添加窗口列yx_df dataframe的初始化。当获得新的y，x值时，该列只需要分配一次。

def rolling_reg():
    tickers = ['FDX', 'BRK']
    iterable = zip(range(5), range(50, 55))

    total_df = pd.DataFrame()

    for y, x in iterable:
        yx_df = pd.DataFrame({'window': [f'{y}-{x}']})

        for t in tickers:
            res = pd.DataFrame(
                {t: np.random.randint(0, 10), f"{t}_2": np.random.randn(1)}
            )
            yx_df = pd.concat([yx_df, res], axis = 1)

        total_df = pd.concat([total_df, yx_df], axis = 0, ignore_index=True)

    return total_df

rolling_reg()

#   window  FDX     FDX_2   BRK      BRK_2
# 0   0-50    7  0.232365     6  -1.491573
# 1   1-51    9  0.302536     1   0.871351
# 2   2-52    6  0.233803     9  -1.306058
# 3   3-53    7 -0.203941     8   0.454480
# 4   4-54    7 -0.618590     7   0.810528

Here is an example using the logic outlined in my comment above. You can see one dataframe (yx_df) is initialized for every new y, x values, then new columns are concatenated to it for different ticker values with yx_df = pd.concat([yx_df, res], axis = 1), and finally a full row is concatenated to the total_df after the loop over all tickers is done with total_df = pd.concat([total_df, yx_df], axis = 0, ignore_index=True).

Edit: Added window column to the initialization of the yx_df dataframe. That column only needs to have its value assigned once when new y, x values are obtained.

def rolling_reg():
    tickers = ['FDX', 'BRK']
    iterable = zip(range(5), range(50, 55))

    total_df = pd.DataFrame()

    for y, x in iterable:
        yx_df = pd.DataFrame({'window': [f'{y}-{x}']})

        for t in tickers:
            res = pd.DataFrame(
                {t: np.random.randint(0, 10), f"{t}_2": np.random.randn(1)}
            )
            yx_df = pd.concat([yx_df, res], axis = 1)

        total_df = pd.concat([total_df, yx_df], axis = 0, ignore_index=True)

    return total_df

rolling_reg()

#   window  FDX     FDX_2   BRK      BRK_2
# 0   0-50    7  0.232365     6  -1.491573
# 1   1-51    9  0.302536     1   0.871351
# 2   2-52    6  0.233803     9  -1.306058
# 3   3-53    7 -0.203941     8   0.454480
# 4   4-54    7 -0.618590     7   0.810528

回复收藏 0 原文

差↓一点笑了 2025-02-17 02:04:24

你能做这样的事情：

# create a storage df; store this outside of your loop
total_df = pd.DataFrame()

# create your dataframe with results for one regression in the loop
results = pd.DataFrame({
                "Window":window_range,
                f"{t} Beta":beta_coef,
                f"{t}Beta STD": std_error,                
            },index=[0])

# concat the result in the loop like so
total_df = pd.concat([total_df, results], axis=0)

Can you do something like this:

# create a storage df; store this outside of your loop
total_df = pd.DataFrame()

# create your dataframe with results for one regression in the loop
results = pd.DataFrame({
                "Window":window_range,
                f"{t} Beta":beta_coef,
                f"{t}Beta STD": std_error,                
            },index=[0])

# concat the result in the loop like so
total_df = pd.concat([total_df, results], axis=0)

回复收藏 0 原文

~没有更多了~