How to do a train/test split in PyTorch

Posted on 2025-02-10 12:33:02

What is the best way to encode string values using PyTorch?

df_train.head():
  country            league        home_team   away_team  home_odds  draw_odds  away_odds  home_score  away_score  dow  month
0  Brazil  Copa do Nordeste     Sport Recife  Imperatriz       1.36       4.31       7.66           2           2    4      2
1  Brazil  Copa do Nordeste              ABC  America RN       2.62       3.30       2.48           2           1    6      2
2  Brazil  Copa do Nordeste  Frei Paulistano     Nautico       5.19       3.58       1.62           0           2    6      2
3  Brazil  Copa do Nordeste      Botafogo PB   Confianca       2.06       3.16       3.50           1           1    6      2
4  Brazil  Copa do Nordeste        Fortaleza       Ceara       2.19       2.98       3.38           1           1    6      2

df_train.shape:
(76544, 11)

df_test.head():
     country          league      home_team      away_team  home_odds  draw_odds  away_odds  home_score  away_score  dow  month
0      World   Club Friendly       Westerlo           Gent       2.93       3.47       2.19         NaN         NaN    4      6
1   Malaysia    Super League       Johor DT       Selangor       1.27       5.59       8.26         NaN         NaN    4      6
2  Argentina  Reserve League        Lanus 2  River Plate 2       2.54       3.12       2.65         NaN         NaN    4      6
3       Asia         AFC Cup    Bali United          Kedah       1.58       4.08       4.93         NaN         NaN    4      6
4   Ethiopia  Premier League  Defence Force     Adama City       2.93       2.16       3.38         NaN         NaN    4      6

df_test.shape:
(599, 11)

I currently encode these columns with scikit-learn's LabelEncoder on the pandas DataFrames:

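import pandas as pd
from sklearn import preprocessing


def encode_features(df_train, df_test):
    """Label-encode the string columns of both frames with shared encoders."""
    features = ['country', 'league', 'home_team', 'away_team']
    # Fit on train and test together so every category gets a code.
    df_combined = pd.concat([df_train[features], df_test[features]])

    for feature in features:
        le = preprocessing.LabelEncoder()
        le = le.fit(df_combined[feature])
        # Note: this overwrites the columns of the input frames in place.
        df_train[feature] = le.transform(df_train[feature])
        df_test[feature] = le.transform(df_test[feature])
    return df_train, df_test


df_train, df_test = encode_features(df_train, df_test)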

Comments (1)

罗罗贝儿 2025-02-17 12:33:02

To encode those four string columns, you can use a label encoder or a one-hot encoder. Here is a reference class for the label-encoder case.

import pandas
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # list of column names to encode; None means all

    def fit(self, X, y=None):
        return self  # no state is learned here; encoders are fitted in transform

    def transform(self, X):
        # Each call fits a fresh LabelEncoder per column, so the integer codes
        # are only consistent within the DataFrame passed to this call.
        output = X.copy()
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            # DataFrame.items() replaces iteritems(), which pandas 2.0 removed.
            for colname, col in output.items():
                output[colname] = LabelEncoder().fit_transform(col)
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


encoder = MultiColumnLabelEncoder(columns=['country', 'league', 'home_team', 'away_team'])
df_train_encoded = encoder.fit_transform(df_train)
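
One caveat with the class above: it fits a fresh LabelEncoder every time transform is called, so encoding df_train and df_test in separate calls can give the same team different integers. A minimal sketch of one way around that is to build the mapping once from the training column and reuse it on the test column, with a reserved index for values not seen in training. This is plain Python rather than an sklearn or PyTorch API, and the names build_vocab, encode_column and UNK_INDEX are hypothetical.

```python
# Hypothetical helpers for illustration; not part of sklearn or PyTorch.
UNK_INDEX = 0  # index reserved for values never seen during fitting


def build_vocab(values):
    """Map each distinct string to an integer, starting at 1 (0 is UNK)."""
    vocab = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab) + 1
    return vocab


def encode_column(values, vocab):
    """Encode strings with a fitted vocab; unseen strings become UNK_INDEX."""
    return [vocab.get(value, UNK_INDEX) for value in values]


# Toy data based on the home_team column shown in the question.
train_home = ["Sport Recife", "ABC", "Frei Paulistano", "ABC"]
test_home = ["ABC", "Westerlo"]  # "Westerlo" does not occur in train_home

home_vocab = build_vocab(train_home)  # fit on the training data only
train_codes = encode_column(train_home, home_vocab)
test_codes = encode_column(test_home, home_vocab)

print(train_codes)  # [1, 2, 3, 2]
print(test_codes)   # [2, 0]
```

On a DataFrame, something like df[col].map(vocab).fillna(UNK_INDEX).astype(int) should give the same result for a single column.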

I think this should help in your case.
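
Since the question asks about PyTorch specifically: once a column has been turned into integer codes, two common ways to feed it to a model are one-hot vectors (torch.nn.functional.one_hot) and a learned embedding (torch.nn.Embedding). The sketch below assumes torch is installed; the codes, the number of categories and the embedding size are made-up toy values, not taken from the data in the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed integer codes for one categorical column (e.g. home_team),
# as a label encoder would produce them. The values are made up.
home_team_codes = torch.tensor([2, 0, 1, 2], dtype=torch.long)
num_teams = 3  # number of distinct categories in this toy example

# Option 1: one-hot vectors, one column per category.
one_hot = F.one_hot(home_team_codes, num_classes=num_teams).float()

# Option 2: a learned dense vector per category (size 4 is arbitrary).
embedding = nn.Embedding(num_embeddings=num_teams, embedding_dim=4)
embedded = embedding(home_team_codes)

print(one_hot.shape)   # torch.Size([4, 3])
print(embedded.shape)  # torch.Size([4, 4])
```

For columns with many distinct values, such as team names, the embedding is usually the more practical choice, because the one-hot width grows with the number of categories.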
