How to do a train/test split in PyTorch
What is the best way to encode string values using PyTorch?
df_train.head():
  | country | league           | home_team       | away_team  | home_odds | draw_odds | away_odds | home_score | away_score | dow | month
0 | Brazil  | Copa do Nordeste | Sport Recife    | Imperatriz | 1.36      | 4.31      | 7.66      | 2          | 2          | 4   | 2
1 | Brazil  | Copa do Nordeste | ABC             | America RN | 2.62      | 3.30      | 2.48      | 2          | 1          | 6   | 2
2 | Brazil  | Copa do Nordeste | Frei Paulistano | Nautico    | 5.19      | 3.58      | 1.62      | 0          | 2          | 6   | 2
3 | Brazil  | Copa do Nordeste | Botafogo PB     | Confianca  | 2.06      | 3.16      | 3.50      | 1          | 1          | 6   | 2
4 | Brazil  | Copa do Nordeste | Fortaleza       | Ceara      | 2.19      | 2.98      | 3.38      | 1          | 1          | 6   | 2
df_train.shape:
(76544, 11)
df_test.head():
  | country   | league         | home_team     | away_team     | home_odds | draw_odds | away_odds | home_score | away_score | dow | month
0 | World     | Club Friendly  | Westerlo      | Gent          | 2.93      | 3.47      | 2.19      | NaN        | NaN        | 4   | 6
1 | Malaysia  | Super League   | Johor DT      | Selangor      | 1.27      | 5.59      | 8.26      | NaN        | NaN        | 4   | 6
2 | Argentina | Reserve League | Lanus 2       | River Plate 2 | 2.54      | 3.12      | 2.65      | NaN        | NaN        | 4   | 6
3 | Asia      | AFC Cup        | Bali United   | Kedah         | 1.58      | 4.08      | 4.93      | NaN        | NaN        | 4   | 6
4 | Ethiopia  | Premier League | Defence Force | Adama City    | 2.93      | 2.16      | 3.38      | NaN        | NaN        | 4   | 6
df_test.shape:
(599, 11)
Currently I do the encoding with scikit-learn's LabelEncoder on pandas DataFrames:
import pandas as pd
from sklearn import preprocessing

def encode_features(df_train, df_test):
    features = ['country', 'league', 'home_team', 'away_team']
    # Fit each encoder on train + test together so every label in either set gets an id
    df_combined = pd.concat([df_train[features], df_test[features]])
    for feature in features:
        le = preprocessing.LabelEncoder()
        le = le.fit(df_combined[feature])
        # Replace each string column with its integer codes
        df_train[feature] = le.transform(df_train[feature])
        df_test[feature] = le.transform(df_test[feature])
    return df_train, df_test

df_train, df_test = encode_features(df_train, df_test)
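PyTorch itself does not ship a label encoder; string columns are usually mapped to integer ids first and only then wrapped in tensors. A minimal sketch of the same combined-vocabulary idea as `encode_features`, written in plain Python so it does not depend on scikit-learn. It assumes rows are dicts keyed by the column names above; `build_vocab` and `encode_rows` are hypothetical helpers, not part of PyTorch or any other library.

```python
from typing import Dict, List

FEATURES = ["country", "league", "home_team", "away_team"]


def build_vocab(rows: List[Dict[str, str]], features: List[str]) -> Dict[str, Dict[str, int]]:
    """Map every distinct string in each feature column to an integer id.

    Like LabelEncoder, ids follow the sorted order of the values, so the
    same string always gets the same id in train and test.
    """
    vocab: Dict[str, Dict[str, int]] = {}
    for feature in features:
        values = sorted({row[feature] for row in rows})
        vocab[feature] = {value: idx for idx, value in enumerate(values)}
    return vocab


def encode_rows(
    rows: List[Dict[str, str]],
    vocab: Dict[str, Dict[str, int]],
    features: List[str],
) -> List[List[int]]:
    """Replace each string feature with its id (raises KeyError on unseen values)."""
    return [[vocab[f][row[f]] for f in features] for row in rows]


train_rows = [
    {"country": "Brazil", "league": "Copa do Nordeste", "home_team": "Sport Recife", "away_team": "Imperatriz"},
    {"country": "Brazil", "league": "Copa do Nordeste", "home_team": "ABC", "away_team": "America RN"},
]
test_rows = [
    {"country": "Malaysia", "league": "Super League", "home_team": "Johor DT", "away_team": "Selangor"},
]

# Fit on train + test together, as encode_features does with pd.concat
vocab = build_vocab(train_rows + test_rows, FEATURES)
train_ids = encode_rows(train_rows, vocab, FEATURES)
test_ids = encode_rows(test_rows, vocab, FEATURES)
print(train_ids)  # [[0, 0, 2, 1], [0, 0, 0, 0]]
print(test_ids)   # [[1, 1, 1, 2]]
```

The resulting id lists can then be converted with `torch.tensor(train_ids, dtype=torch.long)`, and each column can be fed to its own `torch.nn.Embedding(len(vocab[feature]), embedding_dim)` layer, where `embedding_dim` is a size you choose.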
Answer:
To encode those four string columns, you can use either a label encoder or a one-hot encoder. Here is a reference class for your case using a label encoder.
I think this may help in your case.
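The answer names one-hot encoding as the alternative to label encoding. A minimal sketch, assuming the input is the integer ids a label encoder produces; `one_hot` is a hypothetical plain-Python helper. In PyTorch the corresponding built-in is `torch.nn.functional.one_hot`, which takes a `LongTensor` of class indices and an optional `num_classes` argument.

```python
from typing import List


def one_hot(ids: List[int], num_classes: int) -> List[List[int]]:
    """Turn integer class ids into one-hot rows of length num_classes."""
    rows: List[List[int]] = []
    for idx in ids:
        if not 0 <= idx < num_classes:
            raise ValueError(f"id {idx} out of range for {num_classes} classes")
        row = [0] * num_classes
        row[idx] = 1  # a single 1 marks the category
        rows.append(row)
    return rows


# Label-encoded home_team ids for three matches, with 3 distinct teams
home_team_ids = [2, 0, 1]
print(one_hot(home_team_ids, num_classes=3))  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

One-hot vectors are as wide as the number of distinct values, so for high-cardinality columns such as `home_team` and `away_team`, label ids fed into an embedding layer are usually the more compact choice.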