Python (sklearn) train_test_split: choosing which data to train and which data to test
I want to use sklearn's train_test_split to manually split data into train and test categories. Specifically, in my .csv file, I want to use all the rows of data until the last row to train, and the last row to test.
I'm doing this because I need to launch a machine learning model but am incredibly short on time. I thought the best way would be to just generate predictions rather than deploy the model with IBM Watson, since I don't need it to be live.
My code so far looks like this:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('Book5.csv', names=['Amiability', 'Email'])
df_x = df['Amiability']
df_y = df['Email']
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.2, random_state=4)
Then,
len(df)
Produces
331
I want to train on the first 330 rows and test on the last one (row 331). How can I do this?
Comments (1)
If you don't absolutely need the test row to be the last row, you should be able to pass test_size=1 to train_test_split.
When test_size= is an integer, it specifies the absolute number of sample rows for the test set.
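To illustrate the integer test_size behaviour described above, here is a minimal sketch rather than a definitive solution. It assumes a small made-up DataFrame standing in for Book5.csv (the values are hypothetical; only the column names and the 331-row length come from the question). It also shows two ways to hold out specifically the last row, which the answer above does not cover: passing shuffle=False to train_test_split, or slicing with iloc. Note that with 331 rows and a default index, the last row's label is 330.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Book5.csv: 331 rows, same two column names.
df = pd.DataFrame({
    "Amiability": range(331),
    "Email": [f"message {i}" for i in range(331)],
})
df_x = df["Amiability"]
df_y = df["Email"]

# Option 1: an integer test_size holds out exactly that many rows,
# here one randomly chosen row.
x_train, x_test, y_train, y_test = train_test_split(
    df_x, df_y, test_size=1, random_state=4
)

# Option 2: shuffle=False keeps the original row order, so the single
# held-out row is the last row of the data.
x_train_last, x_test_last, y_train_last, y_test_last = train_test_split(
    df_x, df_y, test_size=1, shuffle=False
)

# Option 3: plain positional slicing, no scikit-learn involved.
x_train_iloc, x_test_iloc = df_x.iloc[:-1], df_x.iloc[-1:]
y_train_iloc, y_test_iloc = df_y.iloc[:-1], df_y.iloc[-1:]

print(len(x_train), len(x_test))     # 330 1
print(x_test_last.index.tolist())    # [330]
```

Options 2 and 3 give the same split. Slicing with iloc[-1:] instead of iloc[-1] keeps the test set as a one-row Series rather than a scalar.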