Python (sklearn) train_test_split: choosing which data to train on and which to test on

Posted on 2025-02-11 07:33:47

I want to use sklearn's train_test_split to manually split my data into training and test sets. Specifically, in my .csv file, I want to train on every row except the last one and test on the last row.

The reason I'm doing this is because I need to launch a machine learning model but am incredibly short on time. I thought the best way would be to use predictions rather than deploying it using IBM Watson. I don't need it to be live.

My code so far looks like this:

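import pandas as pd
from sklearn.model_selection import train_test_split

# names= supplies the column names, so the first line of the file is read as data
df = pd.read_csv('Book5.csv', names=['Amiability', 'Email'])

df_x = df['Amiability']  # input column
df_y = df['Email']       # target column

# Random 80/20 split; random_state makes the shuffle reproducible
x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=0.2, random_state=4)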

Then,

len(df)

Produces

331

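I want to train on all rows except the last and test on the last row. Since len(df) is 331 and row positions start at 0, that means training on rows 0-329 and testing on row 330. How can I do this?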

Comments (1)

ヅ她的身影、若隐若现 2025-02-18 07:33:47

If you don't absolutely need the test row to be the last row, you should be able to do:

x_train, x_test, y_train, y_test = train_test_split(df_x, df_y, test_size=1, random_state=4)

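When test_size is an integer, it specifies the absolute number of rows in the test set. Note that train_test_split shuffles by default, so the single test row is picked at random rather than being the last row of the file.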
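If the test row does have to be the last one, a possible approach is to turn shuffling off, or to skip train_test_split and slice by position. The sketch below is only illustrative: it builds a made-up 331-row DataFrame in place of Book5.csv (the values are invented; only the column names come from the question) and shows both options side by side.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Book5.csv: 331 rows with the same column names.
# The values are invented purely for illustration.
df = pd.DataFrame({
    "Amiability": range(331),
    "Email": [f"user{i}@example.com" for i in range(331)],
})

df_x = df["Amiability"]
df_y = df["Email"]

# Option 1: shuffle=False keeps the original row order, so test_size=1
# puts the final row in the test set and everything before it in training.
x_train, x_test, y_train, y_test = train_test_split(
    df_x, df_y, test_size=1, shuffle=False
)

# Option 2: positional slicing with iloc, without train_test_split.
x_train2, x_test2 = df_x.iloc[:-1], df_x.iloc[-1:]
y_train2, y_test2 = df_y.iloc[:-1], df_y.iloc[-1:]

print(len(x_train), len(x_test))   # sizes of the two sets
print(x_test.index.tolist())       # position of the held-out row
```

Both options should yield the same split. Slicing with iloc[-1:] rather than iloc[-1] keeps the test set as a one-row Series instead of a scalar, which tends to be easier to pass on to later code.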
