Can validation data be used in model.fit()?

Posted on 2025-02-13 02:45:11

I am trying to build an LSTM model to forecast stock prices. I have split the data into training and test. I'm using the test data inside model.fit as validation_data. After that I pass the test data to model.predict() and generate the forecasts.

I am wondering, if I use the test data in model.fit(), would overfitting occur given that I use the same set of data to generate the forecasts?

Should I split the raw data into 3 sets instead: training, validation and test? The validation data would be used in model.fit() whilst the test data would be used in model.predict().

Sample code:

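from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Four stacked LSTM layers; all but the last return full sequences.
model_lstm = Sequential()
model_lstm.add(LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model_lstm.add(LSTM(units=50, return_sequences=True))
model_lstm.add(LSTM(units=50, return_sequences=True))
model_lstm.add(LSTM(units=50))
# Single output unit for the predicted price.
model_lstm.add(Dense(units=1, activation='relu'))
model_lstm.compile(loss='mse', optimizer='adam')
model_lstm.summary()

# The test set is passed as validation_data here, which is what the question is about.
history_lstm = model_lstm.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=32, shuffle=False)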

Comments (1)

愚人国度 2025-02-20 02:45:11

Usually, you would split the data into 3 sets:

  1. training set: used to train the model.
  2. validation set: used to evaluate the model frequently during development, which lets you fine-tune hyper-parameters. It must not be used for training, because the evaluation has to be as unbiased as possible.
  3. test set: the final set, used only for the final evaluation of the model.
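
The three-way split above can be sketched as follows. This is a minimal illustration and not the asker's code: the helper name `chronological_split` and the 70/15/15 default fractions are assumptions chosen for the example. It keeps the samples in time order, which is usually what you want for price series, so the validation block comes after the training block and the test block comes last.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    samples: Sequence[T],
    val_fraction: float = 0.15,
    test_fraction: float = 0.15,
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered samples into train, validation and test parts.

    The order is preserved (no shuffling), so the validation block comes
    after the training block in time and the test block comes last.
    """
    if val_fraction <= 0 or test_fraction <= 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    n = len(samples)
    n_test = int(n * test_fraction)
    n_val = int(n * val_fraction)
    n_train = n - n_val - n_test
    train = list(samples[:n_train])
    val = list(samples[n_train:n_train + n_val])
    test = list(samples[n_train + n_val:])
    return train, val, test


# Toy data standing in for time-ordered windows of prices (hypothetical).
prices = list(range(100))
train, val, test = chronological_split(prices)
print(len(train), len(val), len(test))
```

With a split like this, the call in the question would pass the validation part as `validation_data=(X_val, y_val)` to `model_lstm.fit(...)`, and `model_lstm.predict(X_test)` would be run only once, after all tuning is finished.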

As the name of the argument (validation_data) indicates, you are supposed to pass the validation set there.
As you suspected, letting the model "validate" its hyper-parameters on the test set could lead to overfitting to that test set.

As for the ratio, the more hyper-parameters your model has, the bigger the validation set should be. Also look into "cross validation": it helps when the training set is too small for you to set aside a large part of it for validation without hurting performance.
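
For time-ordered data such as stock prices, one common form of cross validation is a walk-forward (expanding window) scheme. The answer only says "cross validation", so the choice of this variant is an assumption, and `walk_forward_splits` is a hypothetical helper written for this sketch. scikit-learn's `TimeSeriesSplit` offers a similar scheme if you prefer a library implementation.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) pairs for time-ordered data.

    Each fold trains on everything before its validation block, so the
    model is never validated on data older than what it was trained on.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples.
        val_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, val_end))


folds = list(walk_forward_splits(n_samples=10, n_folds=3))
for train_idx, val_idx in folds:
    print(train_idx, val_idx)
```

Each pair of index lists would select the rows used for one training run and its validation pass. A separate test set should still be held back and left out of every fold.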
