Need help with a reward function in reinforcement learning
I've created an RL agent to trade an artificial custom financial asset (complete code). This is my dataframe (environment), made of 'close' price and 'volume':
import pandas as pd

# Build a synthetic sawtooth asset: 16 repetitions of a 30-step rise
# followed by a 30-step fall in both price and volume.
closes = []
volumes = []
for i in range(0, 16):
    for inc in range(0, 30):
        closes.append(1 + 0.00005 * inc)
        volumes.append(2 + 0.00008 * inc)
    for dec in range(0, 30):
        closes.append(1.00145 - 0.00005 * dec)
        volumes.append(2.00240 - 0.00008 * dec)
raw_df = pd.DataFrame(zip(closes, volumes), columns=['close', 'volume'])
I'm making my data stationary using first-order differencing (df - df.shift(1)). There are three actions: Sell, Buy, and Hold.
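Roughly, the differencing step looks like this (a simplified sketch, not my exact environment code; the variable name stationary_df is just illustrative):

# First-order difference to de-trend the series; drop the NaN row
# produced by the shift so the environment only sees valid rows.
stationary_df = (raw_df - raw_df.shift(1)).dropna().reset_index(drop=True)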
This is the observation returned after each step: 'close', 'volume', trade_length, total_episode_profit, current_profit, current_action (trading, watching).
There is a cost of 1 for opening or closing a trade, and a penalty of 0.5 for watching the market and doing nothing. The holding reward is equal to close[-1] - close[-2], and the sell reward is equal to the total profit or loss of the trading position.
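The reward logic can be sketched roughly like this (a simplified illustration with made-up helper names and action ids, not my actual environment code):

# Simplified sketch of the reward rules described above; names are illustrative.
# Actions: 0 = sell/close, 1 = buy/open, 2 = hold/watch.
OPEN_CLOSE_COST = 1.0   # charged when a trade is opened or closed
WATCH_PENALTY = 0.5     # charged for watching the market and doing nothing

def step_reward(action, in_trade, closes, entry_price, t):
    if not in_trade:
        if action == 1:                       # open a new trade
            return -OPEN_CLOSE_COST
        return -WATCH_PENALTY                 # stay flat, do nothing
    if action == 0:                           # close the trade: realised P&L
        return (closes[t] - entry_price) - OPEN_CLOSE_COST
    return closes[t] - closes[t - 1]          # hold: one-step price change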
And here is my NN structure:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import adam_v2

# Q-network: input is the observation vector, output is one Q-value per action.
model = Sequential()
model.add(Dense(10, activation='tanh', input_shape=(env.df_ep.shape[1] + 4,)))
model.add(Dropout(0.2))
model.add(Dense(8))
model.add(Dropout(0.2))
model.add(Dense(env.ACTION_SPACE_SIZE, activation='linear'))
model.compile(loss='mse', optimizer=adam_v2.Adam(learning_rate=0.001), metrics=['accuracy'])
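A model like this is typically used to pick actions with epsilon-greedy Q-value selection, something like the following sketch (illustrative only, not my exact training loop; choose_action, state, and epsilon are just placeholder names):

import numpy as np

def choose_action(state, epsilon):
    # Explore with probability epsilon, otherwise act greedily on Q-values.
    if np.random.rand() < epsilon:
        return np.random.randint(env.ACTION_SPACE_SIZE)
    q_values = model.predict(state.reshape(1, -1))[0]
    return int(np.argmax(q_values))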
The problem is that after many episodes (about 6000) the RL agent stops learning and just opens a trade at the start and holds it until the end! But this is a really simple asset and a simple environment, not a real financial asset, so I think it should be easy to learn. I suspect the problem is with my reward function.
Here are some plots from a few episodes: