Need help with a reward function in reinforcement learning

Posted on 2025-02-10 09:23:56

I've created an RL agent to trade an artificial custom financial asset (complete code). This is my dataframe (environment), made of a 'Close' price and 'Volume':

import pandas as pd

closes = []
volumes = []
# Build 16 identical triangle waves: 30 rising ticks followed by 30 falling ticks
for i in range(0, 16):
    for inc in range(0, 30):
        closes.append(1 + 0.00005 * inc)
        volumes.append(2 + 0.00008 * inc)
    for dec in range(0, 30):
        closes.append(1.00145 - 0.00005 * dec)
        volumes.append(2.00240 - 0.00008 * dec)

raw_df = pd.DataFrame(zip(closes, volumes), columns=['close', 'volume'])

I'm making my data stationary using differencing (df - df.shift(1)). There are three actions: Sell, Buy and Hold.
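For concreteness, here is a minimal sketch of that differencing step, assuming the raw_df built above (dropping the NaN first row is my own choice, not necessarily what the original code does):

# Difference 'close' and 'volume' to make the series stationary (df - df.shift(1)).
# Dropping the first row (NaN after shifting) is an assumption for this sketch.
stationary_df = (raw_df - raw_df.shift(1)).dropna().reset_index(drop=True)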

And this is the observation returned after each step: 'close', 'volume', trade_length, total_episode_profit, current_profit, current_action (trading or watching).
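As an illustration only, one way the observation described above could be assembled; build_observation and its arguments are hypothetical names, and encoding current_action as a number (e.g. 1 = trading, 0 = watching) is an assumption:

import numpy as np

def build_observation(df, step, trade_length, total_episode_profit,
                      current_profit, current_action):
    # Hypothetical helper: concatenate the differenced 'close'/'volume' row
    # with the four extra state variables listed above (6 features total,
    # matching input_shape = df_ep.shape[1] + 4 in the network below).
    row = df.iloc[step][['close', 'volume']].to_numpy(dtype=np.float32)
    extras = np.array([trade_length, total_episode_profit,
                       current_profit, current_action], dtype=np.float32)
    return np.concatenate([row, extras])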

There is an open and close trade cost equal to 1, there is a 0.5 penalty for watching the market and doing nothing, the holding reward is equal to close[-1] - close[-2], and the sell reward is equal to the total profit or loss of the trading position.
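To make that reward scheme concrete, here is a sketch that follows the description above; the helper name, the position bookkeeping, and the treatment of invalid actions are my own assumptions, not the original environment code:

# Illustrative sketch of the described reward scheme (assumptions noted in comments).
TRADE_COST = 1.0      # charged when a trade is opened or closed
WATCH_PENALTY = 0.5   # charged for watching the market and doing nothing

def step_reward(action, in_trade, closes, entry_price):
    last, prev = closes[-1], closes[-2]
    if action == 'buy' and not in_trade:
        return -TRADE_COST                        # cost of opening the trade
    if action == 'hold' and in_trade:
        return last - prev                        # per-step holding reward
    if action == 'sell' and in_trade:
        return (last - entry_price) - TRADE_COST  # realized profit/loss minus closing cost
    return -WATCH_PENALTY                         # watching / doing nothing (assumed fallback)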

And here is my NN structure:

# Imports assumed for a Keras version that exposes adam_v2; adjust to your install.
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import adam_v2

model = Sequential()
model.add(Dense(10, activation='tanh', input_shape=(env.df_ep.shape[1] + 4,)))
model.add(Dropout(0.2))
model.add(Dense(8))
model.add(Dropout(0.2))
model.add(Dense(env.ACTION_SPACE_SIZE, activation='linear'))
model.compile(loss='mse', optimizer=adam_v2.Adam(learning_rate=0.001), metrics=['accuracy'])

The problem is that after lots of episodes (about 6000) the RL agent stops learning and just opens a trade at the start and holds it until the end! But this is a really simple financial asset and a simple environment, not a real financial asset, so I think it should be easy to learn. I guess the problem is with my reward function.

Here are some plots of episodes:

[episode plot images not included]
