How to train Flux.jl to learn sequences conditional on some initial "seeds"?

I am trying to write an RNN model that, given an initial "seed" sequence, reproduces the continuation of the sequence.
In the code below, dummy sequences are generated as a function of these initial seed points and an RNN approach is attempted, but when I plot the generated sequences they connect very badly with the "true" ones, and at best my model learns a sequence unconditional on the seeds (i.e. the expected value of the unconditional sequence).

Setting the environment...

# Setting the environment...
cd(@__DIR__)    
using Pkg      
Pkg.activate(".")  
# Pkg.add(["Plots","Flux"])
# Pkg.resolve()   
# Pkg.instantiate()
using Random
Random.seed!(123)
using LinearAlgebra, Plots, Flux

Generating simulated data

The idea is to have a sequence that depends on its first 5 values. So the first 5 values are random, but the rest of the sequence depends deterministically on these first 5 values, and the objective is to recreate this second part of the sequence knowing the first 5 points.

nSeeds    = 5
seqLength = 5
nTrains   = 1000  
nVal      = 100
nTot = nTrains+nVal
makeSeeds(nSeeds) = 2 .* (rand(nSeeds) .- 0.5) # [-1,+1]
function makeSequence(seeds,seqLength)
  seq = Vector{Float32}(undef,seqLength+nSeeds) # Flux works with Float32 for performance reasons
  seq[1:nSeeds] .= seeds                        # the seeds are the first nSeeds values
  for i in nSeeds+1:(seqLength+nSeeds)
     seq[i] = seq[i-1] + (seeds[4]*0.5)         # the only seed that matters is the 4th. Let's see if the RNN learns it!
  end
  return seq
end

x0   = [makeSeeds(nSeeds) for i in 1:nTot]
seqs = makeSequence.(x0,seqLength)
seqs_vectors = [[[e] for e in seq] for seq in seqs]
y    = [s[2:end] for s in seqs_vectors] # y is the sequence shifted one step ahead, i.e. the next-step value

xtrain = seqs_vectors[1:nTrains]
xval   = seqs_vectors[nTrains+1:end]
ytrain = y[1:nTrains]
yval   = y[nTrains+1:end]

# Flux wants a vector of sequences of individual items, where these in turn are (feature) vectors
allData   = xtrain;
aSequence = allData[1]
anElement = aSequence[1]
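
As a quick sanity check (an illustrative sketch, not part of my original code; it assumes the variables defined above), we can verify both the generating rule and the nested layout Flux expects:

# Sanity checks (illustrative; assumes x0, seqs and xtrain from above)
s = seqs[1]
# after the seeds, each step adds seeds[4]*0.5 to the previous value
@assert all(isapprox(s[i] - s[i-1], x0[1][4]*0.5; atol=1e-4) for i in nSeeds+1:length(s))
# layout: a sequence is a Vector of 1-element feature vectors
@assert length(xtrain[1]) == nSeeds + seqLength          # 10 time steps per sequence
@assert xtrain[1][1] isa Vector && length(xtrain[1][1]) == 1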

Some utility functions

function predictSequence(m,seeds,seqLength)
    seq = Vector{Vector{Float32}}(undef,seqLength+length(seeds)-1)
    Flux.reset!(m) # Reset the state (not the weights!)
    for i in 1:nSeeds
        seq[i] = [convert(Float32, seeds[i])]   # copy the seeds in as given
    end
    for i in nSeeds+1:nSeeds+seqLength-1
        seq[i] = m(seq[i-1])                    # feed the previous element (first the last seed, then the model's own outputs) back in
    end
    return [s[1] for s in seq]                  # flatten to a Vector{Float32}
end

function myloss(x, y)
    Flux.reset!(m)                 # Reset the state (not the weights!)
    [m(x[i]) for i in 1:nSeeds-1]  # Ignore the outputs but update the hidden state
    # y_i is x_(i+1), i.e. the next element
    sum(Flux.mse(m(xi), yi) for (xi, yi) in zip(x[nSeeds:(end-1)], y[nSeeds:end]))
end
"""
   batchSequences(x,batchSize)

Transform a vector of sequences of individual elements represented as feature vectors to a vector of sequences of elements represented as features ×  batched record matrices
"""
function batchSequences(x,batchSize)
    x = copy(xtrain)
    batchSize = 3
    nRecords  = length(x)
    nItems    = length(x[1])
    nDims     = size(x[1][1],1) 
    nBatches  = Int(floor(nRecords/batchSize))

    emptyBatchedElement = Matrix{Float32}(undef,nDims,batchSize)
    emptySeq = [similar(emptyBatchedElement) for i in 1:nItems]
    outx = [similar(emptySeq) for i in 1:nBatches]
    for b in 1:nBatches
        xmin = (b-1)*batchSize + 1
        xmax = b*batchSize
        for e in 1:nItems
            outx[b][e] = hcat([x[i][e][:,1] for i in xmin:xmax]... )
        end
    end  
    return outx
end
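
As a usage illustration (a sketch, not part of my original code): batching the training sequences with a batch size of 16 should give a 1×16 matrix at every time step, with the records beyond the last full batch dropped:

xb = batchSequences(xtrain, 16)
@assert length(xb) == nTrains ÷ 16        # 62 full batches, the remainder is dropped
@assert size(xb[1][1]) == (1, 16)         # features × batched records at each time step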

Defining the model

m   = Chain(Dense(1,3), LSTM(3,3), Dense(3,5,relu), Dense(5,1))
ps  = params(m)
opt = Flux.ADAM()
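
Before training, a quick forward-pass check (again an illustrative sketch, not part of my original code) confirms that shapes flow through the Chain for a single record as well as for a batched element, and that myloss returns a scalar; xb is the batched data from the sketch above:

Flux.reset!(m)
@assert size(m([0.5f0])) == (1,)          # single record: 1-feature input, 1-feature output
yb = batchSequences(ytrain, 16)           # targets batched the same way as xb
Flux.reset!(m)
@assert size(m(xb[1][1])) == (1, 16)      # batched element: 1 × batchSize output
println("loss on first batch: ", myloss(xb[1], yb[1]))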

Plotting a random sequence and its prediction from the untrained model...

seq1True = makeSequence(x0[1],seqLength)
seq1Est0 = predictSequence(m,x0[1],seqLength)
plot(seq1True)
plot!(seq1Est0)

Actual training

trainMSE  = Float64[]
valMSE    = Float64[]
epochs    = 20 
batchSize = 16
for e in 1:epochs
    print("Epoch $e ")
    # Shuffle the training data at each epoch
    ids = shuffle(1:length(xtrain))
    xtraine  = xtrain[ids]
    ytraine  = ytrain[ids]

    xtraine = batchSequences(xtraine,batchSize)
    ytraine = batchSequences(ytraine,batchSize)
    trainxy = zip(xtraine,ytraine)

    # Actual training
    Flux.train!(myloss, ps, trainxy, opt)
    # Making predictions with the trained model and computing accuracies
    global trainMSE, valMSE
    ŷtrain     = [predictSequence(m,x0[i],seqLength) for i in 1:nTrains]
    ŷval       = [predictSequence(m,x0[i],seqLength) for i in (nTrains+1):nTot]
    ytrainTrue = [makeSequence(x0[i],seqLength) for i in 1:nTrains]        # true sequences for evaluation (kept distinct from the ytrain targets above)
    yvalTrue   = [makeSequence(x0[i],seqLength) for i in (nTrains+1):nTot]

    trainmse = sum(norm(ŷtrain[i][nSeeds+1:end] - ytrainTrue[i][nSeeds+1:end-1])^2 for i in 1:nTrains)/nTrains
    valmse   = sum(norm(ŷval[i][nSeeds+1:end] - yvalTrue[i][nSeeds+1:end-1])^2 for i in 1:nVal)/nVal
    push!(trainMSE,trainmse)
    push!(valMSE,valmse)
    println("Mean Sq Error: $trainmse - $valmse")
end

Plotting some random sequences

for i = rand(1:nTot,5)
    trueseq = makeSequence(x0[i],seqLength)
    estseq  = predictSequence(m,x0[i],seqLength)
    seqPlot = plot(trueseq[1:end-1],label="true", title = "Seq $i")
    plot!(seqPlot, estseq, label="est")
    display(seqPlot)
end

Plotting the error

Strangely, the validation error is always lower than the training error...

plot(trainMSE,label="Train MSE")
plot!(valMSE,label="Validation MSE")

The error changes depending on the parameters, but it always gets stuck in some local minimum, typically the unconditional expected value of the sequence (i.e. a horizontal line):

[Plot of the training and validation MSE across epochs]

And some true/predicted sequences look like this:

[Plots of true vs. predicted sequences: the predicted series is nearly flat and connects poorly with the true one]

(Note that the estimate doesn't change. Sometimes I can make it change, but the outputs are always very far from the intended sequence.)

I have tried other sequence structures where the value doesn't depend on a "fixed" position (e.g. seq[i] = seq[i-1] + 0.2*seq[i-2]), but I always get the same result: the loss may decrease by a factor of 10, but in practice the estimate remains "constant", independent of the initial seeds.
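
One such variant might look like this (an illustrative sketch; makeSequence2 is a hypothetical name, not part of my actual code):

# Hypothetical variant: each new value depends on the two previous values,
# not on a fixed seed position
function makeSequence2(seeds, seqLength)
    seq = Vector{Float32}(undef, seqLength + nSeeds)
    seq[1:nSeeds] .= seeds
    for i in nSeeds+1:(seqLength + nSeeds)
        seq[i] = seq[i-1] + 0.2f0 * seq[i-2]
    end
    return seq
end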
