How can I improve the accuracy of my LSTM model (Keras)?

I am currently trying to create an LSTM network that takes data from many MIDI files (a digital format representing musical notes) and predicts what the next note in a musical sequence will be. I have tokenized the MIDI data into a simpler integer time-series format using the following functions:

tokenizer_map = {}  # module-level dict mapping each note string to its integer token

def tokenize_stream(notes):

    tokenized_array = []
    current_token = 0

    # Join each note's fields into a single string; assign a new token the
    # first time a string is seen, otherwise reuse the existing token.
    for x in notes:
        string_version = ' '.join(x)
        if string_version in tokenizer_map:
            tokenized_array.append(tokenizer_map[string_version])
        else:
            tokenizer_map[string_version] = current_token
            tokenized_array.append(current_token)
            current_token += 1

    return tokenized_array

def data_to_time_series(data, window_size):

    numpy_array = np.array(data)

    X = []
    Y = []

    # Slide a window of `window_size` tokens along the stream; each window
    # becomes one sample and the token that follows it becomes the label.
    for i in range(len(numpy_array) - window_size):
        row = [[a] for a in numpy_array[i: i + window_size]]
        X.append(row)
        label = numpy_array[i + window_size]
        Y.append(label)
    return np.array(X), np.array(Y)

These functions turn the note names into tokens like this:

Raw Data:[1,3,2,2,2,1,4]

And then turn them into time-series format like this:

Data:[1,3,2,2,2,1] -> Label:[4]
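
To make the pipeline concrete, here is a minimal sketch of how these two helpers fit together on a made-up stream of notes (the note names and the window size here are illustrative only):

import numpy as np

# tokenizer_map is assumed to start out empty, as defined above
toy_notes = [['C4'], ['E4'], ['D4'], ['D4'], ['D4'], ['C4'], ['F4']]

tokens = tokenize_stream(toy_notes)
print(tokens)                      # [0, 1, 2, 2, 2, 0, 3]

X, Y = data_to_time_series(tokens, window_size=3)
print(X.shape, Y.shape)            # (4, 3, 1) and (4,) -> four windows of three tokens each
print(X[0].ravel(), '->', Y[0])    # [0 1 2] -> 2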

Below is the actual data for two MIDI files' worth of notes:

Input Data (X):
[
[[1][3][1]...[6][6][6]]

[[3][1][0]...[6][6][1]]

[[1][0][1]...[6][1][6]]

...

[[1][2][2]...[8][1][2]]

[[2][2][1]...[1][2][8]]

[[2][1][0]...[2][8][3]]
]

Labels (Y):
[1 6 1 1 6 6 1 6 1 1 3 1 3 0 3 4 3 4 1 6 6 1 6 1 2 1 3 1 3 0 3 4 3 4 5 1 6
1 3 1 3 0 6 3 3 2 3 3 3 4 5 1 3 0 6 3 3 2 3 1 3 3 0 0 5 3 2 5 5 3 1 5 5 5
9 7 0 5 3 2 5 5 0 3 6 6 6 1 6 1 1 3 1 3 0 3 4 3 4 1 6 1 6 3 2 6 6 2 6 3 2
6 8 3 3 3 3 2 8 3 3 8 3 2 8 3 1 1 0 3 2 6 1 2 2 1 0 8 4 8 3 8 4 1 1 8 1 2
8 3 6 3 1 1 1 1 1 5 1 5 4 3 3 5 1 2 5 6 6 2 5 3 1 0 0 3 1 1 1 1 1 5 1 5 4
3 3 2 5 3 5 0 3 6 3 4 6 1 1 5 1 3 1 1 1 1 1 5 1 4 5 4 3 3 5 1 2 5 6 6 2 5
3 1 0 0 3 1 1 1 1 1 5 1 4 5 4 3 3 2 5 3 5 0 3 6 3 4 6 1 1 5 1 6 3 2 6 6 2
6 3 2 6 8 3 3 3 3 2 8 3 3 8 3 2 8 3 1 1 0 3 2 6 1 2 2 1 0 8 4 8 3 8 4 1 1
8 1 2 8 3 6]

This data is collected and put in the correct format with this function:

import os
import random

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical  # or keras.utils, depending on the Keras version

def get_data(path, look_back, train_size_v, number_of_midi_files):

    files = []
    count = 0
    countMax = number_of_midi_files

    # Collect up to `number_of_midi_files` MIDI file names from the directory
    for i in os.listdir(path):
        if count == countMax:
            break
        if i.endswith(".mid"):
            files.append(i)
            count += 1

    random.shuffle(files)

    # Add the information from each note in the MIDI files to an array
    # (read_midi is a helper defined elsewhere that returns the note list for one file)
    notes_array = np.array([read_midi(path + i) for i in files])

    # converting 2D array into 1D array
    notes = [element for note_ in notes_array for element in note_]

    # Tokenize the list of notes
    tokens = tokenize_stream(notes)

    unique_notes = list(set(tokens))
    print("Unique Notes: " + str(len(unique_notes)))

    # Transform the data into time series format
    X, Y = data_to_time_series(tokens, look_back)

    n_vocab = len(set(tokens))

    # Split into train / validation / test sets and one-hot encode the labels
    X_train, X_remainder, Y_train, Y_remainder = train_test_split(X, Y, train_size=train_size_v)
    X_val, X_test, Y_val, Y_test = train_test_split(X_remainder, Y_remainder, test_size=0.5)

    Y_train = to_categorical(Y_train, n_vocab)
    Y_val = to_categorical(Y_val, n_vocab)
    Y_test = to_categorical(Y_test, n_vocab)

    return n_vocab, X, Y, X_train, Y_train, X_val, Y_val, X_test, Y_test
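
For reference, to_categorical turns each integer label into a one-hot vector of width n_vocab, so the label shape matches the softmax output; a quick sketch (the import path may differ for standalone Keras):

from tensorflow.keras.utils import to_categorical

print(to_categorical([4], 12))
# [[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]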

The data is then used so that the LSTM learns what the next note in the sequence should be from the label attached to each series of integers, as seen above.

This data is then used to train a very simple LSTM model, but I am having no luck with the model's accuracy. Here is the model I am using:

def build_model(model_input, model_labels, n_vocab, learning_rate):
    model = Sequential()
    model.add(LSTM(10, activation='relu', input_shape=(model_input.shape[1], model_input.shape[2])))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy', optimizer=RMSprop(learning_rate=learning_rate), metrics=['accuracy'])

    model.summary()
    return model

I am using a single LSTM layer followed by a Dense layer with a softmax activation to output a probability for each possible note. In this instance, there are only 12 possible notes, so n_vocab is 12.
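
For clarity, this is roughly how a trained model of this shape would be used to predict the next token (a minimal sketch; `model` and `X_test` stand in for a trained model and the test windows from the data above):

import numpy as np

window = X_test[:1]                         # one window of `look_back` tokens, shape (1, look_back, 1)
probs = model.predict(window)               # softmax output, shape (1, n_vocab)
next_token = int(np.argmax(probs, axis=-1)[0])
print(next_token)                           # predicted token for the next note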

I then train the model as follows:

def train_model(model_input, model_labels, val_input, val_labels, epochs_v, look_back, n_vocab, learning_rate):
    filepath = "music_model_2/"

    earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
    mcp_save = ModelCheckpoint(filepath, save_best_only=True, monitor='val_loss', mode='min')
    # Note: `epsilon` was renamed to `min_delta` in newer Keras versions
    reduce_lr_loss = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=7, verbose=1, min_delta=1e-4, mode='min')

    model = build_model(model_input, model_labels, n_vocab, learning_rate)

    history = model.fit(model_input, model_labels, validation_data=(val_input, val_labels),
                        batch_size=128, epochs=epochs_v,
                        callbacks=[earlyStopping, mcp_save, reduce_lr_loss]).history

    return history

Finally, in my main function, I am building, training and evaluating the model as follows:

def main():

    path = 'C_Major_Midi/'
    look_back = 16 # size of lookback for the timeseries data 
    epochs = 40 # Number of epochs the model runs for
    training_data_split = 0.8 # The percentage split of the training and test data
    number_of_midi_files = 50 # The number of midi files used to create the time series data
    learning_rate = 0.001 # The learning rate of the model
    batch_size = 128 # Batch size used for the model

    n_vocab, X, Y, X_train, Y_train, X_val, Y_val, X_test, Y_test = get_data(path, look_back, training_data_split, number_of_midi_files)

    history = train_model(X_train, Y_train, X_val, Y_val, epochs, look_back, n_vocab, learning_rate)


    model = load_model("music_model_2")


    test_loss, test_acc = model.evaluate(X_test, Y_test)

    print('Test Loss: {}'.format(test_loss))

    print('Test Accuracy: {}'.format(test_acc))
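
Curves like the ones referenced below can be produced from the history dict returned by train_model, for example with a minimal matplotlib sketch like this (assuming a TF2-era Keras where the metric key is 'accuracy' rather than 'acc'):

import matplotlib.pyplot as plt

plt.plot(history['accuracy'], label='train accuracy')
plt.plot(history['val_accuracy'], label='val accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()

plt.plot(history['loss'], label='train loss')
plt.plot(history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.legend()
plt.show()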

The output of the model with the hyperparameters shown here is as follows:

Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_13 (LSTM)               (None, 10)                480       
_________________________________________________________________
dense_13 (Dense)             (None, 12)                132       
_________________________________________________________________
activation_13 (Activation)   (None, 12)                0         
=================================================================
Total params: 612
Trainable params: 612
Non-trainable params: 0

The results from running the model and evaluating it on test data are as follows.

[Plot: Model Accuracy]

[Plot: Model Loss]

Here is a tweaked set of parameters and a tweaked model, to show that the changes I'm making do little to improve the accuracy.

model = Sequential()
model.add(LSTM(128, activation = 'relu', input_shape=(model_input.shape[1], model_input.shape[2])))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer= RMSprop(learning_rate=learning_rate), metrics=['accuracy'])


look_back = 5
epochs = 100
training_data_split = 0.8
number_of_midi_files = 40
learning_rate = 0.001
batch_size = 50

[Plot: Model Accuracy 2]

[Plot: Model Loss 2]

Here is a further model I wrote that is more complex and runs for more epochs. It still plateaus at just over 0.3 accuracy.

model = Sequential()

#   First LSTM Layer
model.add(LSTM(128, input_shape=(model_input.shape[1], model_input.shape[2]), return_sequences=True))
model.add(Dropout(0.3))

#   Second LSTM Layer
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.5))

#   First Hidden Layer
model.add(Dense(256))
model.add(Dropout(0.3))

#   Second Hidden Layer
model.add(Dense(256))
model.add(Dropout(0.5))

#   Flatten data shape
model.add(Flatten())

#   Final Output Layer
model.add(Dense(n_vocab))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer=adam_v2.Adam(learning_rate=learning_rate, decay=1e-6), metrics=['accuracy'])


path = 'C_Major_Midi/'
look_back = 10
epochs = 200
training_data_split = 0.8
number_of_midi_files = 1000
learning_rate = 0.001
batch_size = 128

[Plot: Model Accuracy 3]

[Plot: Model Loss 3]

And here is a final model that shows good alignment between the training and test data but still plateaus at 0.3 accuracy.

model = Sequential()
model.add(LSTM(50, activation = 'relu', input_shape=(model_input.shape[1], model_input.shape[2])))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer= RMSprop(learning_rate=learning_rate), metrics=['accuracy'])


look_back = 7
epochs = 50
training_data_split = 0.8
number_of_midi_files = 100
learning_rate = 0.001
batch_size = 128

[Plot: Model Accuracy 4]

[Plot: Model Loss 4]

As shown, I have tried using larger quantities of data, different optimizers, and more/fewer LSTM layers and units within them. Whatever I tweak, I can never achieve an accuracy score much over 0.3. This is my first big machine learning project, so it is very likely I'm making some stupid errors, but I would like someone with experience to let me know why my accuracy is plateauing so quickly.

Thanks so so much xx
