How can I improve the accuracy of my LSTM model (Keras)?
I am currently trying to create an LSTM network that takes data from many MIDI files (a digital format representing musical notes) and predicts the next note in a musical sequence. I have tokenized the MIDI data into a simpler integer time-series format using the following functions:
def tokenize_stream(list):
    # tokenizer_map is a dict defined elsewhere that maps each note string to its integer token
    tokenized_array = []
    current_token = 0
    for x in list:
        string_version = ' '.join(x)
        if string_version in tokenizer_map:
            tokenized_array.append(tokenizer_map.get(string_version))
        else:
            # Unseen note: assign it the next free token id
            tokenizer_map.update({string_version: current_token})
            tokenized_array.append(current_token)
            current_token += 1
    return tokenized_array
def data_to_time_series(data, window_size):
    numpy_array = np.array(data)
    X = []
    Y = []
    for i in range(len(numpy_array) - window_size):
        row = [[a] for a in numpy_array[i: i + window_size]]
        X.append(row)
        label = numpy_array[i + window_size]
        Y.append(label)
    return np.array(X), np.array(Y)
These functions turn the note names into tokens like this:
Raw Data:[1,3,2,2,2,1,4]
And then turn them into time-series format like this:
Data:[1,3,2,2,2,1] -> Label:[4]
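As a quick sanity check of the windowing, the raw example above can be pushed through data_to_time_series directly (a minimal sketch, assuming the function defined above and a window_size of 6):

import numpy as np
raw = [1, 3, 2, 2, 2, 1, 4]
X, Y = data_to_time_series(raw, window_size=6)
print(X.shape)  # (1, 6, 1) - one window of six tokens, each wrapped as a length-1 feature vector
print(Y)        # [4] - the token immediately after the window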
Below is the actual data for two MIDI files' worth of notes:
Input Data (X):
[
[[1][3][1]...[6][6][6]]
[[3][1][0]...[6][6][1]]
[[1][0][1]...[6][1][6]]
...
[[1][2][2]...[8][1][2]]
[[2][2][1]...[1][2][8]]
[[2][1][0]...[2][8][3]]
]
Labels (Y):
[1 6 1 1 6 6 1 6 1 1 3 1 3 0 3 4 3 4 1 6 6 1 6 1 2 1 3 1 3 0 3 4 3 4 5 1 6
1 3 1 3 0 6 3 3 2 3 3 3 4 5 1 3 0 6 3 3 2 3 1 3 3 0 0 5 3 2 5 5 3 1 5 5 5
9 7 0 5 3 2 5 5 0 3 6 6 6 1 6 1 1 3 1 3 0 3 4 3 4 1 6 1 6 3 2 6 6 2 6 3 2
6 8 3 3 3 3 2 8 3 3 8 3 2 8 3 1 1 0 3 2 6 1 2 2 1 0 8 4 8 3 8 4 1 1 8 1 2
8 3 6 3 1 1 1 1 1 5 1 5 4 3 3 5 1 2 5 6 6 2 5 3 1 0 0 3 1 1 1 1 1 5 1 5 4
3 3 2 5 3 5 0 3 6 3 4 6 1 1 5 1 3 1 1 1 1 1 5 1 4 5 4 3 3 5 1 2 5 6 6 2 5
3 1 0 0 3 1 1 1 1 1 5 1 4 5 4 3 3 2 5 3 5 0 3 6 3 4 6 1 1 5 1 6 3 2 6 6 2
6 3 2 6 8 3 3 3 3 2 8 3 3 8 3 2 8 3 1 1 0 3 2 6 1 2 2 1 0 8 4 8 3 8 4 1 1
8 1 2 8 3 6]
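To get a feel for how unevenly these labels are distributed, they can be counted directly (a sketch; Y here is the label array printed above):

import numpy as np
values, counts = np.unique(Y, return_counts=True)
for token, count in zip(values, counts):
    print(token, count)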
This data is collected and put in the correct format with this function:
def get_data(path, look_back, train_size_v, number_of_midi_files):
    files = []
    count = 0
    countMax = number_of_midi_files
    for i in os.listdir(path):
        if count == countMax:
            break
        if i.endswith(".mid"):
            files.append(i)
            count += 1
    random.shuffle(files)
    # Add the information from each note in the MIDI files to an array
    notes_array = np.array([read_midi(path + i) for i in files])
    # converting 2D array into 1D array
    notes = [element for note_ in notes_array for element in note_]
    # Tokenize the list of notes
    tokens = tokenize_stream(notes)
    unique_notes = list(set(tokens))
    print("Unique Notes: " + str(len(unique_notes)))
    # Transform the data into time series format
    X, Y = data_to_time_series(tokens, look_back)
    n_vocab = len(set(tokens))
    X_train, X_remainder, Y_train, Y_remainder = train_test_split(X, Y, train_size=train_size_v)
    X_val, X_test, Y_val, Y_test = train_test_split(X_remainder, Y_remainder, test_size=0.5)
    Y_train = to_categorical(Y_train, n_vocab)
    Y_val = to_categorical(Y_val, n_vocab)
    Y_test = to_categorical(Y_test, n_vocab)
    return n_vocab, X, Y, X_train, Y_train, X_val, Y_val, X_test, Y_test
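For reference, the shapes coming out of get_data look like this (a sketch, using the look_back of 16 from my main function and the 12-note vocabulary mentioned below):

n_vocab, X, Y, X_train, Y_train, X_val, Y_val, X_test, Y_test = get_data('C_Major_Midi/', 16, 0.8, 50)
print(X_train.shape)  # (num_train_windows, 16, 1) - raw integer tokens, one per timestep
print(Y_train.shape)  # (num_train_windows, 12)    - one-hot labels after to_categorical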
The LSTM then learns what the next note in the sequence should be by seeing which label follows each series of integers, as shown above.
This data is then used to train a very simple LSTM model, but I am having no luck with the model's accuracy. Here is the model I am using:
def build_model(model_input, model_labels, n_vocab, learning_rate):
    model = Sequential()
    model.add(LSTM(10, activation = 'relu', input_shape=(model_input.shape[1], model_input.shape[2])))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer= RMSprop(learning_rate=learning_rate), metrics=['accuracy'])
    model.summary()
    return model
I am using a single LSTM layer and then a softmax activation layer to output the probabilities of each possible note. In this instance, there are only 12 possible notes so n_vocab is 12.
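For context, the softmax output can be turned into a predicted note token roughly like this (a sketch rather than my exact code; it assumes a trained model plus the tokens list and look_back from above):

import numpy as np
window = np.array(tokens[-look_back:]).reshape(1, look_back, 1)  # last look_back tokens as one sample
probs = model.predict(window)[0]    # vector of n_vocab (= 12) probabilities
next_token = int(np.argmax(probs))  # most likely next note token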
I then train the model as follows:
def train_model(model_input, model_labels, val_input, val_labels, epochs_v, look_back, n_vocab, learning_rate):
    filepath = "music_model_2/"
    earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
    mcp_save = ModelCheckpoint(filepath, save_best_only=True, monitor='val_loss', mode='min')
    reduce_lr_loss = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=7, verbose=1, epsilon=1e-4, mode='min')
    model = build_model(model_input, model_labels, n_vocab, learning_rate)
    history = model.fit(model_input, model_labels, validation_data = (val_input, val_labels), batch_size=128, epochs=epochs_v, callbacks=[earlyStopping, mcp_save, reduce_lr_loss]).history
    return history
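To see where the plateau sets in, the returned history dict can be plotted (a sketch; the 'accuracy'/'val_accuracy' key names assume a TF2-era Keras with metrics=['accuracy']):

import matplotlib.pyplot as plt
plt.plot(history['accuracy'], label='train accuracy')
plt.plot(history['val_accuracy'], label='val accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()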
Finally, in my main function, I am building, training and evaluating the model as follows:
def main():
    path = 'C_Major_Midi/'
    look_back = 16  # size of lookback for the time series data
    epochs = 40  # Number of epochs the model runs for
    training_data_split = 0.8  # The percentage split of the training and test data
    number_of_midi_files = 50  # The number of MIDI files used to create the time series data
    learning_rate = 0.001  # The learning rate of the model
    batch_size = 128  # Batch size used for the model
    n_vocab, X, Y, X_train, Y_train, X_val, Y_val, X_test, Y_test = get_data(path, look_back, training_data_split, number_of_midi_files)
    history = train_model(X_train, Y_train, X_val, Y_val, epochs, look_back, n_vocab, learning_rate)
    model = load_model("music_model_2")
    test_loss, test_acc = model.evaluate(X_test, Y_test)
    print('Test Loss: {}'.format(test_loss))
    print('Test Accuracy: {}'.format(test_acc))
The summary of the model with the hyperparameters shown here is as follows:
Model: "sequential_13"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_13 (LSTM) (None, 10) 480
_________________________________________________________________
dense_13 (Dense) (None, 12) 132
_________________________________________________________________
activation_13 (Activation) (None, 12) 0
=================================================================
Total params: 612
Trainable params: 612
Non-trainable params: 0
The results from running the model and evaluating it on the test data plateau at around 0.3 accuracy.
Here is a tweaked set of parameters and model to show that the tweaks I'm making do little to improve the accuracy:
model = Sequential()
model.add(LSTM(128, activation = 'relu', input_shape=(model_input.shape[1], model_input.shape[2])))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer= RMSprop(learning_rate=learning_rate), metrics=['accuracy'])
look_back = 5
epochs = 100
training_data_split = 0.8
number_of_midi_files = 40
learning_rate = 0.001
batch_size = 50
Here is a further model I wrote that is more complex and was run for more epochs. It still plateaus at just over 0.3 accuracy:
model = Sequential()
# First LSTM Layer
model.add(LSTM(128, input_shape=(model_input.shape[1], model_input.shape[2]), return_sequences=True))
model.add(Dropout(0.3))
# Second LSTM Layer
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.5))
# First Hidden Layer
model.add(Dense(256))
model.add(Dropout(0.3))
# Second Hidden Layer
model.add(Dense(256))
model.add(Dropout(0.5))
# Flatten data shape
model.add(Flatten())
# Final Output Layer
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam_v2.Adam(learning_rate=learning_rate, decay=1e-6), metrics=['accuracy'])
path = 'C_Major_Midi/'
look_back = 10
epochs = 200
training_data_split = 0.8
number_of_midi_files = 1000
learning_rate = 0.001
batch_size = 128
And here is a final model that shows good agreement between training and test performance but still plateaus at 0.3:
model = Sequential()
model.add(LSTM(50, activation = 'relu', input_shape=(model_input.shape[1], model_input.shape[2])))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer= RMSprop(learning_rate=learning_rate), metrics=['accuracy'])
look_back = 7
epochs = 50
training_data_split = 0.8
number_of_midi_files = 100
learning_rate = 0.001
batch_size = 128
As shown, I have tried using larger quantities of data, different optimizers, and more/fewer LSTM layers and units within them. Whatever I tweak, I can never achieve an accuracy score of much over 0.3. This is my first big machine learning project, so it is very likely I'm making some stupid errors, but I would like someone with experience to let me know why my accuracy is plateauing so quickly.
Thanks so so much xx