
So I'm working on a machine learning project whose goal is to produce music. For the data, I am using librosa to load wav files. The data produced by librosa is the sample rate of the song along with a 1-dimensional array, where each number in the array is the amplitude of the wave at that point in time.
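For reference, `librosa.load` returns the waveform array first and the sample rate second. A minimal sketch of that data shape, using a synthetic sine wave as a stand-in for an actual wav file (the filename in the comment is hypothetical):

```python
import numpy as np

# Stand-in for: y, sr = librosa.load("song.wav", sr=22050)
# librosa.load gives a 1-D float array of amplitudes plus the sample rate.
sr = 22050                                 # samples per second
t = np.arange(sr) / sr                     # one second of time stamps
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)    # 440 Hz tone, amplitudes in [-0.5, 0.5]

print(y.ndim, y.shape, sr)  # 1 (22050,) 22050
```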

So I decided to use an autoencoder for this problem, and the accuracy is absolutely terrible. I am feeding it a 1-dimensional array of 300 samples and attempting (for now) to recreate those same 300 samples, just to get the network configured right. But I cannot figure out how to do this, as I have never gotten an accuracy above 20%.
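To make the setup concrete, here is roughly how the 300-sample inputs are built (a sketch, with `y` standing in for the librosa waveform): slice the 1-D array into non-overlapping 300-sample windows and peak-normalize each window.

```python
import numpy as np

def make_windows(y, window=300):
    """Slice a 1-D waveform into non-overlapping fixed-size windows."""
    n = (len(y) // window) * window          # drop the ragged tail
    windows = y[:n].reshape(-1, window)      # shape: (num_windows, 300)
    # Peak-normalize each window into [-1, 1] (guard against silent windows)
    peaks = np.abs(windows).max(axis=1, keepdims=True)
    return windows / np.maximum(peaks, 1e-8)

# One second of fake audio in place of a real librosa waveform
y = np.random.uniform(-1.0, 1.0, size=22050).astype(np.float32)
X = make_windows(y)
print(X.shape)  # (73, 300)
```

The autoencoder then trains with `X` as both input and target.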

Here's my current model:

from tensorflow import keras

# Encoder: 300 raw samples -> 128-dim latent vector
encoder_input = keras.Input(shape=(300,))
x = keras.layers.Reshape((1, 300))(encoder_input)
x = keras.layers.LSTM(256, activation='tanh')(x)
encoder_output = keras.layers.Dense(128, activation='relu')(x)

encoder = keras.Model(encoder_input, encoder_output)

# Decoder: latent vector -> 300 reconstructed samples
decoder_input = keras.layers.Reshape((1, 128))(encoder_output)
x = keras.layers.LSTM(128, return_sequences=True, activation='tanh')(decoder_input)
x = keras.layers.LSTM(256, activation='tanh')(x)
decoder_output = keras.layers.Dense(300, activation='relu')(x)

autoencoder = keras.Model(encoder_input, decoder_output)

autoencoder.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

Even with normalized data it still performs poorly. I'm wondering what the issue is here, and whether I'm even using an entirely wrong neural network structure for this problem. Any ideas on how I can improve this?

Lburris12