Automatically compose music
Almost immediately after I learned programming, I wanted to create software capable of composing music.
For several years I made primitive attempts at automatic music composition for
Visions of Chaos, mostly using simple mathematical formulas or genetic mutations of random note sequences. Having recently had modest success in studying and applying TensorFlow and neural networks to
searching for cellular automata, I decided to try using neural networks to create music.
How it works
The composer trains a neural network with
long short-term memory (LSTM). LSTM networks are well suited to predicting what comes next in a sequence of data. Read more about LSTMs
here.
The LSTM network is fed sequences of notes (in this case taken from single-channel midi files). After enough training, it becomes able to create music similar to the training material.
The internals of LSTMs may seem intimidating, but using
TensorFlow and/or
Keras greatly simplifies building and experimenting with LSTMs.
Source music for model training
For simple LSTM networks like these, the source compositions need to be a single midi channel. Solo piano midi files work well for this. I found piano solo midi files on the
Classical Piano Midi Page and
mfiles, and used them to train my models.
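The notes first have to be extracted from the midi files into plain sequences. One common way to do this (and the way the example my code is based on does it) is with the music21 library. Here is a minimal sketch of that step; it is an illustration rather than the exact code from my scripts, and the folder path, function name and the chord encoding (dot-joined pitch classes) are assumptions:

# Sketch: extract a note sequence from single-channel midi files with music21.
import glob
from music21 import converter, note, chord

def load_notes(midi_folder):
    notes = []
    for path in glob.glob(midi_folder + "/*.mid"):
        midi = converter.parse(path)
        for element in midi.flat.notes:
            if isinstance(element, note.Note):
                notes.append(str(element.pitch))          # e.g. "C4"
            elif isinstance(element, chord.Chord):
                # encode a chord as its pitch classes joined with dots, e.g. "4.7.11"
                notes.append(".".join(str(n) for n in element.normalOrder))
    return notes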
I put the music of different composers into separate folders. That way the user can select Bach, click the Compose button, and get a composition that (hopefully) sounds something like Bach.
LSTM Model
As the basis for my code I chose
this example by
Sigurður Skúli Sigurgeirsson, which he describes in more detail
here.
I ran the lstm.py script and after 15 hours it finished training. When I then ran predict.py to generate midi files, I was disappointed: they consisted of a single repeating note. I repeated the training twice more and got the same results.
The original model
model = Sequential()
model.add(CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
model.add(Dropout(0.3))
model.add(CuDNNLSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(CuDNNLSTM(512))
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=["accuracy"])
After adding graph output to the script, I could see why my model did not work: accuracy was not increasing over time as it should. See further below in this post for good graphs that show what a working model looks like.
I had no idea why this happened, so I abandoned this model and started tweaking the settings.
model = Sequential()
model.add(CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(CuDNNLSTM(256))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
This model is more compact and has fewer LSTM layers. I also added BatchNormalization after seeing it in a
sentdex video. There are most likely better models out there, but this one worked well enough in all my training sessions.
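Both models expect network_input shaped (samples, sequence length, 1) and n_vocab, the number of distinct notes. Roughly speaking, the note sequence is cut into fixed-length windows of integers, with the following note as a one-hot target. Here is a sketch of that preparation step; the window length and variable names are illustrative, not taken from my actual scripts:

# Sketch: turn a note sequence into training data for the models above.
import numpy as np
from tensorflow.keras.utils import to_categorical

def prepare_sequences(notes, sequence_length=100):
    pitchnames = sorted(set(notes))
    note_to_int = {n: i for i, n in enumerate(pitchnames)}
    n_vocab = len(pitchnames)

    inputs, targets = [], []
    for i in range(len(notes) - sequence_length):
        window = notes[i:i + sequence_length]
        inputs.append([note_to_int[n] for n in window])
        targets.append(note_to_int[notes[i + sequence_length]])

    # shape (samples, timesteps, features) and normalise to 0..1
    network_input = np.reshape(inputs, (len(inputs), sequence_length, 1)) / float(n_vocab)
    network_output = to_categorical(targets, num_classes=n_vocab)
    return network_input, network_output, n_vocab, pitchnames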
Notice that in both models I replaced LSTM with CuDNNLSTM. This gives much faster LSTM training by taking advantage of CUDA. If you do not have a
CUDA-capable GPU, you will have to use LSTM. Thanks to
sentdex for this tip. Training new models and composing midi files is roughly five times faster with CuDNNLSTM.
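If you want a single script that runs on both kinds of machines, one simple approach (my own suggestion, not something from the original scripts) is to pick the layer class at import time:

# Use the CUDA-accelerated layer when available, otherwise fall back to plain LSTM.
# Written against Keras 2.x / TF 1.x, where CuDNNLSTM is a separate layer;
# in TF 2 the standard LSTM layer uses the cuDNN kernel automatically when it can.
try:
    from tensorflow.keras.layers import CuDNNLSTM as LSTMLayer
except ImportError:
    from tensorflow.keras.layers import LSTM as LSTMLayer

# model.add(LSTMLayer(512, input_shape=(...), return_sequences=True))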
How long should the model be trained?
How similar the results are to the source music depends on how long the model is trained (the number of epochs). Too few epochs and the output will have too many repeating notes. Too many epochs and the model will overfit and simply copy the source music.
But how do you know at how many epochs to stop?
A simple solution is to add a callback that saves the model and an accuracy/loss graph every 50 epochs during a 500-epoch training run. That way, once training is finished you have models and graphs in 50-epoch increments, showing how training progressed.
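Here is a minimal sketch of such a callback, assuming tf.keras and matplotlib. It is not the exact callback from my scripts, and the metric key may be 'acc' rather than 'accuracy' depending on the Keras version:

# Sketch: save the model and a loss/accuracy graph every N epochs.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from tensorflow.keras.callbacks import Callback

class SaveEveryN(Callback):
    def __init__(self, every=50, prefix="model"):
        super().__init__()
        self.every = every
        self.prefix = prefix
        self.losses, self.accs = [], []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get("loss"))
        self.accs.append(logs.get("accuracy", logs.get("acc")))
        if (epoch + 1) % self.every == 0:
            # save the model weights at this point in training
            self.model.save("{}_{:04d}.h5".format(self.prefix, epoch + 1))
            # save a loss/accuracy graph for the same point
            plt.figure()
            plt.plot(self.losses, label="loss")
            plt.plot(self.accs, label="accuracy")
            plt.xlabel("epoch")
            plt.legend()
            plt.savefig("{}_{:04d}.png".format(self.prefix, epoch + 1))
            plt.close()

# model.fit(network_input, network_output, epochs=500, batch_size=64,
#           callbacks=[SaveEveryN(every=50)])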
Here are the graphs from one run, saved every 50 epochs and combined into a single animated GIF.
These are the graphs we want to see. Loss should fall and stay low. Accuracy should rise and stay close to 100%.
You want to use the model from the epoch at which the graphs first reach their limits. For the graphs shown above that is 150 epochs. Models from later epochs will be overfitted and will most likely just copy the source material.
The model corresponding to these graphs was trained on midi files from the Anthems category, taken
from here.
Midi output from the 150-epoch model.
Midi output from the 100-epoch model.
Even the 100-epoch model can copy the source too closely. This is probably due to the relatively small number of midi files used for training. Training works better with more notes.
When training goes bad
The image above shows an example of something that can and does happen during training. Loss decreases and accuracy increases as usual, and then suddenly both go haywire. At that point you might as well stop; the model will not (at least in my experience) start learning correctly again. In this case the saved 100-epoch model is still too random, and by 150 epochs the point of failure has already passed. I now save every 25 epochs so I can find the sweet spot where the model is best trained, before it overfits and collapses.
Another example of training going wrong. This model was trained on midi files taken
from here. In this case it held up for a bit longer than 200 epochs. Using the 200-epoch model gives the following midi output.
Without plotting the graphs we would never know whether the model had problems, or when they arose, and we could not get a good model without starting over from scratch.
Other examples
A 75-epoch model based on
Chopin compositions.
A 50-epoch model based on
Christmas midi files.
A 100-epoch model based on the same
Christmas midi files. But do they really sound "Christmassy"?
A 300-epoch model based on Bach midi files taken
from here and
from here.
A 200-epoch model based on a single Balakirev midi file taken
from here.
A 200-epoch model based on
Debussy compositions.
A 175-epoch model based on Mozart compositions.
A 100-epoch model based on
Schubert compositions.
A 200-epoch model based on
Schumann compositions.
A 200-epoch model based on
Tchaikovsky compositions.
A 175-epoch model based on folk songs.
A 100-epoch model based on lullabies.
A 100-epoch model based on wedding music.
A 200-epoch model based on my own midi files, taken from my
YouTube video soundtracks. It may be a bit overfitted, since it mostly generates copies of my short one- and two-bar midi files.
Scores
Once you have your midi files, you can use online tools like
SolMiRe to convert them into sheet music. Below is the score of the 200-epoch Softology midi file presented above.
Where can I test the composer
The LSTM Composer is now included in
Visions of Chaos.
Select a style from the drop-down list and click Compose. As long as you have the minimum required Python and TensorFlow installed (see instructions
here), then within a few seconds (if you have a fast GPU) you will get a new machine-composed midi file that you can listen to and use for any purpose. No copyright, no royalties. If you do not like the result, click Compose again and a new composition will be ready a few seconds later.
The results cannot yet be considered full-fledged compositions, but they contain interesting little sequences of notes that I will use when creating music in the future. In this respect, the LSTM composer can be a good source of inspiration for new compositions.
Python source
Below are the Python scripts I used for LSTM training and prediction. You do not need Visions of Chaos installed for these scripts to work, and training and midi generation will run from the command line.
Here is the training script, lstm_music_train.py.
And here is the midi generation script, lstm_music_predict.py.
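As a rough idea of what the prediction side does, here is a minimal generation sketch. It is not the actual lstm_music_predict.py; it assumes the integer-encoded windows and note-to-integer mapping from the data-preparation sketch earlier (before normalisation), with int_to_note as the reverse mapping:

# Sketch: generate notes from a trained model and write them to a midi file.
import numpy as np
from music21 import note, chord, stream

def generate_notes(model, seed_sequences, int_to_note, n_vocab, length=200):
    # seed_sequences: integer-encoded windows from data preparation (not normalised)
    pattern = list(seed_sequences[np.random.randint(0, len(seed_sequences))])
    output = []
    for _ in range(length):
        x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
        prediction = model.predict(x, verbose=0)
        index = int(np.argmax(prediction))
        output.append(int_to_note[index])
        pattern.append(index)
        pattern = pattern[1:]  # slide the window forward by one note
    return output

def notes_to_midi(prediction_output, filename="output.mid"):
    midi_notes = []
    offset = 0
    for item in prediction_output:
        if "." in item or item.isdigit():   # chord encoded as dot-separated pitch classes
            c = chord.Chord([int(p) for p in item.split(".")])
            c.offset = offset
            midi_notes.append(c)
        else:                               # single note such as "C4"
            n = note.Note(item)
            n.offset = offset
            midi_notes.append(n)
        offset += 0.5
    stream.Stream(midi_notes).write("midi", fp=filename)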
Model file sizes
The downside of including neural networks in Visions of Chaos is file size. If model training were faster, I would just add a button so that the end user could train the models themselves. But since training some of these models can take days, that is not really practical. I decided it was better to do all the training and testing myself and include only the best working models. It also means the end user only has to press a button, and the trained models create the musical compositions.
Each model is 22 megabytes in size. By modern Internet standards that is not much, but Visions of Chaos had been growing only gradually over its years of development, and then recently jumped from 70 to 91 MB (because of the cellular automaton search model). For now I have therefore included only one model in the main Visions of Chaos installer. For users who want more, I have posted a link to another 1 GB of models. They can also use the scripts above to train their own models from their own midi files.
What's next?
At this stage, the LSTM composer is the simplest example of using neural networks to compose music.
I have already found other neural-network music composers that I will experiment with in the future, so you can expect new automatic music composition features to appear in Visions of Chaos.