
An LSTM Network for Generating Music Similar to Keyboard Works of J.S. Bach

This is a 2-week project I undertook as the final project of my bootcamp. Although the pipeline works and generates music, it needs quite a bit of rework and restructuring, along with a better dataset, to generate music as intended (see Future Updates).

To read a detailed description, please see the Detailed Presentation section.

To try a test run at its current stage, please see How to Run.

Tech Stack

  • Python 3.8 (see requirements.txt for libraries)
  • TensorFlow
  • Keras

Description

This project is mostly inspired by DeepBach by Sony CSL and BachBot by Feynman Liang. Both of these projects used Bach's 4-voice chorales to train their networks, which in turn generate 4-voice chorales of their own.

We can use the same logic to train a network that learns Bach's keyboard music (piano, harpsichord, and organ).

A great overview of several techniques used in music generation is given in the article Deep Learning Techniques for Music Generation -- A Survey by Jean-Pierre Briot, Gaëtan Hadjeres, and François-David Pachet. They also address the techniques used by DeepBach and BachBot.

Data, Transformation, and Network

Data

Keyboard to Chorale

We use music data stored in MusicXML files. The files are obtained from Kunst der Fuge and Tobi's Notenarchiv.

The music stored here is keyboard or organ music, and normally has 2-3 parts containing polyphonic sequences. These polyphonic sequences in 2-3 parts must be converted to 4 monophonic voices if we are to follow the recipe set out by DeepBach and BachBot.

We perform data augmentation by transposing all the music available to us to different keys. At its current stage, we ended up with 33 suitable scores and their transpositions to all 12 keys.
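A minimal sketch of this augmentation step, assuming music21 for parsing (the function name and `musicxml_paths` below are illustrative, not the project's actual API):

```python
from music21 import converter

def transpose_to_all_keys(path):
    """Parse one MusicXML score and yield it transposed to all 12 keys."""
    score = converter.parse(path)
    for semitones in range(12):  # 0 = original key
        yield score.transpose(semitones)

# e.g. augmented = [s for p in musicxml_paths for s in transpose_to_all_keys(p)]
```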

We chose music that is separable into a maximum of 5 voices with a time signature of 4/4.

Transformation

(Image from: Deep Learning Techniques for Music Generation -- A Survey)

Encoding

We choose to encode the data as done by DeepBach.

We use Music21 to read the MusicXML data into a stream object, split the music into 4 voices (with a resolution down to 1/32nd notes), and encode the data. Each piece of music data has 7 components: 4 monophonic voices, 1 musical key, and 2 for start and stop sequences.

The encoded data is converted to numerical data, then categorized, and finally one-hot encoded to feed into the network.

This portion of the work is handled by the MusicHandler() and NeuralNetIOHandler() classes in data_utils.py.
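As a rough illustration of that encode path (the helper below is a hypothetical sketch, not the actual MusicHandler/NeuralNetIOHandler API):

```python
import numpy as np
from music21 import converter

def tokens_to_onehot(tokens):
    """Map string tokens (note names, 'START', 'STOP', ...) to one-hot rows."""
    vocab = sorted(set(tokens))                      # categorize the tokens...
    index = {tok: i for i, tok in enumerate(vocab)}  # ...into integer ids
    onehot = np.zeros((len(tokens), len(vocab)), dtype=np.float32)
    for row, tok in enumerate(tokens):
        onehot[row, index[tok]] = 1.0                # one-hot encode
    return onehot, vocab

stream = converter.parse("data/example.xml")  # MusicXML -> Music21 stream
# ...split `stream` into 4 monophonic voices, add the key and start/stop
# tokens, then one-hot encode each of the 7 component sequences as above.
```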

Network

The neural network consists of an input layer, 3 LSTM layers of size 256 (512 in the diagram), and a Dense layer corresponding to the output layer.

(Diagram: the neural network architecture.)
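A minimal Keras sketch of this architecture, assuming a single softmax output for simplicity (the actual model separates its outputs per component; SEQ_LEN and N_FEATURES are placeholder sizes):

```python
from tensorflow.keras import layers, models

SEQ_LEN, N_FEATURES = 32, 128  # placeholder sizes

inputs = layers.Input(shape=(SEQ_LEN, N_FEATURES))
x = layers.LSTM(256, return_sequences=True)(inputs)
x = layers.LSTM(256, return_sequences=True)(x)
x = layers.LSTM(256)(x)  # final LSTM layer returns only the last state
outputs = layers.Dense(N_FEATURES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```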

How to Run

The data folder contains 3 example .xml files; only this data will be processed. Processed data will be stored as .pickle files in the data folder, and the network will run on this data.

Install requirements: pip install -r requirements.txt (note: please install the GPU version of TensorFlow)

Process the data: python generate_nn_data.py

Run the network for training: python run_onehot_model.py
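In sequence, the whole test run looks like this:

```bash
pip install -r requirements.txt   # use the GPU build of TensorFlow
python generate_nn_data.py        # process data/*.xml into .pickle files
python run_onehot_model.py        # train the network on the pickled data
```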

Future Updates

Encoding and Data

  • Properly 4-voice-encoded music acquired from www.kunstderfuge.com will be used for data preparation. Manually handling and splitting music into voices causes a lot of issues, a major one being an over-inflated feature space.

  • The interpretation and implementation of rests will be revised. An overabundance of rests causes the network to learn to place rests everywhere. Note encoding could instead be done in the style of BachBot (see the sketch after this list); this would cause a 2x increase in the note feature space but reduce the emphasis on rests.

  • A resolution down to 1/32nd notes also causes an abundance of rests.
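As a rough sketch of what a BachBot-style encoding could look like (the "new"/"held" token names are illustrative; BachBot's actual tokenization differs in detail), each pitch appears in two variants, one for the articulated frame and one for continuation frames, which is where the roughly 2x increase in note feature space comes from:

```python
def encode_frames(notes):
    """notes: list of (midi_pitch, duration_in_frames) for one monophonic voice."""
    frames = []
    for pitch, dur in notes:
        frames.append((pitch, "new"))                            # articulation frame
        frames.extend((pitch, "held") for _ in range(dur - 1))   # sustain frames
    return frames

# C4 held for 3 frames, then E4 for 1 frame:
print(encode_frames([(60, 3), (64, 1)]))
# [(60, 'new'), (60, 'held'), (60, 'held'), (64, 'new')]
```

Held notes are explicit continuations of a pitch rather than gaps padded with rests, so rests only appear where the music is actually silent.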

Neural Network Implementation

  • The one-hot encoding scheme could be replaced by a numerical encoding scheme (see the sketch after this list).

  • The issue of the output space is a complicated matter. I am not sure the separated output space works as intended. The separated outputs could instead be reduced to a single multi-one-hot-encoded vector, or the voice outputs could form one multi-one-hot-encoded vector and the metadata parameters another.

  • Optimal parameters should be searched for, particularly the number of LSTM layers and the LSTM layer sizes.
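A minimal sketch of the numerical-encoding alternative from the first item above, feeding integer token ids through a learned Embedding layer instead of one-hot vectors (all sizes are placeholders):

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 128, 32, 64  # illustrative sizes

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")  # integer token ids
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)     # learned dense codes
x = layers.LSTM(256)(x)
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```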

Code Cleanup

  • Although the forward flow of the data is sufficiently organized into class structures, reverse-encoding the output from the model is another matter. Most of the functions in other_utils.py should be moved into proper classes.

  • The music-generating script that uses the trained model resides in a Jupyter notebook; it should be turned into a proper script.

  • Function names should better describe and align with the type of data they take as input and the data they output.

  • Complete type annotations would be very useful.

Detailed Presentation

(Slides 1-10 of the project presentation.)
