Commit `2c7611b`: Update README.md (HongshuoFan, Feb 2, 2024)

# PRiSM Music Gesture Recognition

[![PRiSM](https://img.shields.io/badge/PRiSM-RNCM-blue.svg)](https://www.rncm.ac.uk/research/research-centres-rncm/prism/)

PRiSM Music Gesture Recognition is a software tool for creating musical gesture datasets and real-time recognition of musical gestures based on audio input. It utilizes machine learning techniques to classify and interpret musical gestures, enabling applications in interactive music performance, composition, and more.

![Screenshot of the main interface](media/MainInterface.png)

**⚠️ This software is in beta. It may contain bugs. Use with caution and at your own risk.**

-----------

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Audio Setting](#audio-setting)
- [Create and Record Gesture Samples](#create-and-record-gesture-samples)
- [Data Augmentation](#data-augmentation)
- [Data Preprocess](#data-preprocess)
- [Training and Prediction](#training-and-prediction)
- [Fine-tuning](#fine-tuning)
- [Validation](#validation)
- [OSC Setting](#osc-setting)
- [MIDI Setting](#midi-setting)
- [Gesture Audio Player](#gesture-audio-player)
- [Contributing](#contributing)
- [License](#license)
- [Credits](#credits)
- [References](#references)


-----------

## Features
- **Custom Gesture Samples**: Create and record custom gesture samples for personalized datasets.
- **Machine Learning Model Training**: Train a machine learning model with your own gesture recordings.
- **Real-Time Recognition**: Utilize pre-trained models for real-time gesture recognition from audio input.
- **OSC/MIDI Output**: Send recognition results through OSC or MIDI for further musical application.
- **Playback Mapping**: Map recognized gestures to audio playback for interactive experiences.
- **Persistence**: Save and load trained machine learning models and configurations for consistent performance.
- **Multi-Channel Support**: Accommodate multiple input channels for diverse audio setups.

-----------

## Installation

1. Visit the [GitHub release page](https://github.com/rncm-prism/PRiSM-MusicGestureRecognition/releases) for PRiSM Music Gesture Recognition.
2. Download the [latest software package](https://github.com/rncm-prism/PRiSM-MusicGestureRecognition/releases/download/v0.25/PRiSM_MGR_v0.25.zip) compatible with your system. (Currently available for **macOS** only.)
3. Unzip and move the application to your Applications folder.
4. Open PRiSM Music Gesture Recognition from your Applications.

*Troubleshooting*: If you encounter a security warning, please refer to [Apple's guide on opening an app from an unidentified developer](https://support.apple.com/en-gb/guide/mac-help/mh40616/13.0/mac/13.0).


-----------

## Usage

### Audio Setting
Click the `AudioStatu` button to open the [Audio Setting window](media/AudioSetting.png).

- Enable or disable input channels as required.
- Engage or bypass compression as needed.

*Important*: Always use the same sampling rate; consistent sampling rates are crucial for reliable results.


### Create and Record Gesture Samples

1. Choose a directory for your recordings with `SelectFolder`.
2. Create a new gesture with `Create`. \
**Make sure the gesture name is unique and contains no spaces or other special characters!**
3. Record samples with `Record`. Optionally, click `Amp` (new in v0.25) to trigger recording automatically via the input amplitude detector.
4. Save your samples with `Save`.
5. Repeat steps 3 and 4 to record additional samples for the current gesture.
6. Review saved samples with the dropdown menu and play them back with `Play`.
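The `Amp` option amounts to an amplitude gate: recording starts once the input level rises above a threshold. The app handles this internally; the sketch below only illustrates the idea, and the -39 dB default is borrowed from the fine-tuning parameter table elsewhere in this README.

```python
import math

def db(amplitude: float) -> float:
    """Convert a linear amplitude (0..1) to decibels full scale (dBFS)."""
    return 20 * math.log10(max(amplitude, 1e-10))

def should_trigger(amplitude: float, threshold_db: float = -39.0) -> bool:
    """Start recording once the input level rises above the gate threshold."""
    return db(amplitude) > threshold_db

# A level of 0.1 is -20 dBFS, above the -39 dB gate, so recording would start.
```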

### Data Augmentation

Apply random pitch and time stretch to existing samples to generate new files and enhance the dataset.

1. Set the number of files to generate with `NumFiles`.
2. Enable random pitch shifting and set its range with `PitchRange`.
3. Enable random time stretching and set its range with `StrethRange`.
4. Click the `Activate` button to generate the augmented files.
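For each generated file, augmentation draws a random pitch shift and time-stretch factor from the configured ranges. The Python sketch below shows only that sampling step; the range values are placeholders, not the app's defaults.

```python
import random

def sample_augmentations(num_files, pitch_range=(-2.0, 2.0),
                         stretch_range=(0.8, 1.2), seed=None):
    """Draw one (pitch_shift_semitones, time_stretch_factor) pair per file.

    The pair would then be applied to a copy of an existing sample to
    produce a new augmented file.
    """
    rng = random.Random(seed)
    return [
        (rng.uniform(*pitch_range), rng.uniform(*stretch_range))
        for _ in range(num_files)
    ]
```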

### Data Preprocess

Click the `Activate` button to preprocess the data. \
**Adjust `Spectrum Components` in the setting window for more detailed feature sets.**

### Training and Prediction

1. Click `Train` to start training and monitor the loss displayed below (lower is generally better).
2. Click `Train` again to stop training manually, or let it stop automatically once the loss falls below 0.05.
3. Click `Prediction` to enable real-time gesture recognition.
4. Use the dropdown menu and `Play` button to test the trained model with saved samples.
5. Click `Save` to save the trained model and configuration (since v0.25b, the OSC settings are saved too, and the configuration and model are merged into a single file).
6. Click `Load` to load a saved configuration file; the model is loaded automatically.
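The auto-stop behaviour can be stated as a simple rule: halt once the latest loss drops below the target. Training itself happens inside the app; this Python sketch is purely illustrative.

```python
def should_stop(loss_history, target=0.05):
    """Auto-stop rule: halt training once the most recent loss is below target."""
    return bool(loss_history) and loss_history[-1] < target
```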

### Fine-tuning

Click the `Setting` button to open the [setting window](media/Setting.png), where you can find adjustable parameters. \
**Some parameters can be controlled with OSC messages; the receive port is `1123`.**

Parameter Name | Description | Default Value | Range | OSC address
------------- | ------------- | ------------- | ------------- | -------------
On Threshold | Amplitude gate level, used to trigger listening and prediction. | -39dB | -60dB - 0dB | /OnThreshold
Accuracy Threshold | Filters out prediction results below the threshold. | 0. | 0. - 1. | /AccuracyThreshold
Timer | The system reports after listening. If the timer is set shorter than the default, it refreshes the buffer and forces a prediction; if set longer than the default, it is disabled. | The longest duration in the training files, but no more than 10 seconds. | 50ms - 10000ms | /Timmer
Spectrum Components | The number of frequency components in the spectrogram. | f0, f1 | f0 - f7 | 🚫
Prediction | Enable (1) or disable (0) prediction. | 0 | 0 / 1 | /Prediction
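Because these parameters accept OSC control on port `1123`, you can drive them from any OSC-capable environment. As a dependency-free illustration, the sketch below hand-encodes a minimal OSC 1.0 message with Python's standard library; the assumption that each address takes a single float argument is ours, not something the table guarantees.

```python
import socket
import struct

def osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, per the OSC 1.0 spec."""
    return data + b"\x00" * (4 - len(data) % 4)

def osc_message(address: str, value: float) -> bytes:
    """Encode a single-float OSC message: padded address, ',f' typetag, big-endian float32."""
    return osc_pad(address.encode()) + osc_pad(b",f") + struct.pack(">f", value)

def send_parameter(address: str, value: float, host="127.0.0.1", port=1123):
    """Fire one OSC message at the app's parameter-receive port over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(osc_message(address, value), (host, port))
    sock.close()

# e.g. send_parameter("/OnThreshold", -45.0)
```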

### Validation

Test your model with the `Player & Validation` module.
- Use the dropdown menu and `Play` button to test the trained model with saved samples.
- Set the number of validation runs and enable the toggle to start automatic random validation. When auto-validation finishes, it displays each gesture's accuracy and the `average accuracy`.
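A per-gesture accuracy plus an overall average can be computed as below. Note this is a sketch of the arithmetic only; whether the app averages across gestures (as here) or across all samples is an assumption.

```python
def validation_summary(results):
    """results: dict mapping gesture name -> list of booleans (was the prediction correct?).

    Returns (per-gesture accuracy dict, average accuracy across gestures).
    """
    per_gesture = {
        name: sum(outcomes) / len(outcomes) for name, outcomes in results.items()
    }
    average = sum(per_gesture.values()) / len(per_gesture)
    return per_gesture, average
```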

### OSC Setting

Click the `OSC` button to enable OSC output and open the [OSC setting window](media/OSC_Setting.png) to configure the OSC IP address and port.
By default, the recognition results are sent to `127.0.0.1:9001` with the message address `/PRiSM_GR`.
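Any OSC-capable receiver can listen on `127.0.0.1:9001` for `/PRiSM_GR` messages. As a minimal stdlib sketch, the helper below splits the address and type-tag string out of a raw OSC packet; the exact argument layout the app sends is not specified here, so the arguments are returned as raw bytes.

```python
import socket

def parse_osc_string(data, offset):
    """Read a null-terminated, 4-byte-padded OSC string; return (string, next_offset)."""
    end = data.index(b"\x00", offset)
    s = data[offset:end].decode()
    offset = end + 1
    offset += (-offset) % 4  # advance past padding to the next 4-byte boundary
    return s, offset

def listen_once(host="127.0.0.1", port=9001):
    """Receive one OSC packet on the default output port.

    Returns (address, typetags, raw argument bytes).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    data, _ = sock.recvfrom(4096)
    sock.close()
    address, offset = parse_osc_string(data, 0)
    typetags, offset = parse_osc_string(data, offset)
    return address, typetags, data[offset:]
```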

### MIDI Setting

Click the `MIDI` button to enable MIDI output and open the [MIDI setting window](media/MidiSetting.png) to configure the MIDI output (for instance, to change the output MIDI channel).
The recognition results are automatically mapped to MIDI notes starting from MIDI note `60`. For example, the first gesture corresponds to MIDI note `60`, the second gesture to MIDI note `61`, and so on.
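The mapping described above is a fixed offset from middle C; a sketch:

```python
def gesture_to_midi_note(gesture_index: int, base_note: int = 60) -> int:
    """Map the n-th trained gesture (0-based) to a MIDI note, starting at middle C (60)."""
    note = base_note + gesture_index
    if not 0 <= note <= 127:
        raise ValueError("resulting note falls outside the MIDI range 0-127")
    return note
```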

### Gesture Audio Player
>New in v0.26b

Click the `Audio` button to enable gesture audio playback and open the [Gesture AudioPlayer window](media/AudioPlayer.png) to configure the gesture-audio mapping.

- After training is finished, one gesture-audio cell spawns for each gesture.
- Put all your playback audio files in one folder and click the `SelectFolder` button to load them.
- Use the dropdown menu in each cell to enable/disable and configure the mapping.
>todo: remove mapping
-----------

## Contributing
This work is supported by [PRiSM](https://www.rncm.ac.uk/research/research-centres-rncm/prism/).
-----------

## References
- [flucoma-max](https://github.com/flucoma/flucoma-max)
- [max-sdk](https://github.com/Cycling74/max-sdk)
