Real-time Drum Accompaniment Generation





Background


The main objective of my PhD has been the development of a system that can generate real-time drum accompaniments for live instrumental performances.

The idea was that, rather than focusing mainly on the generative process from an architectural perspective, I also wanted to focus on the interaction between the system and the performer. Specifically, we aimed for:


  • small models
  • deployed in an easy-to-use environment
  • allowing control over the generation process

In order to be able to use small models, we focus on loop generation, which is a more constrained task than long-term accompaniment generation.

Also, instead of analyzing the harmonic content of the input, we focus on its rhythmic content. This allows us to (1) generate drum accompaniments that are rhythmically interlocked with the input, and (2) since we did not want to create a human-imitating system, this constraint naturally limits the human-like behavior of the system.




1 - Drum Generation Using Transformers


In my first year, we developed an offline long-term drum generation system using Transformer-XL architecture.

  1. Transformer Neural Networks for Automated Rhythm Generation


    Thomas Nuttall, Behzad Haki, Sergi Jorda
    Proceedings of the International Conference on New Interfaces for Musical Expression · 2021
    Abstract

    Recent applications of Transformer neural networks in the field of music have demonstrated their ability to effectively capture and emulate long-term dependencies characteristic of human notions of musicality and creative merit. We propose a novel approach to automated symbolic rhythm generation, where a Transformer-XL model trained on the Magenta Groove MIDI Dataset is used for the tasks of sequence generation and continuation. Hundreds of generations are evaluated using blind-listening tests to determine the extent to which the aspects of rhythm we understand to be valuable are learnt and reproduced. Our model is able to achieve a standard of rhythmic production comparable to human playing across arbitrarily long time periods and multiple playing styles.

    BibTeX
    @inproceedings{NIME21_33,
      article-number = {33},
      author = {Nuttall, Thomas and Haki, Behzad and Jorda, Sergi},
      title = {Transformer Neural Networks for Automated Rhythm Generation},
      booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
      year = {2021},
      month = jun,
      address = {Shanghai, China},
      issn = {2220-4806},
      doi = {10.21428/92fbeb44.fe9a0d82},
      url = {https://nime.pubpub.org/pub/8947fhly},
      presentation-video = {https://youtu.be/Ul9s8qSMUgU},
    }

While the system was able to generate ‘realistic’ long-term drum patterns, it was not suitable for real-time generation due to the computational complexity of the model. Moreover, I wanted to focus on generating drums conditioned on live instrumental performances.

As such, we decided to rethink our approach and focus on loop generation using smaller models. Inspired by Google Magenta's GrooVAE, we explored whether we could use transformers to generate drum loops. Moreover, to speed up inference in real-time deployment, we also explored whether we could avoid tokenization of the input and output sequences; that is, whether a very simple representation of our sequences would suffice.


How to represent drum loops?

A grid-relative piano-roll representation: tempo-agnostic, and focused on the 4/4 time signature.

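The representation described above can be sketched in a few lines. This is a minimal illustration, assuming the commonly used setup of 2-bar loops on a 16th-note grid (32 steps) with 9 drum voices; the dimensions and helper names are assumptions, not the project's exact code.

```python
import numpy as np

# Hedged sketch of a grid-relative HVO (hits, velocities, offsets) tensor.
# Assumed shape: 32 grid steps x 9 drum voices, concatenated as
# [hits | velocities | offsets] -> a (32, 27) array.

N_STEPS, N_VOICES = 32, 9

def empty_hvo():
    return np.zeros((N_STEPS, 3 * N_VOICES), dtype=np.float32)

def add_hit(hvo, step, voice, velocity, offset):
    """Place one drum hit on the grid.

    velocity is in [0, 1]; offset is the deviation from the grid line in
    fractions of a grid step, which keeps the representation tempo-agnostic.
    """
    hvo[step, voice] = 1.0                      # hit
    hvo[step, N_VOICES + voice] = velocity      # velocity
    hvo[step, 2 * N_VOICES + voice] = offset    # micro-timing offset
    return hvo

hvo = add_hit(empty_hvo(), step=0, voice=0, velocity=0.9, offset=-0.02)
```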


How to condition on live performances?

Use a simple rhythm extracted from the input performance. Then, develop a system that can convert this rhythm into a drum loop.



How to extract rhythm from live performances?

Simply flatten all the notes into a single track, then represent similar to the drum loop representation.

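The flattening step above can be sketched as follows. The note format (grid-step onsets plus MIDI velocities) and the loudest-note-wins rule for coincident notes are illustrative assumptions, not the actual implementation.

```python
# Hedged sketch: flatten a polyphonic note list into a single rhythm track,
# represented like one voice of the drum grid (hits/velocities/offsets).
# `notes` are (onset_in_grid_steps, midi_velocity) pairs.

def flatten_to_groove(notes, n_steps=32):
    hits = [0.0] * n_steps
    vels = [0.0] * n_steps
    offs = [0.0] * n_steps
    for onset, velocity in notes:
        step = int(round(onset)) % n_steps      # nearest grid line
        offset = onset - round(onset)           # in [-0.5, 0.5) grid steps
        if velocity / 127.0 >= vels[step]:      # keep the loudest coincident note
            hits[step] = 1.0
            vels[step] = velocity / 127.0
            offs[step] = offset
    return hits, vels, offs

hits, vels, offs = flatten_to_groove([(0.1, 100), (0.2, 60), (4.9, 80)])
```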


Final Architecture to Train



Dataset

Ideally, we needed paired instrumental performances and drum loops. However, to keep things simple, we initially focused on a drum-only dataset (the Groove MIDI Dataset). As such, we trained a model that converts a drum rhythm into a multi-voice drum pattern.

In the final real-time system, we could then replace the drum rhythm with the rhythm extracted from the live performance.

Read More

Details of the methodology and evaluation can be found in the following blog post I’ve prepared:

Blog Post





2 - Creating a Real-Time Accompaniment Context Around the Loop Generator



Main Objective

In real-time, extract rhythms from a MIDI performance and convert them into a drum loop.



Model Training Vs. Inference

During training, convert drum groove to drum pattern.

In real-time inference, replace drum groove with instrumental groove.



Deployment

Overdub inputs to create a longer evolving accompaniment.
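As a rough illustration of the overdubbing idea: each new input pass is merged into the running loop buffer instead of replacing it, so the accompaniment keeps evolving. The merge rule here (keeping the maximum velocity per step) is an assumption for illustration, not the system's actual logic.

```python
# Hedged sketch of overdubbing: merge the per-step velocities of a new
# take into the looping buffer rather than overwriting it.

def overdub(buffer_vels, new_vels):
    """Keep the louder of the existing and incoming hit at each grid step."""
    return [max(old, new) for old, new in zip(buffer_vels, new_vels)]

loop = [0.0, 0.8, 0.0, 0.0]
loop = overdub(loop, [0.5, 0.2, 0.0, 0.9])   # -> [0.5, 0.8, 0.0, 0.9]
```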

A VST plugin was developed using a Camomile (Pure Data) front-end and a Python backend.

Pd/Camomile: Visual Interface, MIDI Processing, Sequence Playback, and Drum Synthesis

Python Backend: Model Inference

Communication: OSC
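As a concrete illustration of the transport layer, here is a minimal OSC message encoder using only the Python standard library (OSC pads strings to 4-byte boundaries and encodes arguments big-endian). The `/groove/hit` address and float-only argument list are hypothetical, not the project's actual message schema.

```python
import struct

# Hedged sketch of the OSC wire format used between the Pd front-end and
# the Python backend. Address and type-tag strings are null-terminated and
# padded to multiples of 4 bytes; float args are big-endian float32.

def _osc_string(s: str) -> bytes:
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)        # pad to a 4-byte boundary

def osc_message(address: str, *args: float) -> bytes:
    """Encode an OSC message whose arguments are all float32."""
    tags = "," + "f" * len(args)
    payload = b"".join(struct.pack(">f", a) for a in args)
    return _osc_string(address) + _osc_string(tags) + payload

# e.g. one hit: step index, voice index, velocity (hypothetical address)
msg = osc_message("/groove/hit", 4.0, 0.0, 0.9)
```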



Further details can be found in the following publication:

  1. Real-Time Drum Accompaniment Using Transformer Architecture


Behzad Haki, Marina Nieto, Teresa Pelinski, and Sergi Jordà
Proceedings of the 3rd International Conference on AI and Musical Creativity · 2022
    Abstract

    This paper presents a real-time drum generation system capable of accompanying a human instrumentalist. The drum generation model is a transformer encoder trained to predict a short drum pattern given a reduced rhythmic representation. We demonstrate that with certain design considerations, the short drum pattern generator can be used as a real-time accompaniment in musical sessions lasting much longer than the duration of the training samples. A discussion on the potentials, limitations and possible future continuations of this work is provided.

    BibTeX
    @inproceedings{haki_behzad_2022_7088343,
      author = {Haki, Behzad and Nieto, Marina and Pelinski, Teresa and Jordà, Sergi},
      title = {{Real-Time Drum Accompaniment Using Transformer Architecture}},
booktitle = {{Proceedings of the 3rd International Conference on AI and Musical Creativity}},
      year = {2022},
      publisher = {AIMC},
      month = sep,
      doi = {10.5281/zenodo.7088343},
      url = {https://doi.org/10.5281/zenodo.7088343},
    }


System Demo









3 - Improved Deployment Procedure


  • The Pd/Python setup was very cumbersome.
  • It required some technical knowledge to set up.
  • To make the system more accessible, we had to develop a standalone version.
  • The challenge was that very few resources were available for developing such a system.
  • A good part of a year was spent porting the system to a standalone application.
  • This was done using the JUCE framework along with the TorchScript library.

Standalone Groove2Drum Accompaniment System


Source Code

The development of the standalone system was a significant challenge. Once done, I realized that I would not be able to build a standalone system for every upcoming project if I had to go through the same process again and again.

As such, I decided to develop a framework that would allow me to easily deploy my models in a standalone system. This framework is called NeuralMidiFx. The idea behind this framework is to allow researchers to easily deploy their models in a standalone system without having to worry about the technical details of the deployment process.

Using the framework, one can very quickly design a graphical interface using a simple JSON file. Then, all the communication between the interface and the dedicated background threads is handled by the framework.
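As a rough sketch of this idea, a JSON GUI description might look like the following. The field names below are illustrative, not NeuralMidiFx's actual schema.

```python
import json

# Hedged sketch: the plugin GUI is declared in a JSON file and the
# framework builds the widgets from it. Schema is hypothetical.

settings = json.loads("""
{
  "tabs": [
    {
      "name": "Generation",
      "sliders": [
        {"label": "Density",     "min": 0.0, "max": 1.0, "default": 0.5},
        {"label": "Temperature", "min": 0.1, "max": 2.0, "default": 1.0}
      ],
      "buttons": [{"label": "Regenerate"}]
    }
  ]
}
""")

for slider in settings["tabs"][0]["sliders"]:
    print(slider["label"], slider["default"])
```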

While the development of the framework was extremely costly in terms of time, it has simplified the process to the extent that I can now develop a standalone system in a matter of days. Moreover, the framework is open-source, and I hope that it will be useful for other researchers as well.


Main Idea Behind NeuralMidiFx: Division of Responsibilities



NeuralMidiFx Architecture



Further details can be found in the following publication:

  1. NeuralMidiFx: A Wrapper Template for Deploying Neural Networks as VST3 Plugins


    Behzad Haki, Julian Lenz, Sergi Jorda
Proceedings of the 4th International Conference on AI and Musical Creativity · 2023
    Abstract

    Proper research, development and evaluation of AI-based generative systems of music that focus on performance or composition require active user-system interactions. To include a diverse group of users that can properly engage with a given system, researchers should provide easy access to their developed systems. Given that many users (i.e. musicians) are non-technical to the field of AI and the development frameworks involved, the researchers should aim to make their systems accessible within the environments commonly used in production/composition workflows (e.g. in the form of plugins hosted in digital audio workstations). Unfortunately, deploying generative systems in this manner is highly expensive. As such, researchers with limited resources are often unable to provide easy access to their works, and subsequently, are not able to properly evaluate and encourage active engagement with their systems. Facing these limitations, we have been working on a solution that allows for easy, effective and accessible deployment of generative systems. To this end, we propose a wrapper/template called NeuralMidiFx, which streamlines the deployment of neural network based symbolic music generation systems as VST3 plugins. The proposed wrapper is intended to allow researchers to develop plugins with ease while requiring minimal familiarity with plugin development.

    BibTeX
    @inproceedings{Haki2023NeuralMidiFx,
      author = {Haki, Behzad and Lenz, Julian and Jorda, Sergi},
  booktitle = {{Proceedings of the 4th International Conference on AI and Musical Creativity}},
      title = {{NeuralMidiFx: A Wrapper Template for Deploying Neural Networks as VST3 Plugins}},
      year = {2023},
      month = sep,
    }





4 - In-Situ Evaluation of the Groove2Drum VST


  • The evaluation of the Groove2Drum VST was done in-situ.
  • We asked Raul Refree to come to the lab and test the system.
  • Raul was asked to perform on a MIDI keyboard, and the system would generate drum accompaniments in real-time.
  • Subsequently, we asked him to provide feedback on the system.

Feedback

  • System was hard to use at first.
  • There are many controls that are overwhelming in a live performance setting.
  • Perhaps the system was more useful in a studio/production setting.
  • If designed for live performance, it should be as autonomous as possible.
  • That said, having some quick controls to change/guide the generation process would be useful.

Following the feedback, we decided to develop a new system that would be more autonomous and would require less user input.

To do so, we modified our architecture into a Variational Autoencoder (VAE). This would allow us to generate drum loops in a more controlled manner; that is, we would be able to guide the generation toward certain predefined patterns.

This was the main focus of the GrooveTransformer system, discussed in the following section.





5 - GrooveTransformer


  • The main idea behind the GrooveTransformer system was to enable generating drum loops in a more controlled manner.
  • i.e. allow the user to fall back to predefined patterns.
  • Moreover, allow the user to decide how much the generations should be guided by the input performance.


Concept

If we develop a VAE model, then we could achieve the above objectives as follows:

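The concept can be sketched in a few lines of numpy: encode the live groove into a latent code, then blend it with the latent code of a stored pattern before decoding. The 128-dimensional latent, the linear blending rule, and all names here are assumptions for illustration, not the model's actual implementation.

```python
import numpy as np

# Hedged sketch of latent-space fallback/guidance with a VAE.
# guidance = 1 follows the performer entirely; guidance = 0 falls back
# to the predefined pattern's latent code.

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Standard VAE sampling: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def blended_latent(z_input, z_preset, guidance):
    return guidance * z_input + (1.0 - guidance) * z_preset

z_in = reparameterize(np.zeros(128), np.zeros(128))   # from the live groove
z_preset = np.ones(128)                               # a stored pattern's code
z = blended_latent(z_in, z_preset, guidance=0.25)     # mostly the preset
```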


Architecture

Same as before, except for the added variational layer.



Deployment

The system was deployed in a standalone system using the NeuralMidiFx framework.


Demos


More Recordings Here


Refree’s Live Rehearsal at CCCB






6 - GrooveTransformer in Eurorack Format


  • The main objective was to push users to interact with the system in a context that the model was not developed for.
  • i.e. use unconventional rhythms, and also synthesize the drums in a more unconventional manner.


Final Prototype



Deployment Procedure



Demos

Setup:

For the GrooveTransformer’s input groove, the module receives a mult (copy) of the Intellijel Metropolix gate and pitch sequence that controls the Acid Technology Chainsaw voice in the Eurorack. In this scenario, we have opted to use the pitch value of each note event to represent the velocity of the input. The pitch and gate voltages are sent from the Eurorack to the GrooveTransformer via an Expert Sleepers ES-9 DC-coupled audio interface and a Cardinal CV-to-MIDI converter plug-in. Generated drum patterns from the GrooveTransformer are converted to Control Voltage (CV) and sent to the Eurorack via a PolyEnd Poly2 MIDI-to-CV converter.
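The pitch-as-velocity mapping might look like the sketch below; the exact pitch range used in the patch is an assumption for illustration.

```python
# Hedged sketch: the sequencer's pitch value stands in for velocity.
# The [36, 84] MIDI pitch range is an assumed choice, not the patch's
# documented configuration.

def pitch_to_velocity(pitch, lo=36, hi=84):
    """Map a MIDI pitch in [lo, hi] linearly to a velocity in [0, 1]."""
    pitch = min(max(pitch, lo), hi)   # clamp out-of-range pitches
    return (pitch - lo) / (hi - lo)

assert pitch_to_velocity(36) == 0.0
assert pitch_to_velocity(84) == 1.0
```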

Drum Synthesis:

We use 7 voices of the generated patterns (kick, snare, open and closed hi-hat, and lo/mid/hi toms) to trigger 4 voices in the Eurorack:

  • Kick: Schlappi Engineering Angle Grinder + Make Noise Moddemix VCA + Intellijel Quadrax envelope generator
  • Snare: Intellijel Plonk
  • Open and Closed Hi-Hats: Basimilus Iteritas Alter
  • Lo, Mid, Hi Toms: Akemie’s Taiko

To retain the generated dynamics, the kick, hi-hats, and toms are routed to individual channels on a Mutable Instruments Veils. The level of each channel is controlled with the velocity sequence associated with the corresponding voice. The Intellijel Plonk has a dedicated velocity input, which we used instead of routing the signal through Veils.


Further details can be found in the following publication:

  1. GrooveTransformer: A Generative Drum Sequencer Eurorack Module


    Nicholas Evans, Behzad Haki, Sergi Jorda
    Proceedings of the International Conference on New Interfaces for Musical Expression (NIME) 2024 · 2024
    Abstract

    This paper presents the GrooveTransformer, a Eurorack module designed for generative drum sequencing. Central to its design is a Variational Auto-Encoder (VAE), around which we have designed a deployment context enabling performance through accompaniment and/or user interaction. This module allows the user to use the system as an accompaniment generator while interacting with the generative processes in real-time. In this paper, we review the design principles and technical architecture of the module, while also discussing the potentials and short-comings of our work.

    BibTeX
    @inproceedings{Haki2024GrooveTransformer,
      author = {Evans, Nicholas and Haki, Behzad and Jorda, Sergi},
      booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression (NIME) 2024},
      year = {2024},
      month = sep,
      publisher = {NIME},
      title = {{GrooveTransformer: A Generative Drum Sequencer Eurorack Module}},
    }





7 - Improved Controllability


  • Based on user feedback, we decided to improve the controllability of the system.
  • We added Genre control and Voice Redistribution controls.


Architecture



Deployment

Download the VST Plugin


Note

Currently under review for ISMIR 2024; however, extra information can be found here




8 - Adaptation to Audio


  • The main objective was to adapt the system to audio input.
  • We had previously run one experiment that used audio input with the same architecture employed for the symbolic-to-symbolic systems.
  • Read the following publication for more details:
  1. Completing Audio Drum Loops with Symbolic Drum Suggestions


Behzad Haki, Teresa Pelinski, Marina Nieto, and Sergi Jorda
    Proceedings of the International Conference on New Interfaces for Musical Expression (NIME) 2023 · 2023
    Abstract

    Sampled drums can be used as an affordable way of creating human-like drum tracks, or perhaps more interestingly, can be used as a mean of experimentation with rhythm and groove. Similarly, AI-based drum generation tools can focus on creating human-like drum patterns, or alternatively, focus on providing producers/musicians with means of experimentation with rhythm. In this work, we aimed to explore the latter approach. To this end, we present a suite of Transformer-based models aimed at completing audio drum loops with stylistically consistent symbolic drum events. Our proposed models rely on a reduced spectral representation of the drum loop, striking a balance between a raw audio recording and an exact symbolic transcription. Using a number of objective evaluations, we explore the validity of our approach and identify several challenges that need to be further studied in future iterations of this work. Lastly, we provide a real-time VST plugin that allows musicians/producers to utilize the models in real-time production settings.

    BibTeX
    @inproceedings{Haki2023Completing,
      author = {Haki, Behzad and Pelinski, Teresa and Nieto, Marina and Jorda, Sergi},
      booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression (NIME) 2023},
      year = {2023},
      month = apr,
      publisher = {NIME},
      title = {{Completing Audio Drum Loops with Symbolic Drum Suggestions}},
    }

Previous Work on Audio Input



  • The initial idea was to use the above approach for adapting the current systems to audio input.
  • Given time constraints, we decided to use a simpler approach.
  • We use a publicly available onset detection system called DDC_Onset to extract onsets from the audio input.
  • We then convert these onsets into a drum loop using the GrooveTransformer system.
  • The system was deployed in a standalone system using the NeuralMidiFx framework.
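The onset-to-groove conversion can be sketched as follows, reusing the same grid representation as the symbolic systems. The function, its fixed-tempo assumption, and the 32-step grid are illustrative, not the plugin's actual code.

```python
# Hedged sketch of the audio path: onset times from the detector (in
# seconds) are quantized onto a 32-step grid, with the residual kept as
# a micro-timing offset in fractions of a grid step.

def onsets_to_grid(onset_times, tempo_bpm, n_steps=32):
    step_dur = 60.0 / tempo_bpm / 4.0            # a 16th note, in seconds
    hits = [0.0] * n_steps
    offsets = [0.0] * n_steps
    for t in onset_times:
        pos = t / step_dur                       # position in grid steps
        step = int(round(pos)) % n_steps
        hits[step] = 1.0
        offsets[step] = pos - round(pos)         # in [-0.5, 0.5) steps
    return hits, offsets

hits, offsets = onsets_to_grid([0.0, 0.26, 0.51], tempo_bpm=120)
```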

Deployment

Developed plugins will be released soon.


Demos




9 - Going Beyond Drum Grooves


  • The objective here was to develop the system such that it had knowledge of the type of instrument being played.
  • i.e. it would be able to model the rhythmic interplay between different instruments and the drums.
  • For this, we developed a number of Groove2Groove conversion systems using the same architecture as the GrooveTransformer system.
  • These systems are then paired with the GrooveTransformer system to generate drum loops that are interlocked with the input performance.
  • For the data, we used the Lakh MIDI Dataset, which contains many multi-track MIDI files suitable for training the Groove2Groove systems.
  • For these systems, we focused on Bass-Drum, Guitar-Drum, and Piano-Drum pairs.


  • The details of this work will be released soon.

Demos




10 - Final Experiment (Ongoing): Long-term Accompaniment Generation


  • The main objective here is to develop a system that can generate long-term accompaniments without overdubbing.
  • That is, we want the system to consider the entirety of the performance prior to generating the accompaniment.
