Recent symbolic music generative models have achieved significant improvements
in the quality of the generated samples. Nevertheless, it remains hard for users
to control the output in such a way that it matches their expectation. To address
this limitation, high-level, human-interpretable conditioning is essential. In this
work, we release FIGARO, a Transformer-based conditional model trained to
generate symbolic music based on a sequence of high-level control codes. To this
end, we propose description-to-sequence learning, which consists of automatically
extracting fine-grained, human-interpretable features (the description) and training
a sequence-to-sequence model to reconstruct the original sequence given only the
description as input. FIGARO achieves state-of-the-art performance in multi-track
symbolic music generation both in terms of style transfer and sample quality. We
show that performance can be further improved by combining human-interpretable
with learned features. Our extensive experimental evaluation shows that FIGARO is
able to generate samples that closely adhere to the content of the input descriptions,
even when they deviate significantly from the training distribution.
%0 Conference Paper
%1 rütte2023figaro
%A von Rütte, Dimitri
%A Biggio, Luca
%A Kilcher, Yannic
%A Hofmann, Thomas
%B The Eleventh International Conference on Learning Representations
%D 2023
%K controllable generation
%T FIGARO: Controllable Music Generation using Learned and Expert Features
%U https://openreview.net/forum?id=NyR8OZFHw6i
%X Recent symbolic music generative models have achieved significant improvements
in the quality of the generated samples. Nevertheless, it remains hard for users
to control the output in such a way that it matches their expectation. To address
this limitation, high-level, human-interpretable conditioning is essential. In this
work, we release FIGARO, a Transformer-based conditional model trained to
generate symbolic music based on a sequence of high-level control codes. To this
end, we propose description-to-sequence learning, which consists of automatically
extracting fine-grained, human-interpretable features (the description) and training
a sequence-to-sequence model to reconstruct the original sequence given only the
description as input. FIGARO achieves state-of-the-art performance in multi-track
symbolic music generation both in terms of style transfer and sample quality. We
show that performance can be further improved by combining human-interpretable
with learned features. Our extensive experimental evaluation shows that FIGARO is
able to generate samples that closely adhere to the content of the input descriptions,
even when they deviate significantly from the training distribution.
@inproceedings{rütte2023figaro,
abstract = {Recent symbolic music generative models have achieved significant improvements
in the quality of the generated samples. Nevertheless, it remains hard for users
to control the output in such a way that it matches their expectation. To address
this limitation, high-level, human-interpretable conditioning is essential. In this
work, we release FIGARO, a Transformer-based conditional model trained to
generate symbolic music based on a sequence of high-level control codes. To this
end, we propose description-to-sequence learning, which consists of automatically
extracting fine-grained, human-interpretable features (the description) and training
a sequence-to-sequence model to reconstruct the original sequence given only the
description as input. FIGARO achieves state-of-the-art performance in multi-track
symbolic music generation both in terms of style transfer and sample quality. We
show that performance can be further improved by combining human-interpretable
with learned features. Our extensive experimental evaluation shows that FIGARO is
able to generate samples that closely adhere to the content of the input descriptions,
even when they deviate significantly from the training distribution.},
added-at = {2023-04-17T13:01:06.000+0200},
author = {von R{\"u}tte, Dimitri and Biggio, Luca and Kilcher, Yannic and Hofmann, Thomas},
biburl = {https://www.bibsonomy.org/bibtex/2a4cbc9a455617ba2aec767bf39cd7078/alex_h},
booktitle = {The Eleventh International Conference on Learning Representations},
description = {FIGARO: Controllable Music Generation using Learned and Expert Features | OpenReview},
interhash = {773d92d11446efe8c71b594b6b85feda},
intrahash = {a4cbc9a455617ba2aec767bf39cd7078},
keywords = {controllable generation},
month = feb,
timestamp = {2023-04-17T13:01:38.000+0200},
title = {{FIGARO}: Controllable Music Generation using Learned and Expert Features},
url = {https://openreview.net/forum?id=NyR8OZFHw6i},
year = 2023
}