Abstract
One of the key points in music recommendation is authoring engaging playlists
according to sentiment and emotion. While previous work has mostly relied on
audio for music discovery and playlist generation, we take advantage of our
synchronized lyrics dataset to combine text representations and music features
in a novel way; we therefore introduce the Synchronized Lyrics Emotion Dataset.
Unlike other approaches, which draw on arbitrarily chosen audio samples and the
full lyrics text, our data is split according to the temporal information provided by the
synchronization between lyrics and audio. This work shows a comparison between
text-based and audio-based deep learning classification models using different
techniques from Natural Language Processing and Music Information Retrieval
domains. From the audio experiments we conclude that using the isolated vocals,
rather than the full audio mix, improves the overall performance of the audio
classifier. In the lyrics experiments we exploit state-of-the-art word
representations applied to the main Deep Learning architectures available in
the literature. In our benchmarks, the Bilinear LSTM classifier with Attention,
based on fastText word embeddings, outperforms the CNN applied to audio.
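As a minimal sketch of the attention mechanism used by such lyrics classifiers, the snippet below shows attention pooling over per-word hidden states (as produced, for example, by an LSTM reading fastText embeddings). The dimensions, the random placeholder states, and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w):
    # H: (T, d) hidden states for T words; w: (d,) learned attention vector.
    scores = H @ w           # one alignment score per word
    alpha = softmax(scores)  # attention weights, summing to 1
    return alpha @ H         # (d,) attention-weighted sentence representation

# Placeholder hidden states standing in for LSTM outputs over 5 words.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
w = rng.normal(size=8)
rep = attention_pool(H, w)   # fixed-size vector fed to the final classifier
```

The pooled vector `rep` has the same dimensionality regardless of lyric length, which is what lets a downstream dense layer classify variable-length lyric segments.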