Abstract
Audio segmentation and sound event detection are crucial topics in machine
listening that aim to detect acoustic classes and their respective boundaries.
They are useful for audio-content analysis, speech recognition, audio indexing,
and music information retrieval. In recent years, most research has adopted
segmentation-by-classification, which divides audio into small frames and
classifies each frame individually. In this paper, we
present a novel approach called You Only Hear Once (YOHO), which is inspired by
the YOLO algorithm widely adopted in computer vision. Instead of
frame-based classification, we cast the detection of acoustic boundaries as a
regression problem, with separate output neurons that detect the presence of an
audio class and predict its start and end points.
YOHO obtained a higher F-measure and lower error rate than the state-of-the-art
Convolutional Recurrent Neural Network on multiple datasets. Because YOHO is
a purely convolutional neural network with no recurrent layers, it is faster
during inference. In addition, because this approach is more end-to-end and
predicts acoustic boundaries directly, post-processing and smoothing are
significantly quicker.
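
To make the output representation concrete, below is a minimal sketch of a
YOHO-style network head and loss in TensorFlow/Keras. All shapes, layer sizes,
class choices, and the loss weighting here are illustrative assumptions for
exposition, not the paper's exact architecture or configuration.

    # Illustrative sketch of a YOHO-style output head: for each coarse time
    # bin and each acoustic class, predict (presence, start, end). Shapes and
    # layer sizes are assumptions, not the paper's exact configuration.
    import tensorflow as tf

    N_CLASSES = 2     # e.g. speech and music (hypothetical choice)
    TIME_BINS = 26    # coarse output divisions along time (assumed)

    def build_yoho_like_model(input_shape=(801, 64, 1)):
        # Log-mel spectrogram input: (frames, mel bands, channels); assumed.
        inputs = tf.keras.Input(shape=input_shape)
        x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same",
                                   activation="relu")(inputs)
        x = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same",
                                   activation="relu")(x)
        x = tf.keras.layers.Conv2D(128, 3, strides=2, padding="same",
                                   activation="relu")(x)
        # Collapse the frequency axis, keep the (reduced) time axis.
        x = tf.keras.layers.Reshape((x.shape[1], -1))(x)
        x = tf.keras.layers.Conv1D(256, 3, padding="same",
                                   activation="relu")(x)
        x = tf.keras.layers.Flatten()(x)
        # One (presence, start, end) triple per class per time bin; start/end
        # are normalized offsets within the bin, so sigmoid keeps them in [0, 1].
        out = tf.keras.layers.Dense(TIME_BINS * N_CLASSES * 3,
                                    activation="sigmoid")(x)
        out = tf.keras.layers.Reshape((TIME_BINS, N_CLASSES, 3))(out)
        return tf.keras.Model(inputs, out)

    def yoho_like_loss(y_true, y_pred):
        # Squared-error loss; the start/end regression terms only count in
        # bins where the class is actually present (masked regression).
        presence_true = y_true[..., 0]
        presence_loss = tf.square(presence_true - y_pred[..., 0])
        boundary_loss = presence_true * tf.reduce_sum(
            tf.square(y_true[..., 1:] - y_pred[..., 1:]), axis=-1)
        return tf.reduce_mean(presence_loss + boundary_loss)

The key point of this sketch is the masked regression: boundary offsets
contribute to the loss only where a class is present, which mirrors the
regression formulation of boundary detection described above.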