Optimizing Similarity Threshold for Abstract Similarity Metric in Speech Diarization Systems: A Mathematical Formulation

J. Prabhala, V. K, and R. Ravi. Applied Mathematics and Sciences: An International Journal (MathSJ), 10 (1/2): 01-10 (June 2023)
DOI: 10.5121/mathsj.2023.10201

Abstract

Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains unknown amounts of speech from an unknown number of unknown speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Supervised and unsupervised algorithms are used to address speaker diarization problems, but providing exhaustive labeling for the training dataset can become costly in supervised learning, while accuracy can be compromised when using unsupervised approaches. This paper presents a novel approach to speaker diarization, which defines loosely labeled data and employs x-vector embeddings and a formalized approach for threshold searching with a given abstract similarity metric to cluster temporal segments into unique user segments. The proposed algorithm uses concepts from graph theory, matrix algebra, and genetic algorithms to formulate and solve the optimization problem. Additionally, the algorithm is applied to English, Spanish, and Chinese audio, and the performance is evaluated using well-known similarity metrics. The results demonstrate the robustness of the proposed approach. The findings of this research have significant implications for speech processing and speaker identification, including those with tonal differences. The proposed method offers a practical and efficient solution for speaker diarization in real-world scenarios where there are labeling time and cost constraints.
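
The threshold-based clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cosine similarity as the similarity metric, treats segments whose similarity meets the threshold as connected in a graph, reads each connected component as one speaker, and swaps the paper's genetic algorithm for a plain search over candidate thresholds. The function names, the toy two-dimensional embeddings, and the use of an expected speaker count as the "loose label" are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between row vectors (assumed metric)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T


def cluster_by_threshold(similarity, threshold):
    """Label segments by connected components of the thresholded graph."""
    n = similarity.shape[0]
    adjacency = similarity >= threshold  # edge if similarity meets threshold
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal of the component containing `start`.
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in np.flatnonzero(adjacency[node]):
                if labels[neighbor] == -1:
                    labels[neighbor] = current
                    stack.append(int(neighbor))
        current += 1
    return labels


def search_threshold(similarity, expected_speakers, candidates):
    """Pick the first candidate whose cluster count is closest to the
    expected speaker count (a simple stand-in for the genetic search)."""
    best_threshold, best_gap = None, None
    for t in candidates:
        count = len(set(cluster_by_threshold(similarity, t)))
        gap = abs(count - expected_speakers)
        if best_gap is None or gap < best_gap:
            best_threshold, best_gap = t, gap
    return best_threshold


# Toy stand-ins for x-vector embeddings of four temporal segments.
segments = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
sim = cosine_similarity_matrix(segments)
threshold = search_threshold(sim, expected_speakers=2, candidates=[0.05, 0.5, 0.8])
labels = cluster_by_threshold(sim, threshold)
```

In this toy case the first two segments and the last two segments end up in separate clusters. Real x-vectors have hundreds of dimensions, and the objective optimized in the paper may differ from the speaker-count gap used here.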

Links and resources

BibTeX key: noauthororeditor
entry type: article
year: 2023
month: June
journal: Applied Mathematics and Sciences: An International Journal (MathSJ)
number: 1/2
pages: 01-10
volume: 10
language: English
issn: 2349-6223
DOI: 10.5121/mathsj.2023.10201
Document: https://www.airccse.com/mathsj/papers/10223mathsj01.pdf

Cite this publication

%0 Journal Article
%1 noauthororeditor
%A Prabhala, Jagat Chaitanya
%A K, Venkatnareshbabu
%A Ravi, Ragoju
%D 2023
%J Applied Mathematics and Sciences: An International Journal (MathSJ)
%K Speaker diarization graph matrix optimization processing signal similarity speech theory x-vector
%N 1/2
%P 01-10
%R 10.5121/mathsj.2023.10201
%T Optimizing Similarity Threshold for Abstract Similarity Metric in Speech Diarization Systems: A Mathematical Formulation
%U https://www.airccse.com/mathsj/papers/10223mathsj01.pdf
%V 10
%X Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains unknown amounts of speech from an unknown number of unknown speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Supervised and unsupervised algorithms are used to address speaker diarization problems, but providing exhaustive labeling for the training dataset can become costly in supervised learning, while accuracy can be compromised when using unsupervised approaches. This paper presents a novel approach to speaker diarization, which defines loosely labeled data and employs x-vector embeddings and a formalized approach for threshold searching with a given abstract similarity metric to cluster temporal segments into unique user segments. The proposed algorithm uses concepts from graph theory, matrix algebra, and genetic algorithms to formulate and solve the optimization problem. Additionally, the algorithm is applied to English, Spanish, and Chinese audio, and the performance is evaluated using well-known similarity metrics. The results demonstrate the robustness of the proposed approach. The findings of this research have significant implications for speech processing and speaker identification, including those with tonal differences. The proposed method offers a practical and efficient solution for speaker diarization in real-world scenarios where there are labeling time and cost constraints.

@article{noauthororeditor,
  abstract = {Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains unknown amounts of speech from an unknown number of unknown speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Supervised and unsupervised algorithms are used to address speaker diarization problems, but providing exhaustive labeling for the training dataset can become costly in supervised learning, while accuracy can be compromised when using unsupervised approaches. This paper presents a novel approach to speaker diarization, which defines loosely labeled data and employs x-vector embeddings and a formalized approach for threshold searching with a given abstract similarity metric to cluster temporal segments into unique user segments. The proposed algorithm uses concepts from graph theory, matrix algebra, and genetic algorithms to formulate and solve the optimization problem. Additionally, the algorithm is applied to English, Spanish, and Chinese audio, and the performance is evaluated using well-known similarity metrics. The results demonstrate the robustness of the proposed approach. The findings of this research have significant implications for speech processing and speaker identification, including those with tonal differences. The proposed method offers a practical and efficient solution for speaker diarization in real-world scenarios where there are labeling time and cost constraints.},
  added-at = {2023-06-28T08:52:43.000+0200},
  author = {Prabhala, Jagat Chaitanya and K, Venkatnareshbabu and Ravi, Ragoju},
  biburl = {https://www.bibsonomy.org/bibtex/2fdfa0df23b3227a4a002cf703bca9d23/journalmathsj},
  doi = {10.5121/mathsj.2023.10201},
  interhash = {9c6116eb34181b1a825ec412f4c90d2e},
  intrahash = {fdfa0df23b3227a4a002cf703bca9d23},
  issn = {2349-6223},
  journal = {Applied Mathematics and Sciences: An International Journal (MathSJ)},
  keywords = {Speaker diarization graph matrix optimization processing signal similarity speech theory x-vector},
  language = {English},
  month = {June},
  number = {1/2},
  pages = {01-10},
  timestamp = {2023-06-28T08:52:43.000+0200},
  title = {Optimizing Similarity Threshold for Abstract Similarity Metric in Speech Diarization Systems: A Mathematical Formulation},
  url = {https://www.airccse.com/mathsj/papers/10223mathsj01.pdf},
  volume = 10,
  year = 2023
}
