SVM clustering

S. Winters-Hilt, and S. Merat. BMC Bioinformatics, (2007)
DOI: 10.1186/1471-2105-8-S7-S18

Abstract

BACKGROUND: Support Vector Machines (SVMs) provide a powerful method for classification (supervised learning). Use of SVMs for clustering (unsupervised learning) is now being considered in a number of different ways.

RESULTS: An SVM-based clustering algorithm is introduced that clusters data with no a priori knowledge of input classes. The algorithm initializes by running a binary SVM classifier against a data set in which each vector is randomly labelled; this is repeated until an initial convergence occurs. Once this initialization step is complete, the SVM confidence parameters for classification on each of the training instances can be accessed. The lowest-confidence data (e.g., the worst of the mislabelled data) then have their labels switched to the other class label. The SVM is then re-run on the data set (with partly re-labelled data) and is guaranteed to converge in this situation, since it converged previously and now has fewer data points carrying mislabelling penalties. This approach appears to limit exposure to the local minima traps that can occur with other approaches. Thus, the algorithm improves on its weakly convergent result by re-training the SVM after each re-labelling of the worst of the misclassified vectors, i.e., those feature vectors with confidence factor values beyond some threshold. Repeating this process improves the accuracy, here a measure of separability, until there are no misclassifications. Variations on this type of clustering approach are shown.

CONCLUSION: Non-parametric SVM-based clustering methods may allow for much improved performance over parametric approaches, particularly if they can be designed to inherit the strengths of their supervised SVM counterparts.
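The relabelling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a small linear soft-margin SVM trained by batch subgradient descent stands in for whatever SVM and kernel the paper uses, the signed margin stands in for the "confidence parameter", and the names `train_linear_svm`, `svm_cluster`, `flip_fraction` and `max_rounds` are hypothetical choices made for illustration only.

```python
import random


def decision(w, b, x):
    """Signed score of the linear SVM for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=300):
    """Assumed stand-in SVM: minimise 0.5*||w||^2 + (C/n) * sum(hinge loss)
    by batch subgradient descent with a fixed learning rate."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regulariser 0.5*||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:  # point violates the margin
                for j in range(dim):
                    grad_w[j] -= C * yi * xi[j] / n
                grad_b -= C * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_cluster(X, flip_fraction=0.25, max_rounds=50, seed=0):
    """Start from random labels, then repeatedly re-train and flip the
    labels of the worst misclassified points (illustrative sketch)."""
    rng = random.Random(seed)
    y = [rng.choice((-1, 1)) for _ in X]  # random initial labelling
    for _ in range(max_rounds):
        w, b = train_linear_svm(X, y)
        # Confidence = signed margin; negative means misclassified.
        conf = [yi * decision(w, b, xi) for xi, yi in zip(X, y)]
        wrong = sorted((i for i, c in enumerate(conf) if c < 0),
                       key=lambda i: conf[i])
        if not wrong:  # no misclassifications left: stop
            break
        n_flip = max(1, int(flip_fraction * len(wrong)))
        for i in wrong[:n_flip]:  # flip the lowest-confidence labels
            y[i] = -y[i]
    return y


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    print(svm_cluster(points))
```

Nothing in this sketch prevents every point from ending up with the same label, and the stopping rule is simply "no misclassified points or `max_rounds` reached"; how the paper handles such degenerate solutions is not stated in the abstract.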

Description

SVM clustering. [BMC Bioinformatics. 2007] - PubMed Result

Links and resources

BibTeX key: WintersHilt:2007:BMC-Bioinformatics:18047717
entry type: article
year: 2007
journal: BMC Bioinformatics
volume: 8 Suppl 7
pmid: 18047717
DOI: 10.1186/1471-2105-8-S7-S18
url: http://www.ncbi.nlm.nih.gov/pubmed/18047717?dopt=AbstractPlus&holding=f1000,f1000m,isrctn

Cite this publication

%0 Journal Article
%1 WintersHilt:2007:BMC-Bioinformatics:18047717
%A Winters-Hilt, S
%A Merat, S
%D 2007
%J BMC Bioinformatics
%K clustering imported kernel-methods svm
%R 10.1186/1471-2105-8-S7-S18
%T SVM clustering
%U http://www.ncbi.nlm.nih.gov/pubmed/18047717?dopt=AbstractPlus&holding=f1000,f1000m,isrctn
%V 8 Suppl 7

@article{WintersHilt:2007:BMC-Bioinformatics:18047717,
  added-at = {2009-08-19T10:32:04.000+0200},
  author = {Winters-Hilt, S and Merat, S},
  biburl = {https://www.bibsonomy.org/bibtex/280eb4a283e120a640eab5da772d0ca95/gromgull},
  description = {SVM clustering. [BMC Bioinformatics. 2007] - PubMed Result},
  doi = {10.1186/1471-2105-8-S7-S18},
  interhash = {f5aafd6352dcd4795eddbf91a8ebcab2},
  intrahash = {80eb4a283e120a640eab5da772d0ca95},
  journal = {BMC Bioinformatics},
  keywords = {clustering imported kernel-methods svm},
  pmid = {18047717},
  timestamp = {2009-08-24T11:05:08.000+0200},
  title = {SVM clustering},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/18047717?dopt=AbstractPlus&holding=f1000,f1000m,isrctn},
  volume = {8 Suppl 7},
  year = 2007
}

BibSonomy
