
APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE – A REVIEW ARTICLE

Operations Research and Applications: An International Journal (ORAJ), 1 (1): 15-21 (August 2014)
DOI: 10.5121/oraj.2014.1103

Abstract

In this paper we review some applications of cluster analysis in insurance and allied sciences. Depending on the business problem, two main types of clustering techniques are used in predictive analytics: partition-based clustering and hierarchical agglomerative clustering. The hierarchical agglomerative approach is time-consuming, and its complexity increases with the number of dimensions. In contrast, partition-based algorithms divide the search space into partitions before arriving at the final clusters. Both methods have merits and drawbacks, so choosing an algorithm requires a sound understanding of the domain, the number of variables, and the available computing power. The insurance industry is rich in data, and the attributes available for analytics are varied in nature; hierarchical methods are therefore generally unsuitable for insurance. K-means, a partition-based algorithm, is popular and widely applied across many domains, including insurance. However, K-means is highly sensitive to the choice of initial centroids, and because of this sensitivity many initialization approaches have been proposed for it over the past decades. This paper proposes an iterative Multiple Random Centroid (MRC) method for selecting the initial cluster centroids in K-means clustering, in place of simple random-seed methods. The performance of the proposed initialization method is assessed in detail on two insurance datasets that differ in dimensionality, distance functions, number of observations, number of groups, and clustering complexity. The proposed algorithm was developed in-house in Java, and its results are compared with those of standard available software.
Results from the two insurance datasets, which differ in business problem and attributes, indicate that the proposed initialization method is more effective and converges to more accurate clustering results than simple random initialization.
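The idea of replacing a single random seed with several random seeds can be illustrated with a short sketch. The abstract does not spell out the MRC procedure, so the code below assumes a common interpretation: run standard K-means from several independent random sets of initial centroids and keep the run with the lowest within-cluster sum of squared errors (SSE). The function names (`kmeans`, `multiple_random_kmeans`), the `restarts` parameter, and the toy data are hypothetical and for illustration only; this is not the authors' Java implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(data: List[Point], initial: List[Point], max_iter: int = 100):
    """Standard Lloyd-style K-means from the given initial centroids.

    Returns (centroids, labels, sse)."""
    centroids = list(initial)
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        # Update step: recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, new_labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_labels == labels and new_centroids == centroids:
            break  # converged: assignments and centroids no longer change
        labels, centroids = new_labels, new_centroids
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse


def multiple_random_kmeans(data: List[Point], k: int, restarts: int = 10,
                           seed: Optional[int] = None):
    """Hypothetical multiple-random-seed wrapper: run K-means from `restarts`
    random initializations and keep the result with the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        initial = rng.sample(data, k)  # k distinct observations as seeds
        result = kmeans(data, initial)
        if best is None or result[2] < best[2]:
            best = result
    return best


if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels, sse = multiple_random_kmeans(data, k=2, restarts=5, seed=42)
    print(centroids, labels, sse)
```

Keeping the lowest-SSE run is one simple selection rule; other criteria (for example, a validity index or agreement across runs) could be substituted without changing the overall structure.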
