EM has been shown to have favorable properties, including automatic satisfaction of constraints and fast convergence. The next section explains the traditional approach to deriving the EM algorithm and proving its convergence property. Section 3.3 covers the interpretation of the EM algorithm as the maximization of two quantities: the entropy and the expectation of the complete-data likelihood. Then, the K-means algorithm and the EM algorithm are compared, and the conditions under which the EM algorithm reduces to K-means are explained. The discussion in Section 3.4 generalizes the EM algorithm described in Sections 3.2 and 3.3 to problems with partial data and hidden states. We refer to this new type of EM as the doubly stochastic EM. Finally, the chapter is concluded in Section 3.5.
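The reduction of EM to K-means can be illustrated with a minimal sketch (not the chapter's notation, and the function name and setup here are hypothetical): EM for an isotropic Gaussian mixture with equal mixing weights and a shared, fixed variance. As the variance shrinks toward zero, the E-step responsibilities approach hard 0/1 assignments and the M-step becomes the K-means centroid update.

```python
# Sketch, assuming an isotropic GMM with equal mixing weights and a
# fixed shared variance sigma2; only the means are re-estimated.
import numpy as np

def em_isotropic(X, mu, sigma2, n_iter=20):
    """Run EM mean updates for a K-component isotropic Gaussian mixture."""
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] proportional to
        # exp(-||x_n - mu_k||^2 / (2 * sigma2))
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 / (2.0 * sigma2)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means; with hard (0/1)
        # responsibilities this is exactly the K-means centroid update.
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu, r

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
mu0 = np.array([[-1.0, 0.0], [1.0, 0.0]])
mu_soft, r_soft = em_isotropic(X, mu0, sigma2=4.0)   # soft assignments
mu_hard, r_hard = em_isotropic(X, mu0, sigma2=1e-4)  # near-hard: ~K-means
```

With a large variance each point is shared between components; with a tiny variance each row of `r_hard` is essentially one-hot, so the updates coincide with K-means.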