Due to the explosion of data, demand has grown for scalable machine learning and data mining algorithms in many applications, such as social network analysis, information retrieval, recommendation systems, biology, multimedia, and e-commerce. The objective of this special issue is to connect academia and industry on the methods and experiences of large-scale data analysis. We seek scalable machine learning and data mining algorithms, implementations, frameworks, and case studies that target real, practical scenarios involving large datasets. The focus is to identify the real challenges of large-scale data mining and to investigate scalable methods and practical solutions to the core machine learning and data mining problems from both theoretical and experimental perspectives.
«Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling such quantities of data can be tedious, expensive, and error-prone. Recently, some studies have explored the use of diverse sources of weak supervision to produce competitive end-model classifiers. In this paper, we survey recent work on weak supervision, and in particular, we investigate the Data Programming (DP) framework. Taking a set of potentially noisy heuristics as input, DP assigns denoised probabilistic labels to each data point in a dataset using a probabilistic graphical model over the heuristics. We analyze the mathematical foundations of DP and demonstrate its power by applying it to two real-world text classification tasks. Furthermore, we compare DP with pointillistic active and semi-supervised learning techniques traditionally applied in data-sparse settings.»
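The core mechanism the abstract describes can be sketched briefly: several noisy heuristics ("labeling functions") vote on each example, and their votes are combined into a probabilistic label. The sketch below uses a simple accuracy-weighted vote as a stand-in for the probabilistic graphical model that DP actually fits; all function names, the fixed accuracy values, and the toy spam/ham data are illustrative assumptions, not from the paper.

```python
# Minimal sketch of the Data Programming idea. Each labeling function (LF)
# returns SPAM, HAM, or ABSTAIN for an example; a probabilistic label is
# then produced by an accuracy-weighted vote over the non-abstaining LFs.
# In real DP, the per-LF accuracies are *estimated* from the agreement
# structure of the LFs' votes, without ground-truth labels; here they are
# simply assumed constants for illustration.

ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_contains_link(text):
    # Heuristic: messages containing URLs are often spam.
    return SPAM if "http" in text else ABSTAIN

def lf_short_message(text):
    # Heuristic: very short messages tend to be ham.
    return HAM if len(text.split()) < 4 else ABSTAIN

def lf_money_words(text):
    # Heuristic: money-related vocabulary suggests spam.
    return SPAM if any(w in text.lower() for w in ("free", "winner", "cash")) else ABSTAIN

LFS = [lf_contains_link, lf_short_message, lf_money_words]
ACCURACIES = [0.9, 0.7, 0.8]  # assumed, not learned

def probabilistic_label(text):
    """Return P(SPAM) for one example via an accuracy-weighted vote."""
    weight_spam = weight_ham = 0.0
    for lf, acc in zip(LFS, ACCURACIES):
        vote = lf(text)
        if vote == SPAM:
            weight_spam += acc
        elif vote == HAM:
            weight_ham += acc
    total = weight_spam + weight_ham
    return 0.5 if total == 0 else weight_spam / total  # 0.5 when all LFs abstain

print(probabilistic_label("free cash, click http://x.example"))  # → 1.0
print(probabilistic_label("ok see you"))                         # → 0.0
```

The resulting probabilistic labels would then train a discriminative end model (e.g., a text classifier) on the noisily labeled data, which is the second stage of the DP pipeline the paper examines.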
I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz, and T. Euler. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 935--940, New York, NY, USA, ACM (2006).
H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein. (2017). arXiv:1712.09913. Comment: NIPS 2018 (extended version, 10.5 pages); code is available at https://github.com/tomgoldstein/loss-landscape.
E. Reshetnyak, H. Cham, and J. Hughes. Multivariate Behavioral Research, 51 (6): 871--876 (August 2016). Keywords: Data analysis; Marginal structural models; Introductory.
A. Linden and P. Yarnold. Journal of Evaluation in Clinical Practice, 23 (4): 703--712 (August 2017). Keywords: Propensity score; Classification trees; Machine learning.
M. Courbariaux, Y. Bengio, and J. David. (2014). arXiv:1412.7024v5. Comment: 10 pages, 5 figures; accepted as a workshop contribution at ICLR 2015.