G. Krempl. Tutorial at the 15th Int. Conf. on Knowledge Technologies and Data-Driven Business (i-KNOW 2015), Graz, Austria (2015)
With data continuously generated by sources such as web users and sensor networks, the volume of streaming data keeps increasing. In this era of big data, it is not enough to process high volumes of data arriving at high velocity; the volatile nature of such data must also be taken into account by ensuring continuous adaptation. This poses a further challenge, as human supervision capacities remain limited. This tutorial covers these challenges, as well as techniques for addressing them by using the limited supervision and annotation capacities efficiently. The first part starts with an introduction to data stream mining, setting the scene by discussing inherent challenges such as concept drift. Then, exemplary algorithmic techniques for processing such data streams are briefly presented, with a focus on stream classification. The second part discusses further challenges posed by limited available feedback, in particular missing or partially available, delayed, or costly labels. An overview of the currently available techniques from active, semi-supervised, and transfer learning for addressing these challenges in evolving data streams is given, and gaps for further research are identified.