Abstract

The automatic detection, tracking, and identification of multiple people in intelligent environments are important building blocks on which smart interaction systems can be built, such as gesture recognizers, head pose estimators, far-field speech recognizers, and dialog systems. In this paper, we present a system that tracks multiple people in a smart room environment while inferring their identities in a completely automatic and unobtrusive way. It relies on a set of fixed and active cameras to track the users and capture close-ups of their faces for identification, and on several microphone arrays to determine active speakers and steer the attention of the system. Information arriving asynchronously from several sources, such as position updates from audio or visual trackers and identification events from identification modules, is fused at a higher level to gradually refine the room's situation model. The system was trained on a small set of users and showed good performance at acquiring and maintaining their identities in a smart room environment.
