Inproceedings

EBLA: A Perceptually Grounded Model of Language Acquisition

Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-Linguistic Data, pages 46–53. Association for Computational Linguistics, Morristown, NJ, USA, 2003.

Abstract

This paper introduces an open computational framework for visual perception and grounded language acquisition called Experience-Based Language Acquisition (EBLA). EBLA can "watch" a series of short videos and acquire a simple language of nouns and verbs corresponding to the objects and object-object relations in those videos. Upon acquiring this protolanguage, EBLA can perform basic scene analysis to generate descriptions of novel videos. The performance of EBLA has been evaluated based on accuracy and speed of protolanguage acquisition as well as on accuracy of generated scene descriptions. For a test set of simple animations, EBLA had average acquisition success rates as high as 100% and average description success rates as high as 96.7%. For a larger set of real videos, EBLA had average acquisition success rates as high as 95.8% and average description success rates as high as 65.3%. The lower description success rate for the videos is attributed to the wide variance in the appearance of objects across the test set. While there have been several systems capable of learning object or event labels for videos, EBLA is the first known system to acquire both nouns and verbs using a grounded computer vision system.
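The abstract summarizes what EBLA does but does not reproduce the paper's mapping procedure or perceptual representation. As a rough, non-authoritative sketch of the kind of cross-situational word-to-entity association the abstract describes, the Python fragment below pairs hypothetical "experiences" (the words of a description with the entities detected in a video) and resolves word-to-entity mappings by intersection and elimination. All data and names here are illustrative assumptions, not EBLA's actual algorithm or code.

```python
def learn_mappings(experiences):
    """Cross-situational word-to-entity mapping (toy sketch).

    `experiences` is a list of (words, entities) pairs: the word tokens of a
    description and the perceptual entities observed in the same video.
    """
    # Candidate referents for each word: the intersection of the entity sets
    # of every experience in which that word occurs.
    candidates = {}
    for words, entities in experiences:
        for word in words:
            candidates[word] = candidates.get(word, set(entities)) & set(entities)

    # Iteratively resolve words whose candidate set has shrunk to a single
    # entity, removing that entity from the other words' candidates.
    mapping = {}
    progress = True
    while progress:
        progress = False
        for word, cands in candidates.items():
            if word not in mapping and len(cands) == 1:
                entity = next(iter(cands))
                mapping[word] = entity
                for other, other_cands in candidates.items():
                    if other != word:
                        other_cands.discard(entity)
                progress = True
    return mapping


def describe(entities, mapping):
    """Produce a bag-of-words description for a novel set of entities."""
    entity_to_word = {e: w for w, e in mapping.items()}
    return {entity_to_word[e] for e in entities if e in entity_to_word}


if __name__ == "__main__":
    # Hypothetical toy experiences; names are illustrative only.
    experiences = [
        ({"hand", "touches", "ball"}, {"HAND", "TOUCH", "BALL"}),
        ({"hand", "touches", "cup"}, {"HAND", "TOUCH", "CUP"}),
        ({"hand", "pushes", "ball"}, {"HAND", "PUSH", "BALL"}),
    ]
    mapping = learn_mappings(experiences)
    print(mapping)                                      # e.g. {'hand': 'HAND', ...}
    print(describe({"HAND", "PUSH", "CUP"}, mapping))   # {'hand', 'pushes', 'cup'}
```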
