EBLA: A Perceptually Grounded Model of Language Acquisition
B. Pangburn, R. Mathews, S. Iyengar, and J. Ayo. Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-Linguistic Data, pages 46--53. Morristown, NJ, USA, Association for Computational Linguistics, (2003)
Abstract
This paper introduces an open computational framework for visual perception and grounded language acquisition called Experience-Based Language Acquisition (EBLA). EBLA can "watch" a series of short videos and acquire a simple language of nouns and verbs corresponding to the objects and object-object relations in those videos. Upon acquiring this protolanguage, EBLA can perform basic scene analysis to generate descriptions of novel videos.

The performance of EBLA has been evaluated based on accuracy and speed of protolanguage acquisition as well as on accuracy of generated scene descriptions. For a test set of simple animations, EBLA had average acquisition success rates as high as 100% and average description success rates as high as 96.7%. For a larger set of real videos, EBLA had average acquisition success rates as high as 95.8% and average description success rates as high as 65.3%. The lower description success rate for the videos is attributed to the wide variance in the appearance of objects across the test set.

While there have been several systems capable of learning object or event labels for videos, EBLA is the first known system to acquire both nouns and verbs using a grounded computer vision system.
@inproceedings{Pangburn2003,
abstract = {This paper introduces an open computational framework for visual perception and grounded language acquisition called Experience-Based Language Acquisition (EBLA). EBLA can "watch" a series of short videos and acquire a simple language of nouns and verbs corresponding to the objects and object-object relations in those videos. Upon acquiring this protolanguage, EBLA can perform basic scene analysis to generate descriptions of novel videos. The performance of EBLA has been evaluated based on accuracy and speed of protolanguage acquisition as well as on accuracy of generated scene descriptions. For a test set of simple animations, EBLA had average acquisition success rates as high as 100% and average description success rates as high as 96.7%. For a larger set of real videos, EBLA had average acquisition success rates as high as 95.8% and average description success rates as high as 65.3%. The lower description success rate for the videos is attributed to the wide variance in the appearance of objects across the test set. While there have been several systems capable of learning object or event labels for videos, EBLA is the first known system to acquire both nouns and verbs using a grounded computer vision system.},
address = {Morristown, NJ, USA},
author = {Pangburn, Brian E. and Mathews, Robert C. and Iyengar, S. Sitharama and Ayo, Jonathan P.},
booktitle = {Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-Linguistic Data},
keywords = {grounding language vision machinelearning multimodality},
pages = {46--53},
publisher = {Association for Computational Linguistics},
title = {EBLA: A Perceptually Grounded Model of Language Acquisition},
year = 2003
}