Sven Bambach, David Crandall, Linda B. Smith, Chen Yu
Abstract
Early visual object recognition in a world full of cluttered visual information is a complicated task at which toddlers are incredibly efficient. In their everyday lives, toddlers constantly create learning experiences by actively manipulating objects and thus self-selecting object views for visual learning. The work in this paper is based on the hypothesis that toddlers' active viewing and exploration create high-quality training data for object recognition. We tested this idea by collecting egocentric video data of free toy play within toddler-parent dyads and using it to train state-of-the-art machine learning models (Convolutional Neural Networks, or CNNs). Our results show that the data collected by parents and toddlers have different visual properties and that CNNs can take advantage of these differences to learn toddler-based object models that outperform their parent counterparts in a series of controlled simulations.