Internship: Autonomous object recognition

Autonomous object recognition in videos using Deep learning and Developmental learning

Keywords: Autonomous systems, Deep learning, Developmental learning, Unsupervised learning, Siamese Neural Networks, Similarity learning, Object discovery, Saliency, Spatio-temporal coherence

Period: 5 months, starting from February/March (subject to negotiation)

Desired candidate profile:

– Master's-level background in artificial intelligence or computer vision. Previous experience with neural networks, particularly for image recognition, would be appreciated. Interest in developmental learning, scientific curiosity, the ability to read and write scientific articles, and the ability to work autonomously.

– Good programming skills required (C++, Python, OpenCV, TensorFlow, Git)

Frédéric Armetta and Mathieu Lefort (SMA Team) – Stefan Duffner (Imagine Team)

Location: LIRIS Laboratory, Lyon, France

Send applications (CV and motivation letter) and any questions to the supervisors listed above.

Stipend: 500 € per month

This internship aims to contribute to the development of an autonomous object recognition system for videos. The system is exposed to a visual stream (videos) and has to extract (proto-)objects from it, iteratively refining its internal representation of them. The goal is a system that can autonomously recognize and differentiate objects by building an internal representation of these objects. Taking inspiration from human perception and following the constructivist learning paradigm [1], we want to avoid relying on a large labeled database, prior knowledge, or sophisticated object detectors, and instead let the system develop autonomously. The problem and its objectives therefore differ from the usual supervised setting in which deep learning methods are trained. Ideally, no large dataset is available to the system: extracting identified visual objects should be part of the result, not an input.

This internship will build on promising preliminary results. The current system first extracts global shapes to capture candidate objects from the video, using simple temporal filters (for instance, a Kalman filter) and the spatio-temporal coherence of objects (movement and spatial overlap help to identify instances as belonging to the same object). It then uses Siamese Neural Networks [2][3] to learn a similarity metric from pairs of examples marked as coming from the same class or from different classes. The model constructs a manifold that can be used to classify examples of unknown classes. Preliminary results show that, following this approach, the system classifies new instances of objects with good accuracy. Nevertheless, how to maintain and evolve this representation raises many questions, which can be explored in the short or long term depending on the needs identified during the project (catastrophic forgetting [4], active learning [5], overfitting, ability to generalize, scarce data, etc.).
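To make the pairing idea concrete, here is a minimal sketch in plain Python of how spatial overlap between consecutive frames could yield same/different pairs, and of a standard contrastive loss of the kind often used to train Siamese networks. It is an illustration under stated assumptions, not the current system's implementation: the box format `(x1, y1, x2, y2)`, the function names, the overlap threshold `same_thresh`, the `margin`, and the choice to treat low-overlap pairs as "different" are all hypothetical.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_pairs(prev_boxes, curr_boxes, same_thresh=0.5):
    """Label every (previous, current) box pair: 1 if the overlap suggests
    the same object, else 0. The threshold is an assumed value."""
    return [(i, j, 1 if iou(p, c) >= same_thresh else 0)
            for i, p in enumerate(prev_boxes)
            for j, c in enumerate(curr_boxes)]

def contrastive_loss(e1, e2, same, margin=1.0):
    """Contrastive loss on two embeddings: pull 'same' pairs together,
    push 'different' pairs at least `margin` apart."""
    d = math.dist(e1, e2)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

In a real pipeline the embeddings `e1` and `e2` would come from the two weight-sharing branches of the network, and the loss would be written in a framework such as TensorFlow so that gradients can be computed.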

A challenging topic that we would like to explore during this internship is the possibility of using the internal representation built in this way to facilitate object extraction from the videos. Without any knowledge of the objects, and because candidate objects are detected with relatively simple temporal filtering, the first extraction is coarse and highly sensitive to environmental noise. The internal representation could then be used to validate and delineate the candidate objects. In this case, object extraction and the internal representation of objects evolve together. The process we want to develop is self-starting and operates without external input. In other words, the resulting system has to learn to perceive efficiently in order to learn more, and vice versa. This is a chicken-and-egg cognitive problem, also known as a representation bootstrapping problem [6].
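One simple way to picture the validation step is a nearest-prototype check in the learned embedding space: a candidate is kept only if its embedding lies close to something the system already represents. The sketch below assumes that each known object is summarized by a single prototype vector and that an acceptance distance `accept_dist` has been chosen; both are hypothetical simplifications for illustration, not a description of the intended method.

```python
import math

def validate_candidate(embedding, prototypes, accept_dist=0.5):
    """Return (label, distance) of the nearest prototype if it is within
    `accept_dist`, otherwise (None, distance).

    `prototypes` maps a label to a prototype embedding (assumed format).
    """
    if not prototypes:
        # Nothing learned yet: no candidate can be validated.
        return None, math.inf
    label, proto = min(prototypes.items(),
                       key=lambda kv: math.dist(embedding, kv[1]))
    d = math.dist(embedding, proto)
    return (label, d) if d <= accept_dist else (None, d)
```

The empty-prototype case illustrates the bootstrapping difficulty: at start-up there is nothing to validate against, so the coarse extraction has to seed the representation before the representation can help the extraction.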

The internship could lead to a PhD position if the associated project proposal, which has been submitted, is funded.

Bibliography:

[1] Piaget, J. (1948). « La naissance de l’intelligence chez l’enfant »

[2] Zheng, L., Duffner, S., Idrissi, K., Garcia, C., Baskurt, A. (2016). « Pairwise Identity Verification via Linear Concentrative Metric Learning ». IEEE Transactions on Cybernetics

[3] Berlemont, S., Lefebvre, G., Duffner, S., Garcia, C. (2017). « Class-Balanced Siamese Neural Networks ». Neurocomputing

[4] Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A., Bengio, Y. (2015). « An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks ». CoRR

[5] Lefort, M., Gepperth, A. (2015). « Active learning of local predictable representations with artificial curiosity ». International Conference on Development and Learning and Epigenetic Robotics (ICDL-Epirob), Providence (USA)

[6] Mazac, S., Armetta, F., Hassas, S. (2014). « On bootstrapping sensori-motor patterns for a constructivist learning system in continuous environments ». In Alife 14

