Co-evolution of policies and environments

Reinforcement learning methods make it possible to build a policy that maximizes a given reward in a particular environment. The resulting policy depends heavily on the domain in which it was learned. This dependence raises two distinct issues: (1) the domain may be too hard for the learning process to proceed efficiently (the bootstrap problem), and (2) the policy may not produce the expected behavior in other domains (the generalization issue). Both challenges are particularly important when applying reinforcement learning to robotics, where policies are expected to face varied situations and behave appropriately without restarting learning from scratch each time the environment changes (lighting conditions, object positions or shapes, etc.).

The goal of this internship is to implement and test algorithms based on evolutionary approaches, in particular the Incremental Pareto-Coevolution Archive (IPCA) and Quality-Diversity (QD) algorithms, to address both challenges.

More details on the goals and the candidate profile are available in the PDF description.

Location:
ISIR
Topics:
Supervisor:
Stéphane Doncieux
Co-supervisor:
Alexandre Coninx
University contact:
n/a
Description file:
Assigned:
No
Year:
2020
Deprecated:
No
