FunRL - Fundamentals of Reinforcement Learning

Pierre Gaillard
Researcher, Inria
pierre.gaillard@inria.fr

Bruno Gaujal
Researcher, Inria
bruno.gaujal@inria.fr


DESCRIPTION

Reinforcement learning algorithms have witnessed growing popularity and tremendous success in recent years. They are now used to solve optimization problems that were believed almost impossible a few years ago. Yet, despite this empirical success, they are very power-hungry and very few algorithms come with performance guarantees. The performance of a reinforcement learning algorithm can be measured with several metrics. Sample efficiency measures how many samples (from a simulator or from a real environment) an algorithm needs to find a “good” policy, whereas computational efficiency measures the amount of computation (or memory) needed. Finally, regret measures the gap between the rewards collected sequentially by the learning algorithm and the rewards of the optimal policy (which is unknown).
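As an illustration of the regret metric, here is a minimal sketch on a two-armed Bernoulli bandit. It is only an example under stated assumptions: the arm means, the epsilon-greedy learner, and the function names cumulative_regret and run_epsilon_greedy are hypothetical choices made for this illustration and are not methods of the FunRL project. The regret is computed in expectation (often called pseudo-regret), which is possible here only because the simulation knows the true arm means.

```python
import random


def cumulative_regret(arm_means, chosen_arms):
    """Return the expected cumulative regret after each step.

    At every step the regret grows by the gap between the mean reward
    of the best arm and the mean reward of the arm actually played.
    """
    best_mean = max(arm_means)
    total = 0.0
    history = []
    for arm in chosen_arms:
        total += best_mean - arm_means[arm]
        history.append(total)
    return history


def run_epsilon_greedy(arm_means, horizon, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with a simple epsilon-greedy learner.

    The learner never sees arm_means; it only observes 0/1 rewards.
    Returns the list of arms it chose.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    chosen = []
    for t in range(horizon):
        if t < n_arms:
            arm = t  # play each arm once to initialize the estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore uniformly at random
        else:
            # exploit the arm with the best empirical mean so far
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        chosen.append(arm)
    return chosen


# Hypothetical problem instance: arm 1 is optimal.
arm_means = [0.3, 0.7]
choices = run_epsilon_greedy(arm_means, horizon=1000)
regret = cumulative_regret(arm_means, choices)
print(f"regret after {len(regret)} steps: {regret[-1]:.1f}")
```

A policy that always plays the best arm has zero regret, while a policy that always plays the worst arm accumulates regret linearly in the number of steps. Learning algorithms with performance guarantees aim for regret that grows much more slowly than that.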

The goal of a reinforcement learning algorithm is to gather information about the unknown system being explored by the learner, in order to better understand its dynamical properties and exploit them to optimize its behavior. Whenever the learner has a priori offline information about the system, it can leverage this knowledge to learn its optimal behavior more efficiently. This approach falls under the general concept of structured learning.

This leads us to the research question that we want to tackle with the FunRL project:
 
How to design algorithms with optimal theoretical guarantees that exploit a (known or unknown) structure of the problem to solve?
 
This question will be developed in three directions.

First, we will tackle the online control of queueing networks, which raises the important issue of stability and rarely visited states. Second, we will study [...] as Markov decision processes (MDPs), which are stochastic dynamical systems that can be controlled. The main originality of this axis with respect to the others is that these dynamical systems are constrained by the structure of the problem, the challenge being to use our knowledge of such a structure efficiently. Third, we will study parametric learning, where a learner adapts its policy to a problem with a known structure but whose parameters are unknown. This has applications to auto-scaling problems in cloud computing, resource allocation, and sequential decisions.

ACTIVITIES

The FunRL project will be hiring highly skilled and motivated researchers for the following positions:

– 2 PhD students
– 1 Postdoctoral researcher for two years
– 2 Interns for six months each, which could be a first step toward the PhD projects mentioned above.

In addition, we plan to hire industrial (Cifre) PhD students in collaboration with Criteo and EDF R&D. They have already agreed in principle. The companies will cover the PhD students’ salaries and provide additional funding for the operation of the chair.

On the education front, one pedagogical engineer will be hired for two years to help design a MOOC on reinforcement learning and interactive exercise sessions for L3 to M2 students.
Published on November 18, 2025
Updated on February 3, 2026