
Research

Research overview

The overarching objective of my research is to establish provable safety guarantees for autonomous systems. During my PhD, I focused on establishing safety guarantees for systems losing control over some of their actuators, which led to the resilience framework discussed below. As a postdoctoral scholar, I am interested in ensuring the safety of black-box systems relying on reinforcement learning and diffusion models.

Guaranteeing Safety in Reinforcement Learning

My most recent research focuses on developing provable safety guarantees for Reinforcement Learning (RL). More specifically, I study how to enforce hard constraints with a learned policy in closed loop with a black-box environment. In this blog post, I discuss why most of the safe RL literature cannot guarantee such constraint satisfaction. Based on this observation, I devised POLICEd RL to solve this issue.

The main idea of POLICEd RL is to make the policy repulsive in a buffer region of the state space surrounding the constraint. This repulsive buffer then pushes trajectories away from the constraint and guarantees its satisfaction. The policy learns to be repulsive because it receives a penalty for each constraint violation during training. My key insight is to use the POLICE algorithm to make the policy affine in this buffer, which allows us to verify its repulsive character by evaluating the policy only at the vertices of the buffer. This small, finite number of evaluations contrasts with the typical approach of learning a safety certificate, which must be evaluated over the whole state space to check that it satisfies the analytical safety conditions.
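To make the vertex argument concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: the dynamics are known and affine (s_dot = A s + B u), whereas POLICEd RL handles black-box dynamics; the buffer is an axis-aligned box; and the policy is already affine in the buffer, u = K s + k. Under these assumptions the constraint's rate of change c @ s_dot is affine in s, so its maximum over the convex buffer is attained at a vertex, and checking the 2^n vertices certifies repulsion on the whole buffer. The names `box_vertices`, `normal_velocity`, `is_repulsive`, and the margin `eps` are hypothetical.

```python
import itertools

import numpy as np


def box_vertices(lower, upper):
    """All 2^n vertices of the axis-aligned box [lower, upper] (hypothetical buffer shape)."""
    return [np.array(v, dtype=float) for v in itertools.product(*zip(lower, upper))]


def normal_velocity(s, A, B, K, k, c):
    """Rate of change of c @ s under ASSUMED affine dynamics s_dot = A s + B u
    with the affine buffer policy u = K s + k."""
    u = K @ s + k
    return float(c @ (A @ s + B @ u))


def is_repulsive(vertices, A, B, K, k, c, eps):
    """True if c @ s_dot <= -eps at every vertex.

    c @ s_dot is affine in s, so its maximum over the convex buffer is attained
    at a vertex: passing this finite check implies repulsion on the whole buffer.
    """
    return all(normal_velocity(v, A, B, K, k, c) <= -eps for v in vertices)


# Toy 2D example: constraint y <= 1, buffer -1 <= x <= 1, 0.8 <= y <= 1.
A = np.zeros((2, 2))                 # single integrator: s_dot = u
B = np.eye(2)
c = np.array([0.0, 1.0])             # constraint row: c @ s <= 1
verts = box_vertices([-1.0, 0.8], [1.0, 1.0])

K_good, k_good = np.array([[0.0, 0.0], [0.0, -1.0]]), np.zeros(2)  # y_dot = -y
K_bad, k_bad = np.array([[0.0, 0.0], [0.5, 0.0]]), np.zeros(2)     # y_dot = 0.5 x

print(is_repulsive(verts, A, B, K_good, k_good, c, eps=0.5))  # True
print(is_repulsive(verts, A, B, K_bad, k_bad, c, eps=0.5))    # False
```

Evaluating `normal_velocity` for `K_good` on a dense grid over the same buffer gives the same maximum (-0.8) as the four vertices, so the vertex check loses nothing while needing only 2^n evaluations.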

We illustrate POLICEd RL below on a simple 2D system and show that safety is guaranteed at the end of training. Without the affine policy, TD3 requires several orders of magnitude more training episodes to learn a policy pointing away from the constraint line along its whole length. For more details, the POLICEd RL paper is accessible here.

Training of a policy to direct trajectories toward the target (cyan) without crossing the constraint line (red).
The POLICEd policy is affine in the buffer region (green) and learns to push trajectories away from the constraint. The black-box dynamics are 2D and continuous; the RL algorithm is TD3.

Resilience of Autonomous Systems

After docking to the International Space Station (ISS), the Nauka module suffered a software error causing its thrusters to misfire. In turn, these uncontrolled thrusters rotated the whole space station by 540° before being counteracted by other thrusters of the ISS. Motivated by such a scenario, my PhD thesis investigated the guaranteed resilience of autonomous systems to a similar class of malfunctions called partial loss of control authority over actuators. These malfunctions are characterized by actuators producing uncontrolled and undesirable outputs instead of following the controller’s commands. A loss of control authority can be caused, for instance, by a software bug as in the ISS example or by an adversarial takeover of some actuators of the system.

In this setting, I investigated the remaining capabilities of the malfunctioning system to complete its mission in terms of resilient reachability and resilient trajectory tracking. I quantified the resilience of linear systems by comparing the reachability performance of the nominal dynamics with that of the worst-case malfunctioning dynamics. The resilience of driftless systems is quantified by the Maximax Minimax Quotient Theorem, whose geometric proof is illustrated in the video below.
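The comparison between nominal and malfunctioning reachability can be sketched numerically. The Python below is a simplified proxy of our own, not the thesis' definitions: it assumes a driftless system x_dot = B_bar u + C w, where u are the controlled inputs and w the outputs of the malfunctioning actuators, both confined to unit boxes, and it measures speed by the projection of x_dot onto a unit direction d. The nominal speed maximizes over both u and w (a maximax), while the malfunctioning speed lets the worst-case w act and the controller respond (a minimax); we only presume this mirrors the theorem's name. The functions `speeds` and `resilience_quotient` are hypothetical.

```python
import numpy as np


def speeds(B_bar, C, d):
    """Projected speeds along unit direction d for x_dot = B_bar u + C w,
    with u and w in unit boxes (ASSUMED input sets, simplified proxy)."""
    ctrl = np.abs(B_bar.T @ d).sum()  # max_u <d, B_bar u> = ||B_bar^T d||_1
    dist = np.abs(C.T @ d).sum()      # max_w <d, C w> = -min_w <d, C w> = ||C^T d||_1
    nominal = ctrl + dist             # maximax: every actuator obeys the controller
    malfunction = ctrl - dist         # minimax: worst-case w, controller reacts to it
    return nominal, malfunction


def resilience_quotient(B_bar, C, n_dirs=720):
    """Worst malfunctioning/nominal speed ratio over sampled 2D directions
    (hypothetical proxy, not the exact metric of the thesis)."""
    ratios = []
    for theta in np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False):
        d = np.array([np.cos(theta), np.sin(theta)])
        nominal, malfunction = speeds(B_bar, C, d)
        ratios.append(malfunction / nominal)
    return min(ratios)


B_bar = np.eye(2)  # two actuators still under control
print(resilience_quotient(B_bar, np.array([[0.5], [0.5]])))  # ≈ 0.333: resilient under this proxy
print(resilience_quotient(B_bar, np.array([[2.0], [0.0]])))  # ≈ -0.333: not resilient
```

A positive worst-case ratio means that, under this proxy, the controlled actuators can always overpower the malfunctioning one and keep making progress in every direction; a negative value flags a direction in which progress can no longer be guaranteed.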

I extended my resilience investigation to systems further afflicted with actuation delays, which prevent an immediate cancellation of the undesirable outputs. I illustrated my theory on a wide range of applications, including an octocopter, a fighter jet model, and an orbital inspection mission shown in the following video.

For more details on resilience theory, see this blog post.


Astrodynamics work


Other Research Work

Transient Safety of Microgrids
  • To ensure transient safety in inverter-based microgrids, we develop a distributed, set-invariance-based safety verification algorithm for each inverter module. Applying Nagumo’s invariance condition, we construct a robust polynomial optimization problem that jointly searches for a safety-admissible set of control set-points and design parameters under allowable disturbances from neighboring modules. We solve this verification problem with sum-of-squares (SOS) programming and illustrate the algorithm with numerical simulations of grid-forming inverters.
  • This work was first presented at the 2022 American Control Conference.
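As a hedged illustration of how Nagumo's condition turns into an SOS program, suppose (with hypothetical notation that is not taken from the paper) that the safe set of one inverter module is S = {x : h(x) ≥ 0} for a polynomial h whose gradient does not vanish on the boundary, that the module dynamics are a polynomial f(x, d), and that the neighbors' disturbance lies in D = {d : g(d) ≥ 0}. Nagumo's condition requires ∇h(x)ᵀ f(x, d) ≥ 0 whenever h(x) = 0 and d ∈ D; a standard sufficient SOS certificate, with Σ[x, d] denoting sum-of-squares polynomials, reads:

```latex
\nabla h(x)^{\top} f(x,d) - \lambda(x,d)\, h(x) - \sigma(x,d)\, g(d) \in \Sigma[x,d],
\qquad \lambda \in \mathbb{R}[x,d], \quad \sigma \in \Sigma[x,d]
```

On the boundary h(x) = 0 with g(d) ≥ 0, this identity gives ∇h(x)ᵀ f(x, d) = (an SOS polynomial) + σ(x, d) g(d) ≥ 0, which is exactly Nagumo's condition; λ may take any sign because it multiplies h(x) = 0 there. This sketch keeps the set-points and design parameters fixed, whereas the paper searches over them jointly.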