Research
DDAT: Diffusion Planning of Dynamically Feasible Robot Trajectories
Diffusion planners can generate an entire trajectory in one shot, but their predicted sequences of states come with no feasibility guarantees. A transition from state $s_t$ to $s_{t+1}$ is dynamically feasible if and only if there exists an action $a_t$ steering $s_t$ to $s_{t+1}$ under the robot's dynamics. To generate only dynamically feasible trajectories, I devised several projection algorithms that counteract compounding projection errors, and I incorporated these projectors into the training and inference of a diffusion transformer (DiT). More details coming soon.
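To give a rough feel for the idea, here is a minimal sketch of such a projection on a toy 1D double integrator, where each predicted transition is replaced by the state actually reachable with an admissible action. The dynamics, action bounds, and function names are illustrative placeholders, not the actual DDAT projectors.

```python
import numpy as np

# Toy discrete-time dynamics s_{t+1} = f(s_t, a_t): a 1D double integrator
# with state [position, velocity] and bounded acceleration a_t.
def step(s, a, dt=0.1):
    pos, vel = s
    return np.array([pos + dt * vel, vel + dt * a])

def project_trajectory(states, a_min=-1.0, a_max=1.0, dt=0.1):
    """Project a predicted state sequence onto dynamically feasible transitions.

    Each transition (s_t, s_{t+1}) is replaced by the state reached with the
    admissible action best matching the prediction. Rolling the dynamics from
    the already-projected state keeps projection errors from compounding.
    """
    projected = [np.asarray(states[0], dtype=float)]
    for t in range(len(states) - 1):
        s = projected[-1]
        target = states[t + 1]
        # For this toy model the action only drives the velocity component.
        a = np.clip((target[1] - s[1]) / dt, a_min, a_max)
        projected.append(step(s, a, dt))
    return np.array(projected)

# Usage: project a noisy diffusion-style prediction onto the feasible set.
raw = np.cumsum(0.1 * np.random.randn(20, 2), axis=0)
feasible = project_trajectory(raw)
```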
POLICEd RL: Guaranteeing Safety in Reinforcement Learning
In this work I developed provable safety guarantees for Reinforcement Learning (RL). More specifically, I enforce hard constraints on a learned policy in closed loop with a black-box environment. In this blog post I discuss why most of the safe RL literature cannot guarantee such constraint satisfaction. Based on this observation, I devised POLICEd RL to address this issue.
The main idea of POLICEd RL is to make the policy repulsive in the state-space region surrounding the constraint. This repulsive buffer then pushes trajectories away from the constraint and guarantees its satisfaction. The policy learns to be repulsive as it receives penalties for each constraint violation during training. My key insight is to use the POLICE algorithm to make the policy affine in this buffer, which allows its repulsive character to be verified by evaluating the policy only at the vertices of the buffer. This small, discrete number of evaluations needed to guarantee safety contrasts with the typical approach of learning a safety certificate, which must be evaluated everywhere on the state space to check whether it satisfies the analytical safety conditions.
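To make the vertex argument concrete, here is a minimal sketch of such a check, assuming known affine closed-loop dynamics and a polytopic buffer; the function names and the toy 2D setup are illustrative placeholders rather than the actual POLICEd RL implementation.

```python
import numpy as np

def is_repulsive(policy, vertices, normal, dynamics, margin=0.0):
    """Verify the repulsion condition at every vertex of the buffer polytope.

    When the policy is affine over the buffer and the closed-loop dynamics are
    affine, a sufficiently negative component along the constraint normal at
    every vertex extends to the whole buffer by convexity.
    """
    for v in vertices:
        s_dot = dynamics(v, policy(v))     # closed-loop state derivative at vertex v
        if normal @ s_dot > -margin:       # positive component points toward the constraint
            return False
    return True

# Toy 2D example: constraint line x2 = 1 with outward normal [0, 1],
# buffer = box [0, 1] x [0.8, 1], single-integrator dynamics s_dot = a.
vertices = np.array([[0.0, 0.8], [1.0, 0.8], [0.0, 1.0], [1.0, 1.0]])
normal = np.array([0.0, 1.0])
policy = lambda s: np.array([0.5, -1.0])     # affine (here constant) action pushing downward
dynamics = lambda s, a: a
print(is_repulsive(policy, vertices, normal, dynamics, margin=0.1))  # True
```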
We illustrate POLICEd RL below on a simple 2D system and show that safety is guaranteed at the end of training. Without the affine policy, TD3 requires several orders of magnitude more training episodes to learn a policy pointing away from the constraint line along its whole length. For more details, check out the paper and the website.

The POLICEd policy is affine in the buffer region (green) and learns to push trajectories away from the constraint. The black-box dynamics are 2D and continuous, and the RL algorithm is TD3.
Resilience of Autonomous Systems
After docking to the International Space Station (ISS), the Nauka module suffered a software error causing its thrusters to misfire. In turn, these uncontrolled thrusters rotated the whole space station by 540° before being counteracted by other thrusters of the ISS. Motivated by such a scenario, my PhD thesis investigated the guaranteed resilience of autonomous systems to a similar class of malfunctions called partial loss of control authority over actuators. These malfunctions are characterized by actuators producing uncontrolled and undesirable outputs instead of following the controller’s commands. A loss of control authority can be caused, for instance, by a software bug as in the ISS example or by an adversarial takeover of some actuators of the system.
In this setting, I investigated the malfunctioning system's remaining capabilities to complete its mission in terms of resilient reachability and resilient trajectory tracking. I quantified the resilience of linear systems by comparing the reachability performance of the nominal dynamics with that of the worst-case malfunctioning dynamics. The resilience of driftless systems is quantified by the Maximax Minimax Quotient Theorem, whose geometric proof is illustrated in the video below.
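To sketch the setup in a schematic form (the notation and precise definitions may differ from the papers): after a partial loss of control authority, the input matrix splits into controlled and uncontrolled columns, and resilience is quantified by comparing minimal reach times, roughly as
$$\dot{x}(t) = A\,x(t) + \bar{B}\,\bar{u}(t) + C\,w(t), \qquad r_q = \inf_{\text{goal}} \; \frac{T_N^*(\text{goal})}{T_M^*(\text{goal})},$$
where $\bar{u}$ collects the inputs still following the controller, $w$ the undesirable inputs of the malfunctioning actuators, $T_N^*$ the minimal reach time of the nominal dynamics, and $T_M^*$ the min-max reach time when $w$ acts adversarially.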
I extended my resilience investigation to systems further afflicted with actuation delays, which prevent an immediate cancellation of the undesirable outputs. I illustrated my theory on a wide range of applications including an octocopter, a fighter jet model, and an orbital inspection mission shown in the following video.
For more details on resilience theory see this blog post.
Astrodynamics Work
Other Research Work
Transient Safety of Microgrids
- To ensure transient safety in inverter-based microgrids, we develop a set-invariance-based distributed safety verification algorithm for each inverter module. Applying Nagumo's invariance condition, we construct a robust polynomial optimization problem to jointly search for a safety-admissible set of control set-points and design parameters under allowable disturbances from neighbors. We use sum-of-squares (SOS) programming to solve the verification problem and perform numerical simulations with grid-forming inverters to illustrate the algorithm; the underlying invariance condition is sketched after this list.
- This work was first presented at the 2022 American Control Conference.
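For reference, here is the standard form of the invariance condition this verification builds on, written schematically rather than in the exact notation of the paper. For a safe set $\mathcal{S} = \{x : h(x) \geq 0\}$ with polynomial $h$ and dynamics $\dot{x} = f(x, d)$ under disturbances $d \in \mathcal{D}$, Nagumo's condition requires
$$\nabla h(x)^\top f(x, d) \;\geq\; 0 \quad \text{whenever } h(x) = 0 \text{ and } d \in \mathcal{D}.$$
A sufficient SOS certificate is the existence of a polynomial multiplier $\lambda(x, d)$ and an SOS multiplier $\sigma(x, d)$ for the disturbance set description $g(d) \geq 0$ such that $\nabla h(x)^\top f(x, d) - \lambda(x, d)\, h(x) - \sigma(x, d)\, g(d)$ is a sum of squares, which is the kind of constraint the verification problem imposes on each inverter module.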