Research
DDAT: Diffusion Planning of Dynamically Feasible Robot Trajectories
Diffusion planners can generate an entire trajectory in one shot, but their predicted sequences of states come with no feasibility guarantees. A transition from state $s_t$ to $s_{t+1}$ is dynamically feasible if and only if there exists an action $a_t$ steering $s_t$ to $s_{t+1}$ under the robot's dynamics. To generate only dynamically feasible trajectories, I devised several projection algorithms that counteract compounding projection errors, and I incorporated these projectors into the training and inference of a diffusion transformer (DiT). More details coming soon.
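To give a rough feel for the idea, here is a minimal sketch of such a projection on a toy 1D double integrator, where each predicted transition is replaced by the state actually reachable with an admissible action. The dynamics, action bounds, and function names are illustrative placeholders, not the actual DDAT projectors.

```python
import numpy as np

# Toy discrete-time dynamics s_{t+1} = f(s_t, a_t): a 1D double integrator
# with state [position, velocity] and bounded acceleration a_t.
def step(s, a, dt=0.1):
    pos, vel = s
    return np.array([pos + dt * vel, vel + dt * a])

def project_trajectory(states, a_min=-1.0, a_max=1.0, dt=0.1):
    """Project a predicted state sequence onto dynamically feasible transitions.

    Each transition (s_t, s_{t+1}) is replaced by the state reached with the
    admissible action best matching the prediction. Rolling the dynamics from
    the already-projected state keeps projection errors from compounding.
    """
    projected = [np.asarray(states[0], dtype=float)]
    for t in range(len(states) - 1):
        s = projected[-1]
        target = states[t + 1]
        # For this toy model the action only drives the velocity component.
        a = np.clip((target[1] - s[1]) / dt, a_min, a_max)
        projected.append(step(s, a, dt))
    return np.array(projected)

# Usage: project a noisy diffusion-style prediction onto the feasible set.
raw = np.cumsum(0.1 * np.random.randn(20, 2), axis=0)
feasible = project_trajectory(raw)
```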
POLICEd RL: Guaranteeing Safety in Reinforcement Learning
In this work I developed provable safety guarantees for Reinforcement Learning (RL). More specifically, I enforce hard constraints on a learned policy in closed loop with a black-box environment. In this blog post I discuss why most of the safe RL literature cannot guarantee such constraint satisfaction. Based on this observation, I devised POLICEd RL to address this issue.
The main idea of POLICEd RL is to make the policy repulsive in the state-space region surrounding the constraint. This repulsive buffer then pushes trajectories away from the constraint and guarantees its satisfaction. The policy learns to be repulsive as it receives penalties for each constraint violation during training. My key insight is to use the POLICE algorithm to make the policy affine in this buffer, which allows its repulsive character to be verified by evaluating the policy only at the vertices of the buffer. This small, discrete number of evaluations needed to guarantee safety contrasts with the typical approach of learning a safety certificate, which must be evaluated everywhere on the state space to check whether it satisfies the analytical safety conditions.
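To make the vertex argument concrete, here is a minimal sketch of such a check, assuming known affine closed-loop dynamics and a polytopic buffer; the function names and the toy 2D setup are illustrative placeholders rather than the actual POLICEd RL implementation.

```python
import numpy as np

def is_repulsive(policy, vertices, normal, dynamics, margin=0.0):
    """Verify the repulsion condition at every vertex of the buffer polytope.

    When the policy is affine over the buffer and the closed-loop dynamics are
    affine, a sufficiently negative component along the constraint normal at
    every vertex extends to the whole buffer by convexity.
    """
    for v in vertices:
        s_dot = dynamics(v, policy(v))     # closed-loop state derivative at vertex v
        if normal @ s_dot > -margin:       # positive component points toward the constraint
            return False
    return True

# Toy 2D example: constraint line x2 = 1 with outward normal [0, 1],
# buffer = box [0, 1] x [0.8, 1], single-integrator dynamics s_dot = a.
vertices = np.array([[0.0, 0.8], [1.0, 0.8], [0.0, 1.0], [1.0, 1.0]])
normal = np.array([0.0, 1.0])
policy = lambda s: np.array([0.5, -1.0])     # affine (here constant) action pushing downward
dynamics = lambda s, a: a
print(is_repulsive(policy, vertices, normal, dynamics, margin=0.1))  # True
```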
We illustrate POLICEd RL below on a simple 2D system and show that safety is guaranteed at the end of training. Without the affine policy, TD3 requires several orders of magnitude more training episodes to learn a policy pointing away from the constraint line along its whole length. For more details, check out the paper and the website.

The POLICEd policy is affine in the buffer region (green) and learns to push trajectories away from the constraint. The black-box dynamics are 2D and continuous, and the RL algorithm is TD3.
Resilience of Autonomous Systems
After docking to the International Space Station (ISS), the Nauka module suffered a software error causing its thrusters to misfire. In turn, these uncontrolled thrusters rotated the whole space station by 540° before being counteracted by other thrusters of the ISS. Motivated by such a scenario, my PhD thesis investigated the guaranteed resilience of autonomous systems to a similar class of malfunctions called partial loss of control authority over actuators. These malfunctions are characterized by actuators producing uncontrolled and undesirable outputs instead of following the controller’s commands. A loss of control authority can be caused, for instance, by a software bug as in the ISS example or by an adversarial takeover of some actuators of the system.
In this setting, I investigated the malfunctioning system's remaining capabilities to complete its mission in terms of resilient reachability and resilient trajectory tracking. I quantified the resilience of linear systems by comparing the reachability performance of the nominal dynamics with that of the worst-case malfunctioning dynamics. The resilience of driftless systems is quantified by the Maximax Minimax Quotient Theorem, whose geometric proof is illustrated in the video below.
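To sketch the setup in a schematic form (the notation and precise definitions may differ from the papers): after a partial loss of control authority, the input matrix splits into controlled and uncontrolled columns, and resilience is quantified by comparing minimal reach times, roughly as
$$\dot{x}(t) = A\,x(t) + \bar{B}\,\bar{u}(t) + C\,w(t), \qquad r_q = \inf_{\text{goal}} \; \frac{T_N^*(\text{goal})}{T_M^*(\text{goal})},$$
where $\bar{u}$ collects the inputs still following the controller, $w$ the undesirable inputs of the malfunctioning actuators, $T_N^*$ the minimal reach time of the nominal dynamics, and $T_M^*$ the min-max reach time when $w$ acts adversarially.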
I extended my resilience investigation to systems further afflicted with actuation delays, which prevent an immediate cancellation of the undesirable outputs. I illustrated my theory on a wide range of applications including an octocopter, a fighter jet model, and an orbital inspection mission shown in the following video.
For more details on resilience theory see this blog post.
Astrodynamics Work
Other Research Work
Transient Safety of Microgrids
- To ensure transient safety in inverter-based microgrids, we develop a set-invariance-based distributed safety verification algorithm for each inverter module. Applying Nagumo's invariance condition, we construct a robust polynomial optimization problem to jointly search for a safety-admissible set of control set-points and design parameters under allowable disturbances from neighbors. We use sum-of-squares (SOS) programming to solve the verification problem and perform numerical simulations with grid-forming inverters to illustrate the algorithm; the underlying invariance condition is sketched after this list.
- This work was first presented at the 2022 American Control Conference.
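For reference, here is the standard form of the invariance condition this verification builds on, written schematically rather than in the exact notation of the paper. For a safe set $\mathcal{S} = \{x : h(x) \geq 0\}$ with polynomial $h$ and dynamics $\dot{x} = f(x, d)$ under disturbances $d \in \mathcal{D}$, Nagumo's condition requires
$$\nabla h(x)^\top f(x, d) \;\geq\; 0 \quad \text{whenever } h(x) = 0 \text{ and } d \in \mathcal{D}.$$
A sufficient SOS certificate is the existence of a polynomial multiplier $\lambda(x, d)$ and an SOS multiplier $\sigma(x, d)$ for the disturbance set description $g(d) \geq 0$ such that $\nabla h(x)^\top f(x, d) - \lambda(x, d)\, h(x) - \sigma(x, d)\, g(d)$ is a sum of squares, which is the kind of constraint the verification problem imposes on each inverter module.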