Trajectory Planning for Safe Dual Control with Active Exploration

Under Review, 2026

Kaleb Ben Naveed¹, Manveer Singh¹, Devansh R. Agrawal¹, Dimitra Panagou¹,²

¹Department of Robotics, University of Michigan, Ann Arbor

²Department of Aerospace Engineering, University of Michigan, Ann Arbor

Dual-gatekeeper treats exploration as a certified decision: explore only when a candidate trajectory is safe, budget-feasible, and predicted to reduce uncertainty.

Abstract

Planning safe trajectories under model uncertainty is challenging because robust planning must guard against worst-case parameter realizations, often producing conservative behavior. Dual control offers a way to improve performance by actively reducing uncertainty during the mission, but exploration should happen only when it is beneficial, safe, and not too costly. This paper proposes Dual-gatekeeper, a budget-constrained safe dual control framework that evaluates informative candidate trajectories before execution. A candidate is committed only if it preserves safety, respects a mission-level exploration budget, and is predicted to shrink the parameter uncertainty set.

Safe by construction

Committed trajectories must satisfy state and input constraints under admissible uncertainty and bounded disturbances.

Budget constrained

Exploration is allowed only when its predicted excess mission cost remains inside a prescribed budget.

Actively informative

Feasible candidates are scored by predicted reduction in the current parameter uncertainty set.

Framework Overview

At every replanning step, the robust trajectory planner supplies a conservative backup trajectory. Dual-gatekeeper then generates both robust mission candidates and informative candidates, rejects unsafe or budget-infeasible options, predicts uncertainty reduction for the remaining candidates, and commits the highest-scoring feasible trajectory to the low-level controller.

Figure: Block diagram of the Dual-gatekeeper framework. The gatekeeper filters candidate trajectories by safety, budget feasibility, and predicted information gain before committing one.

Figure: The five-step Dual-gatekeeper pipeline. Build a backup trajectory, generate candidate pairs, reject unsafe candidates, select the most valuable feasible candidate, and update the backup after return.
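The filter-then-select step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Candidate` record, its field names, and the `select_trajectory` helper are hypothetical, and the safety flag, excess cost, and predicted reduction are assumed to have been computed elsewhere by the planner.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one candidate trajectory."""
    name: str
    is_safe: bool               # outcome of the robust safety check
    excess_cost: float          # predicted mission cost above the backup plan
    predicted_reduction: float  # predicted shrinkage of the uncertainty set


def select_trajectory(candidates: Sequence[Candidate],
                      remaining_budget: float) -> Optional[Candidate]:
    """Return the best certified candidate, or None to keep the backup."""
    feasible = [c for c in candidates
                if c.is_safe and c.excess_cost <= remaining_budget]
    if not feasible:
        return None  # nothing certified: keep following the backup trajectory
    return max(feasible, key=lambda c: c.predicted_reduction)


# Made-up numbers, for illustration only.
candidates = [
    Candidate("robust", is_safe=True, excess_cost=0.0, predicted_reduction=0.0),
    Candidate("probe_a", is_safe=True, excess_cost=1.5, predicted_reduction=0.4),
    Candidate("probe_b", is_safe=False, excess_cost=0.5, predicted_reduction=0.9),
    Candidate("probe_c", is_safe=True, excess_cost=4.0, predicted_reduction=0.7),
]
chosen = select_trajectory(candidates, remaining_budget=2.0)
print(chosen.name if chosen else "backup")
```

In this toy run, `probe_b` is rejected as unsafe and `probe_c` exceeds the budget, so the selection falls to `probe_a` even though both rejected candidates promised a larger reduction. Returning `None` models the fallback to the backup trajectory.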

Uncertainty Reduction

The framework uses set-membership identification to maintain a feasible parameter set that contains the true model parameter. Informative candidates are evaluated by how much they are expected to shrink this set, measured through directional widths of the uncertainty region.

Figure: Parameter bounds shrink as informative data is collected, while the true parameter remains inside the feasible set.
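To make the set-membership idea concrete, the sketch below assumes a linear-in-parameters measurement model y = φ·θ + w with a known noise bound |w| ≤ w_max, and over-approximates the feasible set by an axis-aligned box. Both the model and the box representation are simplifying assumptions chosen for illustration; the paper's actual set representation may differ. The helper names (`update_box`, `directional_width`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def scale(iv: Interval, k: float) -> Interval:
    """Multiply an interval by a scalar k, keeping (low, high) ordering."""
    a, b = iv[0] * k, iv[1] * k
    return (min(a, b), max(a, b))


def update_box(box: Sequence[Interval], phi: Sequence[float], y: float,
               w_max: float) -> List[Interval]:
    """Tighten each parameter interval with one measurement y = phi . theta + w."""
    new_box = list(box)
    n = len(phi)
    for i in range(n):
        if phi[i] == 0.0:
            continue  # this measurement carries no information about theta_i
        # Range of the remaining terms sum_{j != i} phi_j * theta_j over the box.
        rest = [scale(new_box[j], phi[j]) for j in range(n) if j != i]
        lo_rest = sum(r[0] for r in rest)
        hi_rest = sum(r[1] for r in rest)
        # phi_i * theta_i must lie in [y - w_max - hi_rest, y + w_max - lo_rest].
        lo, hi = scale((y - w_max - hi_rest, y + w_max - lo_rest), 1.0 / phi[i])
        new_box[i] = (max(new_box[i][0], lo), min(new_box[i][1], hi))
    return new_box


def directional_width(box: Sequence[Interval], d: Sequence[float]) -> float:
    """Width of the box along direction d (support-function difference)."""
    return sum(abs(d_i) * (hi - lo) for d_i, (lo, hi) in zip(d, box))


# Made-up example: true parameter, prior box, and two bounded-noise measurements.
theta_true = (2.0, -1.0)
w_max = 0.1
box: List[Interval] = [(0.0, 4.0), (-3.0, 1.0)]
width_before = directional_width(box, (1.0, 0.0))

for phi, w in [((1.0, 0.0), 0.05), ((1.0, 1.0), -0.05)]:
    y = sum(p * t for p, t in zip(phi, theta_true)) + w
    box = update_box(box, phi, y, w_max)

width_after = directional_width(box, (1.0, 0.0))
print(width_before, width_after, box)
```

Each update intersects the current box with the set of parameters consistent with the new measurement, so the true parameter is never excluded as long as the noise bound holds. Comparing `directional_width` before and after a hypothetical measurement is one simple way to score how informative a candidate would be along a chosen direction.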

Car Racing Simulation Videos

The racing simulations compare conservative execution with Dual-gatekeeper variants. Together, they show how the framework uses safety filtering and active uncertainty reduction to decide which trajectory should be committed.

Backup Policy Only

The car follows the conservative safety-preserving policy without committing to exploratory behavior.

Dual-gatekeeper Without Uncertainty Reduction

When the uncertainty set stays large, many candidate trajectories cannot be certified as safe, so the gatekeeper does not commit them and falls back to conservative behavior.

Dual-gatekeeper With Uncertainty Reduction

As uncertainty is reduced, more candidate trajectories pass the safety and budget checks and can be committed; candidates that still fail certification are rejected.
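A toy calculation shows why a tighter uncertainty set lets more candidates pass certification. It is a sketch under stated assumptions, not the racing model used in the simulations: the uncertain parameter is taken to be a friction coefficient μ in a known interval, braking deceleration is μ·g, and a candidate is certified only if the car can stop within the available distance for the worst admissible μ. All names and numbers are hypothetical.

```python
from typing import Tuple

G = 9.81  # gravitational acceleration in m/s^2


def worst_case_stopping_distance(speed: float,
                                 mu_interval: Tuple[float, float]) -> float:
    """Stopping distance v^2 / (2 * mu * g) at the lowest admissible friction."""
    mu_lo, _ = mu_interval
    if mu_lo <= 0.0:
        raise ValueError("lower friction bound must be positive")
    return speed ** 2 / (2.0 * mu_lo * G)


def is_certified(speed: float, distance_available: float,
                 mu_interval: Tuple[float, float]) -> bool:
    """Certify a candidate only if it can stop in time for every admissible mu."""
    return worst_case_stopping_distance(speed, mu_interval) <= distance_available


speed = 20.0               # candidate speed in m/s (made-up)
distance_available = 40.0  # room to stop in m (made-up)

wide = (0.3, 1.0)    # large uncertainty set before exploration
narrow = (0.7, 0.9)  # reduced uncertainty set after informative data

print(is_certified(speed, distance_available, wide))
print(is_certified(speed, distance_available, narrow))
```

With the wide interval, the worst-case stopping distance is roughly 68 m, so the candidate is rejected. With the narrow interval it drops to roughly 29 m, and the same candidate can be committed.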

Quadrotor Case Study

The same architecture is demonstrated on quadrotor navigation. The planner compares backup trajectories, valid candidate trajectories, and final committed solutions while respecting obstacle constraints.

Figure: Quadrotor navigation results showing backup trajectories, valid candidate trajectories, and final committed solutions.

Takeaway

Dual-gatekeeper does not rely on a loosely tuned exploration reward. It treats exploration as a verifiable decision: certify safety, check the exploration budget, predict uncertainty reduction, and then commit only the best feasible candidate.