Model Predictive Path Integral Control

Introduction to MPC

Model Predictive Control (MPC) is a control method with the following key characteristics:

  • Computes the control action by solving an optimization problem at each time step
  • The optimization is solved over a finite prediction horizon and yields a sequence of control inputs
  • Uses a model of the system to predict future states
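To make the receding-horizon idea concrete, here is a minimal Python sketch of an MPC loop. It assumes a hypothetical `solve_finite_horizon` optimizer and a toy linear model; neither name comes from a specific library.

```python
import numpy as np

def dynamics(x, u):
    """Toy linear model x_{t+1} = A x_t + B u_t (illustrative assumption)."""
    A = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
    B = np.array([[0.0],
                  [0.1]])
    return A @ x + B @ u

def solve_finite_horizon(x, horizon):
    """Hypothetical placeholder for the finite-horizon optimizer.

    A real MPC solver would return the cost-minimizing control sequence;
    MPPI replaces this step with stochastic sampling."""
    return np.zeros((horizon, 1))

x = np.array([1.0, 0.0])      # current state
horizon = 20                  # prediction horizon

for step in range(100):
    u_sequence = solve_finite_horizon(x, horizon)  # plan over the horizon
    u = u_sequence[0]                              # apply only the first input
    x = dynamics(x, u)                             # advance to the next state
```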

MPPI: A Stochastic Approach

MPPI (Model Predictive Path Integral) is a variant of MPC that uses stochastic optimization to compute control actions, instead of solving a deterministic optimization problem.

Key Components Affecting Controller Performance

1. Trajectory

  • A sequence of states that a system passes through over time
  • In MPPI, we sample multiple trajectories, evaluate the cost associated with each one, and use these costs to determine the optimal control
  • This sampling approach allows MPPI to handle non-linear dynamics and complex cost functions

2. The Horizon

  • The number of time steps over which future trajectories are considered in the optimization problem
  • The optimization problem tries to find the sequence of control actions over this horizon that minimizes the expected cost
  • Trade-offs:
    • If too short: not enough foresight to make good decisions
    • If too long: computationally complex and expensive

MPPI Algorithm Overview

MPPI samples multiple control trajectories, rolls each one out through the system model, and computes its cost. The control action is then selected as a weighted average of the sampled control trajectories, with weights determined by their relative costs.
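The sketch below illustrates a single MPPI planning step under simplifying assumptions: a known, noise-free rollout model, Gaussian perturbations of a nominal control sequence, and a quadratic per-step cost. All function and variable names are illustrative rather than taken from a particular implementation.

```python
import numpy as np

# One MPPI planning step: sample K perturbed control sequences, roll each
# out through the model, score it, and combine them with softmax weights.
# dynamics, cost, and the parameter values below are illustrative.

def dynamics(x, u):
    # Double-integrator model: position/velocity driven by acceleration u
    dt = 0.1
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]])

def cost(x, u):
    # Quadratic state and control penalty
    return x @ x + 0.1 * u @ u

K, T = 256, 20             # number of samples, horizon length
lam = 1.0                  # temperature parameter
sigma = 0.5                # std. dev. of the control perturbations
u_nominal = np.zeros((T, 1))
x0 = np.array([1.0, 0.0])

noise = sigma * np.random.randn(K, T, 1)
u_samples = u_nominal[None, :, :] + noise      # K perturbed control sequences

S = np.zeros(K)                                # cost of each sampled trajectory
for k in range(K):
    x = x0
    for t in range(T):
        u = u_samples[k, t]
        S[k] += cost(x, u)
        x = dynamics(x, u)

w = np.exp(-lam * (S - S.min()))               # subtracting the minimum cost
w /= w.sum()                                   # does not change the normalized weights
u_star = (w[:, None, None] * u_samples).sum(axis=0)   # weighted-average control sequence
u_apply = u_star[0]                            # first control input to apply
```

In a full controller this block typically runs inside a receding-horizon loop like the one sketched earlier, with the resulting weighted-average sequence reused as the nominal sequence at the next time step.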

Mathematical Formulation

Objective Function

For a system with state x and control u, the objective is to minimize the expected cost over a finite horizon T:

J(u) = \mathbb{E}\left[ \sum_{t=0}^{T-1} c(x_t, u_t) \right]

Where:

  • c(x_t, u_t) is the instantaneous cost at time step t
  • T is the prediction horizon
  • \mathbb{E}[\cdot] denotes expectation
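Because this expectation rarely has a closed form for nonlinear or noisy dynamics, it is usually approximated by averaging sampled rollouts. The sketch below estimates J(u) for one fixed control sequence by Monte Carlo; the dynamics, cost, and noise scale are illustrative assumptions.

```python
import numpy as np

# Monte Carlo estimate of J(u) = E[sum_t c(x_t, u_t)] for a fixed control
# sequence, averaging over sampled realizations of the process noise.

def dynamics(x, u, w):
    dt = 0.1
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]]) + w

def cost(x, u):
    return x @ x + 0.1 * u @ u

T, num_rollouts = 20, 500
u_seq = np.zeros((T, 1))            # fixed candidate control sequence
x0 = np.array([1.0, 0.0])

total = 0.0
for _ in range(num_rollouts):
    x = x0
    for t in range(T):
        w = 0.05 * np.random.randn(2)     # process noise sample w_t
        total += cost(x, u_seq[t])
        x = dynamics(x, u_seq[t], w)

J_estimate = total / num_rollouts         # approximate expected cost
```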

Key Equations

State Transition Equation:
x_{t+1} = f(x_t, u_t) + w_t

Where f(x_t, u_t) is the system dynamics model and w_t is the process noise.
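As a concrete stand-in for f, the following snippet uses a discretized double integrator with additive Gaussian process noise; both the model and the noise scale are assumptions made for illustration.

```python
import numpy as np

# Example state transition x_{t+1} = f(x_t, u_t) + w_t for a discretized
# double integrator (state = [position, velocity], control = acceleration).

def step(x, u, dt=0.1, noise_std=0.01):
    f = np.array([x[0] + dt * x[1],      # position update
                  x[1] + dt * u[0]])     # velocity update
    w = noise_std * np.random.randn(2)   # process noise w_t
    return f + w

x = np.array([0.0, 0.0])
x_next = step(x, np.array([1.0]))
```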

Control Trajectory:
u^k = \left( u_0^k, u_1^k, \ldots, u_{T-1}^k \right)

Representing the k-th sampled control sequence over the horizon.
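In many MPPI implementations these sequences are drawn as Gaussian perturbations of a nominal control sequence; the snippet below sketches that sampling step, with the perturbation scale chosen arbitrarily for illustration.

```python
import numpy as np

# Sample K control sequences u^k = (u_0^k, ..., u_{T-1}^k) as Gaussian
# perturbations of a nominal sequence.
K, T, m = 256, 20, 1            # samples, horizon, control dimension
sigma = 0.5                     # assumed perturbation std. dev.

u_nominal = np.zeros((T, m))
u_samples = u_nominal[None, :, :] + sigma * np.random.randn(K, T, m)
# u_samples[k] is the k-th sampled control sequence over the horizon
```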

Cost of Sampled Trajectory:
S_k = \sum_{t=0}^{T-1} c(x_t^k, u_t^k)

The total cost associated with the k-th trajectory.
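Zooming in on this step, S_k is obtained by rolling the k-th control sequence forward through the model and accumulating the per-step cost; `dynamics` and `cost` below are illustrative placeholders, not a specific library API.

```python
import numpy as np

def dynamics(x, u, dt=0.1):
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]])

def cost(x, u):
    return x @ x + 0.1 * u @ u

def trajectory_cost(x0, u_seq):
    """Compute S_k = sum_t c(x_t^k, u_t^k) for one sampled control sequence."""
    x, S_k = x0, 0.0
    for u in u_seq:              # u_seq has shape (T, control_dim)
        S_k += cost(x, u)
        x = dynamics(x, u)
    return S_k

S_k = trajectory_cost(np.array([1.0, 0.0]), 0.1 * np.random.randn(20, 1))
```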

Quadratic Cost Function (commonly used):
c(x_t, u_t) = x_t^\top Q x_t + u_t^\top R u_t

Where Q and R are weight matrices for state and control costs respectively.
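For example, with a two-dimensional state and a scalar control, a quadratic cost of this form could look like the following (the particular Q and R values are illustrative assumptions):

```python
import numpy as np

# Quadratic cost c(x, u) = x^T Q x + u^T R u with example weight matrices.
Q = np.diag([1.0, 0.1])     # penalize the first state more than the second
R = np.diag([0.01])         # small penalty on control effort

def quadratic_cost(x, u):
    return x @ Q @ x + u @ R @ u

c = quadratic_cost(np.array([1.0, 0.0]), np.array([0.5]))
```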

Computing the Optimal Control

The optimal control sequence is computed as a weighted average of the sampled control trajectories:

u^* = \sum_{k=1}^{K} w_k \, u^k

With weights determined by the relative costs:

w_k = \frac{\exp(-\lambda S_k)}{\sum_{j=1}^{K} \exp(-\lambda S_j)}

Where λ is a temperature parameter that controls how sharply the weighting favors low-cost trajectories: a larger λ concentrates the weight on the lowest-cost samples, while a smaller λ averages the samples more uniformly.
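In code, the weighting and averaging reduce to a softmax over the trajectory costs; the snippet below assumes the costs S and the sampled sequences u_samples have already been computed as in the earlier sketches.

```python
import numpy as np

# Compute the weights w_k and the weighted-average control sequence u*.
K, T, m = 256, 20, 1
S = np.random.rand(K)                      # stand-in trajectory costs S_k
u_samples = np.random.randn(K, T, m)       # stand-in sampled control sequences u^k
lam = 1.0                                  # temperature parameter

w = np.exp(-lam * (S - S.min()))           # exponentiate the (shifted) negative costs
w /= w.sum()                               # normalize so the weights sum to 1
u_star = (w[:, None, None] * u_samples).sum(axis=0)   # u* = sum_k w_k u^k
```

Subtracting the minimum cost before exponentiating leaves the normalized weights unchanged (it cancels in the normalization) but avoids numerical underflow when the costs are large.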