Learned world models have been getting bigger and more capable, but planning over them at long horizons is brittle. Deep rollouts produce ill-conditioned computation graphs, the loss landscape is riddled with local minima, and state-input gradients through vision encoders are adversarially fragile. The BAIR blog post on GRASP, from Michael Psenka, Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar, targets exactly these failure modes. The paper is at arXiv:2602.00475, with a project site at michaelpsenka.io/grasp.
Three ideas do the work. First, virtual-state lifting reframes the rollout as a collocation problem: instead of serially predicting each timestep, the optimizer searches simultaneously over states and actions, with the learned dynamics appearing as soft constraints. Each timestep's prediction depends only on local variables, so the whole T-step trajectory can be optimized in parallel. Second, state stochasticity injection adds Gaussian noise to the state iterates during optimization while actions continue to receive deterministic gradient updates; that pushes the optimizer between basins without introducing the variance of randomized action selection. Third, gradient reshaping stops gradients from flowing through the state input of the dynamics function (the brittle path through the vision model) while preserving gradients on actions, and layers on a dense goal-shaping term that matches model outputs to the goal at every timestep rather than only at the final state.
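To make the three mechanisms concrete, here is a minimal PyTorch sketch of how they might compose in one optimization loop. This is not the authors' implementation: the function name `grasp_style_plan`, the Adam optimizer, the loss weights, and the noise schedule are illustrative assumptions; only the three structural moves (virtual states with per-step soft constraints, state-only noise, detached state inputs plus a dense goal term) come from the post.

```python
import torch

def grasp_style_plan(f, s0, goal, T, act_dim, iters=500, lr=1e-2,
                     lam_dyn=1.0, lam_goal=0.1,
                     noise_std=0.05, noise_decay=0.995):
    """Hypothetical GRASP-style planner over a learned dynamics model.

    f    : differentiable dynamics, f(states, actions) -> next states,
           batched over the leading dimension
    s0   : initial state, shape (state_dim,)
    goal : goal state in the same space, shape (state_dim,)
    """
    # Virtual-state lifting: intermediate states s_1..s_T are free
    # optimization variables, not serial rollouts of f.
    states = s0.detach().repeat(T, 1).requires_grad_(True)
    actions = torch.zeros(T, act_dim, requires_grad=True)
    opt = torch.optim.Adam([states, actions], lr=lr)
    sigma = noise_std

    for _ in range(iters):
        # Dynamics inputs at each step: s_0, s_1, ..., s_{T-1}.
        inputs = torch.cat([s0.unsqueeze(0), states[:-1]], dim=0)
        # Gradient reshaping: detach blocks the fragile gradient path
        # through the state input of f; actions keep full gradients.
        preds = f(inputs.detach(), actions)          # (T, state_dim)
        # Soft dynamics constraints: one local term per timestep, all
        # evaluated in parallel, with no T-deep backprop chain.
        dyn_loss = ((preds - states) ** 2).sum()
        # Dense goal shaping: pull every predicted state toward the
        # goal, not just the final one.
        goal_loss = ((preds - goal) ** 2).sum()

        opt.zero_grad()
        (lam_dyn * dyn_loss + lam_goal * goal_loss).backward()
        opt.step()

        # State stochasticity injection: perturb only the state
        # iterates to hop between basins; action updates stay
        # deterministic.
        with torch.no_grad():
            states.add_(sigma * torch.randn_like(states))
        sigma *= noise_decay

    return actions.detach()
```

A quick smoke test against a toy MLP dynamics (again, purely illustrative) shows the interface:

```python
net = torch.nn.Sequential(torch.nn.Linear(6, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 4))
f = lambda s, a: net(torch.cat([s, a], dim=-1))  # 4-d state, 2-d action
plan = grasp_style_plan(f, torch.zeros(4), torch.ones(4), T=60, act_dim=2)
print(plan.shape)  # torch.Size([60, 2])
```

In practice you would likely warm-start `states` by interpolating from `s0` toward `goal` and re-plan receding-horizon style, executing only the first action; the post does not pin down these details, so treat them as open knobs.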
The numbers are the point. On Push-T manipulation, GRASP at horizon 60 hits 26.2% success with 49.1 seconds of planning time; CEM at the same horizon hits 7.2% with 83.1 seconds. Push the horizon to 80 and GRASP is at 10.4% and 58.9 seconds, while CEM is at 2.8% and 132 seconds. Baseline methods collapse: standard gradient descent and latent collocation (LatCo) fall below 20% success at horizon 60. GRASP doesn't just run faster at long horizons; it stays in the regime where planning actually produces useful trajectories. The benchmarks in the post cover BallNav (navigation) and Push-T (manipulation); the post does not show results on standard RL suites like DeepMind Control or Atari, so the generalization story is limited to manipulation and navigation for now.
If you are planning over learned world models and hitting the long-horizon wall, the three GRASP ideas are separable: you can adopt virtual-state lifting for parallel optimization, state-only stochasticity to escape local minima, and gradient reshaping to stabilize gradients through vision encoders. The author list is worth noting, too. That LeCun and Rabbat at Meta AI are collaborating with Berkeley's Krishnapriyan group suggests this work sits within Meta's broader world-model planning push around JEPA and V-JEPA, which matters for how quickly these techniques will show up in downstream models.
