Vectorized Online POMDP Planning (VOPP) | Robust Decision-making & Learning Lab

Marcus Hoerger, Muhammad Sudrajat, Hanna Kurniawati

ICRA 2026

Vectorized Online POMDP Planner (VOPP) is a fully vectorized online POMDP solver that performs planning entirely through batched tensor operations. The Partially Observable Markov Decision Process (POMDP) is a powerful framework for sequential decision-making under uncertainty, capturing the stochastic effects of actions and the noisy, partial information available to autonomous robots. POMDP solving could benefit enormously from massive parallelization on modern multicore hardware, but parallelizing online POMDP solvers has remained difficult: most approaches interleave numerical optimization to find actions with the highest expected total reward and estimation of expected total rewards themselves. This creates dependencies and synchronization bottlenecks during planning that can quickly diminish the gains of parallelization.

VOPP takes a fundamentally different approach. It builds on a recent POMDP formulation — Partially Observable Reference Policy Programming (PORPP) — which introduces analytical value functions. This eliminates the need for numerical optimization during planning, leaving only the estimation of expectations to be performed numerically, which can be parallelized efficiently. VOPP exploits this by implementing all online planning operations as fully vectorized computations over a tensor-based planning data structure. The result is a massively parallel online POMDP solver that fully harnesses the immense data-parallel throughput of modern multicore hardware, such as GPUs. In practice, VOPP computes policies using tens of thousands of parallel simulations with no explicit synchronization between simulations required.

Overview of the Vectorized Online POMDP Planner (VOPP).

Experimental results on three POMDP benchmark problems indicate that VOPP is at least 20× more efficient in computing near-optimal policies compared to a state-of-the-art parallel online POMDP solver. For some benchmarks, VOPP is more than 100× faster. Furthermore, VOPP outperforms state-of-the-art sequential online solvers while using a planning budget that is 1000× smaller.

Key Idea: Planning as Batched Tensor Computations

Traditional online POMDP solvers (left) represent belief trees as linked node structures that are traversed and expanded sequentially. VOPP (right) represents belief trees as a collection of tensors, enabling planning to be formulated as batched tensor operations over this representation.

Similar to most online POMDP solvers, VOPP performs guided belief space sampling followed by backup operations to construct a belief tree and evaluate different action sequences. However, unlike existing solvers, VOPP represents a belief tree as a collection of tensors $\{B, A, \Psi\}$ with belief node tensor $B$, action node tensor $A$ and action-preference tensor $\Psi$. Using this tensor representation, VOPP iterates the following steps during planning:

Vectorized forward search: Starting from the currently belief, VOPP performs tens of thousands of batched forward simulations in parallel to expand the belief tree
Vectorized preference backup: Starting from all leaf nodes, VOPP traverses the belief tree back to the root, updating the action preferences and belief values of all nodes at each depth using batched tensor operations.

Together, these two vectorized operations transform online POMDP planning from sequential tree traversal into a sequence of batched tensor computations, eliminating the synchronization bottlenecks of existing parallel solvers and fully exploiting the massive parallelism of modern multicore hardware, including GPUs.

VOPP uses fully vectorized forward simulations during search to expand the belief tree.

Benchmark Problems

We evaluated VOPP on three challenging planning-under-uncertainty problems:

Navigation in a partially known map — a robot navigates a 13×13 grid with randomly placed obstacles visible only through noisy local sensing. State space size: 169×2¹²⁴.
Multi-Agent Rocksample (MARS) — two cooperative rovers on a n×n map sample good rocks while avoiding bad ones. The quality of rocks must be inferred through noisy sensor readings. MARS(50,50) has an action space of 3025 actions and a state space too large for HyP-DESPOT and POMCP to handle.
CrowdNav — a Stretch 3 mobile robot navigates a 50×40m conference hall populated by 300 people whose behavior (curious or shy) is initially unknown and must be inferred online.

Navigation in a partially known map

Multi-Agent Rocksample (MARS)

CrowdNav

Results

VOPP was compared against HyP-DESPOT (the state-of-the-art parallel online solver), DESPOT, and POMCP on MARS and Navigation, across planning budgets of 0.01s to 1.0s per step.

On Navigation, VOPP achieves a success rate of 94% with an average of 19.8 steps to reach the goal, compared to HyP-DESPOT's 77% success rate and 26.8 steps at a 1s/step budget.
On MARS(20,20), VOPP achieves an average total discounted reward of 58.8$\pm$2.1 at 1s/step, compared to 47.9$\pm$1.6 for HyP-DESPOT.
On MARS(50,50) (3025 actions), VOPP achieves 45.1$\pm$2.0 at 1s/step. HyP-DESPOT, DESPOT, and POMCP could not run this variant (implementation crashed for sizes larger than MARS(36,36)).
At a planning budget of 0.05s/step, VOPP achieves a better result than HyP-DESPOT running at 1s/step — demonstrating an efficiency gain of at least 20×. (At a 0.01s/step budget, VOPP achieves ~64% of what HyP-DESPOT achieves with 1s/step ).
Against sequential solvers (DESPOT, POMCP) running at 10s/step, VOPP with 0.01s/step still achieves better results — a 1000× smaller planning budget.

References

Vectorized Online POMDP Planning

M. Hoerger, M. Sudrajat, and H. Kurniawati

In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2026

Bib PDF

@inproceedings{Hoe26:VOPP,
  author = {Hoerger, M. and Sudrajat, M. and Kurniawati, H.},
  title = {Vectorized Online {POMDP} Planning},
  booktitle = {Proc. IEEE Int. Conf. on Robotics and Automation (ICRA)},
  year = {2026},
}