Adaptive disretization using voronoi tree for continuous pomdps

Many decision making under uncertainty problems can naturally be specified as POMDP problems with continuous state, action and observation spaces. While many effective online POMDP solvers for discrete action and observation spaces exist, solving fully continuous POMDPs remains challenging, particularly for problems with high-dimensional action spaces. To alleviate this difficulty, we propose Adaptive Discretization using Voronoi Trees (ADVT), a new Monte Carlo Tree Search (MCTS) based online POMDP solver. ADVT adaptively partitions the action space for each sampled belief using a hierarchical data structure, called Voronoi Tree -- a Binary Space Partitioning that implicitly maintains the partition of a cell as as the Voronoi Diagram of two points sampled from the cell. This enables ADVT to exploit local structure of the action value function, thereby leading to a more effective exploration of the action space, particularly for higher-dimensional action spaces, compared to existing state-of-the-art methods. Additionally, ADVT handles continuous observation spaces by adopting a progressive widening strategy in combination with a weighted particle representation of beliefs. Empirical results indicate that ADVT scales substantially better to high-dimensional action spaces compare to existing methods. Our C++ implementation of ADVT is available at https://github.com/hoergems/ADVT.

Voronoi Trees

Key to ADVT is the combination of Monte Carlo Tree Search with a new hierarchical partitioning of the action space, called Voronoi Tree, illustrated in Figure 1. A Voronoi Tree is a Binary Space Partitioning (BSP) of the action space, where each partitioning hyperplane is implicitly maintained based on the Voronoi diagram of a pair of sampled representative actions. At each sampled belief, ADVT maintains a Voronoi Tree that is refined adaptively during planning: Cells are refined if they either represent large, unexplored regions of the action space, or contain actions with large estimated values. This enables ADVT to quickly focus its search on the most promising regions of the action space.

Figure 1: Illustration of the relation between a belief tree (left), the Voronoi tree associated to belief (middle) and the partition of the action space induced by the Voronoi tree (right).

Empirical Results

We evaluated ADVT on a set of benchmark POMDP problems with continuous state, action and observation spaces, including a manipulation problem with a 12-dimensional continuous action space. Our results indicate that ADVT scales to higher-dimensional action spaces much more effectively compared to two state-of-the-art online POMDP solvers.

Figure 2: Average total discounted reward achieved by ADVT (ours) and the state-of-the-art solvers VOMCPOW and POMCPOW on six benchmark POMDP problems. The average is taken over 1,000 simulation runs per problem and solver.
Figure 2: Average total discounted reward achieved by ADVT (ours) and the state-of-the-art solvers VOMCPOW and POMCPOW on the scalable manipulation problem SensorPlacement. The x-axis indicates the number of dimensions of the action space, up to a 12 dimensional continuous action space.

Additional details and results are available in our paper.

References

2024

  1. Adaptive Discretization using Voronoi Trees for Continuous POMDPs
    Marcus Hoerger, Hanna Kurniawati, Dirk Kroese, and Nan Ye
    The International Journal of Robotics Research, 2024

2023

  1. Adaptive Discretization Using Voronoi Trees for Continuous-Action POMDPs
    Marcus Hoerger, Hanna Kurniawati, Dirk Kroese, and Nan Ye
    In Algorithmic Foundations of Robotics XV, 2023

People

  • Marcus Hoerger
  • Hanna Kurniawati
  • Nan Ye
  • Dirk Kroese