Generalized mean estimation in Monte-Carlo tree search

  • Tuan Dam
  • Pascal Klink
  • Carlo D'Eramo
  • Jan Peters
  • Joni Pajarinen

Research output: Conference article, scientific, peer-reviewed

4 Citations (Scopus)

Abstract

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is built incrementally by expanding nodes, and node values are updated through a backup strategy based on the average value of the child nodes. However, it has been shown that, with enough samples, the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these two estimates, we go a step further and propose a novel backup strategy based on the power mean operator, which computes a value between the average and the maximum. We call our new approach Power-UCT, and argue that the power mean operator helps to speed up learning in MCTS. We analyze our method theoretically, providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method on well-known MDP and POMDP benchmarks, showing significant improvements in performance and convergence speed w.r.t. state-of-the-art algorithms.
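As a rough illustration of the backup idea described in the abstract, the sketch below computes a weighted power mean of child values. It is not the authors' implementation. The function name `power_mean`, the use of visit counts as weights, the restriction to non-negative values, and the sample numbers are all assumptions made for this example. With exponent p = 1 the result is the weighted average, and as p grows it moves toward the maximum child value.

```python
from typing import Sequence


def power_mean(values: Sequence[float], weights: Sequence[float], p: float) -> float:
    """Weighted power mean of non-negative values for an exponent p >= 1.

    Hypothetical helper for illustration: `weights` stands in for child
    visit counts, which is an assumption of this sketch.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    if p < 1:
        raise ValueError("this sketch only covers exponents p >= 1")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalize the weights, average the p-th powers, then take the p-th root.
    return sum((w / total) * (v ** p) for v, w in zip(values, weights)) ** (1.0 / p)


# Made-up child value estimates and visit counts for a single node.
child_values = [0.2, 0.5, 0.9]
visit_counts = [10, 30, 60]

for p in (1, 2, 8, 64):
    print(f"p={p}: backed-up value = {power_mean(child_values, visit_counts, p):.4f}")
```

In this sketch the exponent p acts as a single knob between the two backup operators the abstract contrasts: small p stays close to averaging, large p approaches the maximum.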

Original language: English
Title: Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020
Editors: Christian Bessiere
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 2397-2404
Number of pages: 8
ISBN (electronic): 9780999241165
DOI (permanent links):
Status: Published - 2020
MoE publication type: A4 Article in conference proceedings
Event: International Joint Conference on Artificial Intelligence - Yokohama, Japan
Duration: 1 Jan 2021 → …

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2021-January
ISSN (print): 1045-0823

Conference

Conference: International Joint Conference on Artificial Intelligence
Country/Territory: Japan
City: Yokohama
Period: 1/01/21 → …

Publication Forum level

  • JUFO level 2

ASJC Scopus subject areas

  • Artificial Intelligence
