Expectation maximization for average reward decentralized POMDPs

Joni Pajarinen, Jaakko Peltonen

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

    3 Citations (Scopus)

    Abstract

    Planning for multiple agents under uncertainty is often based on decentralized partially observable Markov decision processes (Dec-POMDPs), but current methods must de-emphasize long-term effects of actions by a discount factor. In tasks like wireless networking, agents are evaluated by average performance over time, both short- and long-term effects of actions are crucial, and discounting-based solutions can perform poorly. We show that under a common set of conditions expectation maximization (EM) for average reward Dec-POMDPs is stuck in a local optimum. We introduce a new average reward EM method; it outperforms a state-of-the-art discounted-reward Dec-POMDP method in experiments.

    Original language: English
    Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings
    Pages: 129-144
    Number of pages: 16
    Volume: 8188 LNAI
    Edition: PART 1
    DOIs
    Publication status: Published - 2013
    Publication type: A4 Article in conference proceedings
    Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013
    Duration: 1 Jan 2013 → …

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Number: PART 1
    Volume: 8188 LNAI
    ISSN (Print): 0302-9743

    Conference

    Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013
    Period: 1/01/13 → …

    Keywords

    • Dec-POMDP
    • average reward
    • expectation maximization
    • planning under uncertainty

    Publication forum classification

    • Publication forum level 1
