Constrained Imitation Q-learning with Earth Mover’s Distance reward

Research output: Poster › Scientific

Abstract

We propose constrained Earth Mover’s Distance (CEMD) Imitation Q-learning
that combines the exploration power of Reinforcement Learning (RL) and the
sample efficiency of Imitation Learning (IL). Sample efficiency makes Imitation
Q-learning a suitable approach for robot learning. For Q-learning, immediate
rewards can be efficiently computed by a greedy variant of Earth Mover’s Distance
(EMD) between the observed state-action pairs and state-actions in stored expert
demonstrations. In CEMD, we constrain the otherwise non-stationary greedy EMD
reward by proposing a greedy EMD upper bound estimate and a generic Q-learning
lower bound. In PyBullet continuous control benchmarks, CEMD is more sample
efficient, achieves higher performance and yields less variance than its competitors.
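The greedy EMD reward described in the abstract can be illustrated with a short sketch. Everything below (the function name, the Euclidean cost over state-action vectors, and one-to-one greedy matching against the demonstration set) is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def greedy_emd_reward(observed, expert):
    """Greedy variant of Earth Mover's Distance as an immediate reward.

    Each observed state-action pair is matched to its nearest still
    unmatched expert state-action pair; the reward is the negative
    matched distance, so samples close to the demonstrations score near
    zero and distant samples score low. Hypothetical sketch: the cost
    metric and matching order are assumptions for illustration.
    """
    available = np.ones(len(expert), dtype=bool)
    rewards = []
    for x in observed:
        dists = np.linalg.norm(expert - x, axis=1)  # Euclidean cost
        dists[~available] = np.inf                  # enforce one-to-one matching
        j = int(np.argmin(dists))                   # greedy nearest match
        available[j] = False
        rewards.append(-dists[j])
    return np.array(rewards)
```

When the observed pairs coincide with the demonstrations, every greedy match has zero cost, so the reward is zero; it decreases as the agent's state-action distribution drifts away from the expert's.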
Original language: English
Status: Published - 2022
OKM publication type: No OKM type
Event: 36th Conference on Neural Information Processing Systems - Deep Reinforcement Learning Workshop - New Orleans Ernest N. Morial Convention Center, New Orleans, United States
Duration: 28 Nov 2022 - 9 Dec 2022
https://nips.cc/Conferences/2022/ScheduleMultitrack?event=49989

Workshop

Workshop: 36th Conference on Neural Information Processing Systems - Deep Reinforcement Learning Workshop
Abbreviation: NeurIPS 2022 Deep RL workshop
Country/Territory: United States
City: New Orleans
Period: 28/11/22 - 9/12/22
