Abstract
We propose constrained Earth Mover’s Distance (CEMD) Imitation Q-learning,
which combines the exploration power of Reinforcement Learning (RL) with the
sample efficiency of Imitation Learning (IL). This sample efficiency makes Imitation
Q-learning well suited to robot learning. For Q-learning, immediate
rewards can be computed efficiently with a greedy variant of the Earth Mover’s Distance
(EMD) between the observed state-action pairs and the state-action pairs in stored expert
demonstrations. In CEMD, we constrain the otherwise non-stationary greedy EMD
reward by proposing a greedy EMD upper bound estimate and a generic Q-learning
lower bound. In PyBullet continuous control benchmarks, CEMD is more sample
efficient, achieves higher performance, and exhibits lower variance than its competitors.
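
To make the greedy EMD reward concrete, the following is a minimal sketch of one common greedy coupling scheme, not the authors' implementation. It assumes that every expert state-action pair starts with equal mass, that each agent step carries mass 1/T for an episode of length T, and that distance is Euclidean. The class name `GreedyEMDReward` and its parameters are hypothetical. The upper and lower bounds that CEMD adds are not shown, because the abstract does not specify them.

```python
import math


class GreedyEMDReward:
    """Illustrative greedy EMD-style reward (assumed scheme, not the paper's code)."""

    def __init__(self, expert_pairs, episode_length):
        # Expert state-action pairs are stored as flat tuples of floats.
        self.expert_pairs = [tuple(p) for p in expert_pairs]
        self.episode_length = episode_length
        self.reset()

    def reset(self):
        # Assumption: expert mass is uniform and restored at each episode start.
        n = len(self.expert_pairs)
        self.remaining = [1.0 / n] * n

    def step(self, agent_pair):
        """Return the negative greedy transport cost for one observed pair."""
        mass = 1.0 / self.episode_length  # mass carried by this agent step
        cost = 0.0
        # Visit expert pairs from nearest to farthest.
        order = sorted(
            range(len(self.expert_pairs)),
            key=lambda i: math.dist(agent_pair, self.expert_pairs[i]),
        )
        for i in order:
            if mass <= 1e-12:
                break
            moved = min(mass, self.remaining[i])
            if moved <= 0.0:
                continue  # this expert pair has no mass left
            cost += moved * math.dist(agent_pair, self.expert_pairs[i])
            self.remaining[i] -= moved
            mass -= moved
        return -cost


# The same observed pair earns different rewards as expert mass is consumed.
reward_fn = GreedyEMDReward([(0.0, 0.0), (1.0, 0.0)], episode_length=2)
first = reward_fn.step((0.0, 0.0))   # nearest expert pair still has mass: cost 0
second = reward_fn.step((0.0, 0.0))  # must use the farther pair: reward -0.5
```

In this sketch the reward for a given state-action pair depends on how much expert mass earlier steps have already consumed, which is one way to see why the abstract calls the greedy EMD reward non-stationary.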
| Original language | English |
|---|---|
| Publication status | Published - 2022 |
| Publication type | Not Eligible |
| Event | 36th Conference on Neural Information Processing Systems - Deep Reinforcement Learning Workshop, New Orleans Ernest N. Morial Convention Center, New Orleans, United States. Duration: 28 Nov 2022 → 9 Dec 2022. https://nips.cc/Conferences/2022/ScheduleMultitrack?event=49989 |
Workshop
| Workshop | 36th Conference on Neural Information Processing Systems - Deep Reinforcement Learning Workshop |
|---|---|
| Abbreviated title | NeurIPS 2022 Deep RL workshop |
| Country/Territory | United States |
| City | New Orleans |
| Period | 28/11/22 → 9/12/22 |
| Internet address |
Keywords
- Imitation learning
- Reinforcement learning
- Wasserstein distance
- Q-learning