Distributed multi-agent temporal-difference learning with full neighbor information

Zhinan Peng, Jiangping Hu, Rui Luo, Bijoy K. Ghosh

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This paper presents a novel distributed multi-agent temporal-difference learning framework for value function approximation in which each agent uses information from all of its neighbors rather than from only one neighbor. With full neighbor information, the proposed framework (1) converges faster and (2) is more robust than state-of-the-art approaches. Based on this framework, we then propose a distributed multi-agent discounted temporal-difference algorithm and a distributed multi-agent average-cost temporal-difference learning algorithm, and we provide theoretical convergence proofs for both. Numerical simulation results show that the proposed algorithms are superior to the gossip-based algorithm in convergence speed and in robustness to noise and to time-varying network topology.
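To make the abstract's idea concrete, the following is a minimal, hypothetical sketch of distributed TD(0) with linear value-function approximation, where each agent first mixes its parameter vector with those of all its neighbors (a consensus step over the full neighborhood, rather than a gossip exchange with a single neighbor) and then applies a local TD update. All names, the ring topology, the mixing weights, and the toy Markov chain are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_states, n_feats = 4, 5, 3   # illustrative sizes, not from the paper
gamma, alpha = 0.9, 0.05                # discount factor and step size

# Fixed feature matrix: row s is the feature vector of state s.
phi = rng.standard_normal((n_states, n_feats))

# Ring topology with a doubly stochastic mixing matrix W:
# row i holds agent i's weights on itself and both ring neighbors.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        W[i, j % n_agents] = 1.0 / 3.0

theta = np.zeros((n_agents, n_feats))   # one parameter vector per agent

s = 0
for t in range(2000):
    s_next = rng.integers(n_states)              # toy chain: uniform transitions
    rewards = rng.standard_normal(n_agents) + s  # each agent observes a local reward

    # 1) Consensus step: every agent averages parameters over ALL neighbors at once.
    theta = W @ theta

    # 2) Local TD(0) update using each agent's own reward observation.
    for i in range(n_agents):
        delta = rewards[i] + gamma * phi[s_next] @ theta[i] - phi[s] @ theta[i]
        theta[i] += alpha * delta * phi[s]
    s = s_next

# Mixing over the whole neighborhood keeps the agents' parameters close together.
spread = np.max(np.abs(theta - theta.mean(axis=0)))
print(spread)
```

Because every agent averages over its entire neighborhood each step, disagreement contracts at the rate of the mixing matrix's second-largest eigenvalue, which is the intuition behind the faster convergence claimed over single-neighbor gossip schemes.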

Original language: English
Pages (from-to): 379-389
Number of pages: 11
Journal: Control Theory and Technology
Volume: 18
Issue number: 4
DOIs
State: Published - Dec 2020

Keywords

  • Distributed algorithm
  • Multi-agent systems
  • Reinforcement learning
  • Temporal-difference learning

