TY - JOUR
T1 - Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm
AU - Peng, Zhinan
AU - Zhao, Yiyi
AU - Hu, Jiangping
AU - Ghosh, Bijoy Kumar
N1 - Publisher Copyright:
© 2018 Elsevier Inc.
PY - 2019/5
Y1 - 2019/5
N2 - Herein, a novel adaptive dynamic programming (ADP) algorithm is developed to solve the optimal tracking control problem for discrete-time multi-agent systems. In contrast to the classical policy iteration ADP algorithm, which comprises two components, policy evaluation and policy improvement, a two-stage policy iteration algorithm is proposed to obtain the iterative control laws and the iterative performance index functions. The proposed algorithm contains a sub-iteration procedure to calculate the iterative performance index functions in the policy evaluation stage. Convergence proofs for the iterative performance index functions and the iterative control laws are provided, and the stability of the closed-loop error system is also established. Furthermore, an actor-critic neural network (NN) is used to approximate both the iterative control laws and the iterative performance index functions. The actor-critic NN can implement the developed algorithm online without knowledge of the system dynamics. Finally, simulation results are provided to illustrate the performance of our method.
AB - Herein, a novel adaptive dynamic programming (ADP) algorithm is developed to solve the optimal tracking control problem for discrete-time multi-agent systems. In contrast to the classical policy iteration ADP algorithm, which comprises two components, policy evaluation and policy improvement, a two-stage policy iteration algorithm is proposed to obtain the iterative control laws and the iterative performance index functions. The proposed algorithm contains a sub-iteration procedure to calculate the iterative performance index functions in the policy evaluation stage. Convergence proofs for the iterative performance index functions and the iterative control laws are provided, and the stability of the closed-loop error system is also established. Furthermore, an actor-critic neural network (NN) is used to approximate both the iterative control laws and the iterative performance index functions. The actor-critic NN can implement the developed algorithm online without knowledge of the system dynamics. Finally, simulation results are provided to illustrate the performance of our method.
KW - Actor-critic networks
KW - Data-driven algorithm
KW - Multi-agent systems
KW - Optimal tracking control
KW - Two-stage policy iteration
UR - http://www.scopus.com/inward/record.url?scp=85059349032&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2018.12.079
DO - 10.1016/j.ins.2018.12.079
M3 - Article
AN - SCOPUS:85059349032
SN - 0020-0255
VL - 481
SP - 189
EP - 202
JO - Information Sciences
JF - Information Sciences
ER -