In this paper, the optimal tracking control problem for high-order heterogeneous multi-agent systems with time-varying interaction networks is solved within a reinforcement learning framework. First, the optimal tracking control problem is formulated for a leader-follower multi-agent system. Second, a policy-iteration-based adaptive dynamic programming (ADP) algorithm is proposed to compute the performance index function and the control law, and the convergence of the algorithm to the optimal solution is analyzed. Third, an actor-critic neural network is used to approximate the iterative performance index function and the control law, which allows the policy iteration algorithm to be implemented online without knowledge of the system dynamics. Finally, simulation results are presented to demonstrate the effectiveness of the proposed optimal tracking control strategy.
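
To make the policy iteration idea concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes a much simpler setting: a single agent with known, discrete-time linear dynamics and a quadratic performance index, so that policy evaluation and policy improvement can be written in closed form. The matrices `A`, `B`, `Q`, `R`, the initial gain `K0`, and all function names are hypothetical choices made only for illustration; the multi-agent, model-free, actor-critic setting considered here replaces both steps with neural network approximations.

```python
import numpy as np

def policy_evaluation(A, B, Q, R, K, sweeps=2000):
    """Evaluate the policy u = -K x: iterate P <- Q + K'RK + (A-BK)' P (A-BK)."""
    Ac = A - B @ K                 # closed-loop matrix under the current policy
    stage_cost = Q + K.T @ R @ K   # per-step cost matrix under the current policy
    P = np.zeros_like(Q)
    for _ in range(sweeps):        # fixed-point iteration; converges if Ac is stable
        P = stage_cost + Ac.T @ P @ Ac
    return P

def policy_improvement(A, B, R, P):
    """Greedy policy with respect to the evaluated performance index x' P x."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate evaluation and improvement, starting from a stabilizing gain K0."""
    K = K0
    for _ in range(iters):
        P = policy_evaluation(A, B, Q, R, K)
        K = policy_improvement(A, B, R, P)
    return P, K

# Hypothetical example system (assumption): a sampled double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])  # assumed initial gain; A - B K0 has both eigenvalues at 0.9

P, K = policy_iteration(A, B, Q, R, K0)
print("performance index matrix P:\n", P)
print("control gain K:\n", K)
```

In this simplified setting, policy iteration requires a stabilizing initial control law, and each evaluation step uses the model matrices explicitly. Removing that dependence on the model is the role that the abstract assigns to the actor-critic approximation.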