Abstract
This article develops online reinforcement Q-learning algorithms for designing an H-infinity tracking controller for unknown discrete-time linear systems. An augmented system composed of the original system and the command generator is constructed, and a discounted performance function is introduced to establish a discounted game algebraic Riccati equation (GARE). Conditions for the existence of a solution to the GARE are established, and a lower bound on the discount factor is derived to guarantee the stability of the H-infinity tracking control solution. The Q-function Bellman equation is then derived, based on which the reinforcement Q-learning algorithm is developed to learn the solution to the H-infinity tracking control problem without knowledge of the system dynamics. Both state-data-driven and output-data-driven reinforcement Q-learning algorithms are proposed for finding the control policies. Unlike the value function approximation (VFA)-based approach, it is proved that the Q-learning scheme introduces no bias into the solution of the Q-function Bellman equation when the probing noise satisfies the persistent excitation (PE) condition, and therefore converges to the nominal discounted GARE solution. Moreover, the proposed output-data-driven method is more practical than the state-data-driven method, since the full system state may not be completely measurable in practical applications. A simulation example with a single-phase voltage-source UPS inverter is used to verify the effectiveness of the proposed Q-learning algorithms.
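For orientation, a minimal sketch of the two central quantities named above, the discounted performance index and the Q-function Bellman equation, written in assumed notation (the paper's own symbols may differ): here $z_k$ denotes the augmented state, $u_k$ the control, $w_k$ the disturbance, $\bar{Q}$ and $R$ the weighting matrices, $\beta$ the attenuation level, and $\gamma \in (0,1]$ the discount factor.

% Illustrative discounted H-infinity tracking performance index (notation assumed):
\[
  J(u, w) \;=\; \sum_{k=0}^{\infty} \gamma^{k}
  \bigl( z_k^{\top} \bar{Q}\, z_k + u_k^{\top} R\, u_k - \beta^{2} w_k^{\top} w_k \bigr)
\]
% The associated Q-function then satisfies a Bellman equation of the form
\[
  Q(z_k, u_k, w_k) \;=\;
  z_k^{\top} \bar{Q}\, z_k + u_k^{\top} R\, u_k - \beta^{2} w_k^{\top} w_k
  \;+\; \gamma\, Q(z_{k+1}, u_{k+1}, w_{k+1}),
\]
% which a model-free Q-learning scheme can solve from measured input-state
% (or input-output) data, without knowledge of the system matrices.

This sketch follows the standard discounted zero-sum game formulation; the paper's exact construction of the augmented system and weighting matrices should be consulted for the precise definitions.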