A self-tuning PID control strategy based on reinforcement learning is presented for conventional tracking control problems. Actor-critic learning is used to tune the PID parameters adaptively, exploiting the model-free nature of reinforcement learning, and an RBF neural network approximates the PID controller's parameters. The critic evaluates the actor's performance and compensates for its deficiencies by producing a TD error, computed as the temporal difference of the value function between successive states of the state transition. The inputs to the RBF network are the system error together with its first- and second-order differences. Both PSO and gradient descent are used to train the network's parameters, and the resulting controller performs well when applied to four plants.
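The overall scheme described above can be illustrated with a minimal sketch: an RBF layer over the state [e, Δe, Δ²e] feeds an actor that outputs incremental PID gains and a critic that learns a value function, with the critic's TD error driving both updates. Everything here is an assumption for illustration, not the paper's exact design: the plant (a stable first-order system), the reward (negative squared error), the tanh bounding of the gains, the RBF centers, and the simplified TD-error-weighted update rules are all hypothetical choices.

```python
import numpy as np

# Hypothetical grid of Gaussian RBF centers over the state [e, de, d2e]
CENTERS = np.array([[a, b, c] for a in (-1.0, 0.0, 1.0)
                              for b in (-0.5, 0.0, 0.5)
                              for c in (-0.2, 0.0, 0.2)])
WIDTH = 1.0

def rbf(x):
    """Gaussian RBF activations for the state x = [e, de, d2e]."""
    d = CENTERS - x
    return np.exp(-np.sum(d * d, axis=1) / (2.0 * WIDTH ** 2))

class ActorCriticPID:
    """Actor outputs incremental-PID gains; critic learns a value function.
    The critic's TD error drives both updates (a simplified sketch, not the
    paper's exact learning laws, which also involve PSO training)."""
    def __init__(self, a_lr=0.005, c_lr=0.1, gamma=0.95):
        n = len(CENTERS)
        self.Wa = np.zeros((3, n))              # actor weights -> gain offsets
        self.wc = np.zeros(n)                   # critic weights -> V(x)
        self.a_lr, self.c_lr, self.gamma = a_lr, c_lr, gamma
        self.base = np.array([0.5, 0.1, 0.05])  # nominal [Kp, Ki, Kd] (assumed)
        self.u = 0.0
        self.e1 = 0.0                           # e(k-1)
        self.e2 = 0.0                           # e(k-2)

    def gains(self, phi):
        # Bound the adapted gains to [0.5, 1.5] x nominal so they stay positive
        return self.base * (1.0 + 0.5 * np.tanh(self.Wa @ phi))

    def step(self, e):
        x = np.array([e, e - self.e1, e - 2.0 * self.e1 + self.e2])
        phi = rbf(x)
        kp, ki, kd = self.gains(phi)
        # Incremental PID law on the error and its differences
        du = kp * (e - self.e1) + ki * e + kd * (e - 2.0 * self.e1 + self.e2)
        self.u += du
        self.e2, self.e1 = self.e1, e
        return self.u, phi

    def learn(self, phi, phi_next, reward):
        v, v_next = self.wc @ phi, self.wc @ phi_next
        delta = reward + self.gamma * v_next - v   # TD error
        self.wc += self.c_lr * delta * phi         # critic: gradient step on V
        self.Wa += self.a_lr * delta * phi         # actor: TD-error-weighted update
        return delta

# Track a unit step on a hypothetical stable plant y+ = 0.9*y + 0.1*u
ctrl, y, ref = ActorCriticPID(), 0.0, 1.0
for _ in range(300):
    e = ref - y
    u, phi = ctrl.step(e)
    y = 0.9 * y + 0.1 * u
    e_next = ref - y
    x_next = np.array([e_next, e_next - ctrl.e1,
                       e_next - 2.0 * ctrl.e1 + ctrl.e2])
    ctrl.learn(phi, rbf(x_next), reward=-e_next ** 2)

print(f"final tracking error: {abs(ref - y):.4f}")
```

With the gains bounded around their nominal values, the integral term stays active and the loop settles on the step reference, while the TD error nudges the gain schedule as a function of the RBF state.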