The search and rescue optimization algorithm (SAR), proposed in 2020, is a meta-heuristic optimization algorithm that simulates search and rescue behavior and is used to solve constrained engineering optimization problems. However, SAR converges slowly, and its individuals cannot adaptively select operations. A modified version of SAR based on reinforcement learning, named RLSAR, is proposed. It redesigns the local search and global search operations of SAR and adds a path adjustment operation. The asynchronous advantage actor-critic algorithm (A3C) is used to train the reinforcement learning model so that SAR individuals acquire the ability to select operators adaptively. All agents are trained in a dynamic environment in which the number, locations, and sizes of threat areas are randomly generated. Exploratory experiments are then conducted on the trained model from three aspects: the contribution of each action, the path length planned under different threat areas, and the execution sequence of each individual. The results show that RLSAR converges faster than standard SAR, the differential evolution algorithm, and the squirrel search algorithm. Furthermore, it successfully plans a more economical, safe, and effective feasible path for an unmanned aerial vehicle (UAV) in a randomly generated three-dimensional dynamic environment. These results suggest that the proposed algorithm can serve as an effective path planning method for UAVs.